
Announcing the collaboration between OctoAI and Unstructured

Blog Author - Pedro Torruella
Blog Author - Ronny Hoesada

Mar 20, 2024


We’re excited to announce a collaboration between OctoAI and Unstructured, designed to make life easier for those working on unstructured data ingestion and processing for large language model (LLM) applications. The new integration responds to both business and engineering needs by combining Unstructured’s enterprise-grade connectors with OctoAI’s embeddings API.

For Unstructured users, this means direct access to OctoAI’s embeddings model endpoint from within the familiar interfaces you already use. The integration was built with a focus on ease of use, minimizing the effort required to use a cost-efficient open-source embedding provider. For OctoAI users, it opens up an easy path to state-of-the-art document preprocessing capabilities.

Get your OctoAI and Unstructured API keys, then try out the integration with this code snippet, which reads the OctoAI key from the OCTOAI_API_KEY environment variable:

import os

from unstructured.documents.elements import Text
from unstructured.embed.octoai import OctoAiEmbeddingConfig, OctoAIEmbeddingEncoder

# Configure the encoder with the OctoAI API key from the environment.
embedding_encoder = OctoAIEmbeddingEncoder(
    config=OctoAiEmbeddingConfig(api_key=os.environ["OCTOAI_API_KEY"])
)

# Embed a list of document elements.
elements = embedding_encoder.embed_documents(
    elements=[Text("This is sentence 1"), Text("This is sentence 2")],
)

# Embed a single query string.
query = "This is the query"
query_embedding = embedding_encoder.embed_query(query=query)

# Print each element's embedding, then the query embedding.
for e in elements:
    print(e.embeddings, e)
print(query_embedding, query)

# Report whether the vectors are unit-normalized, and their dimensions.
print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())
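Once you have a query embedding and document embeddings, a common next step is to rank the documents by similarity to the query. The following is a minimal sketch of that idea using cosine similarity, not part of the integration itself: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy vectors are made-up stand-ins for real embeddings so that the example runs without an API key.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, doc_embeddings):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(doc_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed for illustration only, not real model output).
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [0.8, 0.6, 0.0]]

ranking = rank_by_similarity(query_vec, doc_vecs)
for index, score in ranking:
    print(f"document {index}: similarity {score:.3f}")
```

With the snippet above, you would presumably pass `query_embedding` and `[e.embeddings for e in elements]` in place of the toy vectors. When the embeddings are already unit vectors, the cosine similarity reduces to a plain dot product, so the normalization step could be skipped.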

To learn more, please review the Unstructured and OctoAI documentation.