Embedders convert text into high-dimensional vectors. Similar texts produce similar vectors, which enables semantic search in a vector database.

OpenAIEmbedder

Uses OpenAI’s embedding models. The most common choice.
from definable.knowledge import OpenAIEmbedder

embedder = OpenAIEmbedder(
    id="text-embedding-3-small",
    dimensions=1536,
)
id (str, default: "text-embedding-3-small")
OpenAI embedding model. Options include text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.

dimensions (int)
Output vector dimensions. Defaults to the model's native dimensions. text-embedding-3-small supports up to 1536.

api_key (str)
OpenAI API key. Defaults to the OPENAI_API_KEY environment variable.
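The text-embedding-3 models accept a dimensions value smaller than the native size, which trades a little accuracy for smaller, cheaper-to-store vectors. A minimal sketch, assuming you want 512-dimensional output (remember to configure your vector database for the same size):
from definable.knowledge import OpenAIEmbedder

# Assumed configuration: truncate output vectors to 512 dimensions.
embedder = OpenAIEmbedder(
    id="text-embedding-3-small",
    dimensions=512,
)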

Model Comparison

Model                    Dimensions  Performance  Cost
text-embedding-3-large   3072        Highest      Higher
text-embedding-3-small   1536        Good         Low
text-embedding-ada-002   1536        Good         Low
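If retrieval quality matters more than cost, you can switch to the larger model. A sketch, assuming you keep its full native dimensions:
from definable.knowledge import OpenAIEmbedder

# text-embedding-3-large: highest performance, higher cost, 3072 dimensions.
embedder = OpenAIEmbedder(
    id="text-embedding-3-large",
    dimensions=3072,
)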

VoyageAIEmbedder

Uses Voyage AI’s embedding models, which excel at domain-specific and multilingual content.
from definable.knowledge import VoyageAIEmbedder

embedder = VoyageAIEmbedder(
    id="voyage-2",
    dimensions=1024,
)
id (str, default: "voyage-2")
Voyage AI model. Options include voyage-2, voyage-large-2, and others.

dimensions (int, default: 1024)
Output vector dimensions.

api_key (str)
Voyage AI API key. Defaults to the VOYAGE_API_KEY environment variable.
Requires the voyageai package. Install with pip install voyageai.
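If you prefer not to rely on the environment variable, pass the key explicitly. A sketch; the environment variable name used here is hypothetical:
import os
from definable.knowledge import VoyageAIEmbedder

# Explicit API key instead of the VOYAGE_API_KEY default.
embedder = VoyageAIEmbedder(
    id="voyage-large-2",
    api_key=os.environ["MY_VOYAGE_KEY"],  # hypothetical variable name
)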

Using Embedders

With Knowledge

Pass an embedder when creating a knowledge base:
from definable.knowledge import Knowledge, InMemoryVectorDB, OpenAIEmbedder

knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)

Standalone

Generate embeddings directly:
embedding = embedder.get_embedding("Hello, world!")
print(len(embedding))  # 1536
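Because similar texts produce similar vectors, two embeddings can be compared directly. A minimal sketch using cosine similarity; the helper function below is not part of the library:
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

score = cosine_similarity(
    embedder.get_embedding("How do I reset my password?"),
    embedder.get_embedding("Steps to change a forgotten password"),
)
print(score)  # Closer to 1.0 means more semantically similar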

Batch Embedding

Embed multiple texts efficiently in a single API call:
texts = ["First document", "Second document", "Third document"]
embeddings, usages = await embedder.async_get_embeddings_batch_and_usage(texts)
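The batch call is asynchronous, so outside an async context you need an event loop. A sketch, assuming you run it from a plain script:
import asyncio

async def main():
    texts = ["First document", "Second document", "Third document"]
    # embeddings is a list of vectors; usages carries token-usage information.
    embeddings, usages = await embedder.async_get_embeddings_batch_and_usage(texts)
    print(len(embeddings), len(embeddings[0]))

asyncio.run(main())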

Creating a Custom Embedder

Subclass Embedder and implement the embedding methods:
from definable.knowledge.embedders import Embedder

class LocalEmbedder(Embedder):
    # all-MiniLM-L6-v2 produces 384-dimensional vectors.
    dimensions: int = 384

    def get_embedding(self, text: str) -> list[float]:
        # Imported lazily so the dependency is only needed when embedding.
        # For real workloads, cache the model instead of reloading it per call.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")
        return model.encode(text).tolist()

    async def async_get_embedding(self, text: str) -> list[float]:
        # No native async support; delegate to the synchronous method.
        return self.get_embedding(text)
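The custom embedder plugs into Knowledge the same way as the built-in ones. A sketch, noting that the vector database must expect 384-dimensional vectors:
from definable.knowledge import Knowledge, InMemoryVectorDB

# LocalEmbedder produces 384-dimensional vectors, so the vector database
# must be configured for the same size (see the note on dimensions below).
knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=LocalEmbedder(),
)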

Embedder Interface

Method                                     Description
get_embedding(text) -> List[float]         Get embedding synchronously
async_get_embedding(text) -> List[float]   Get embedding asynchronously
get_embedding_and_usage(text)              Get embedding with usage stats
async_get_embedding_and_usage(text)        Async variant with usage stats
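The usage variants also report token consumption. A sketch, assuming they return the embedding together with a usage object; verify the exact return shape against your installed version:
# Assumed return shape: (embedding, usage).
embedding, usage = embedder.get_embedding_and_usage("Hello, world!")
print(len(embedding), usage)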
Make sure the dimensions value on your embedder matches the dimensions configured on your vector database. Mismatched dimensions will cause errors during search.