Vector databases store document embeddings and enable fast similarity search. Definable includes seven implementations in the definable.vectordb module and a base class for custom backends.
Import vector DB classes from definable.vectordb, not definable.knowledge. The definable.knowledge module still re-exports InMemoryVectorDB for backward compatibility, but importing it from there emits a deprecation warning.

InMemoryVectorDB

Stores everything in memory. Great for development, testing, and small datasets.
from definable.vectordb import InMemoryVectorDB

vector_db = InMemoryVectorDB(name="my_docs")
Characteristics:
  • No external services to run; the only dependency is numpy
  • Uses cosine similarity for search
  • Data is lost when the process exits
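
To make the cosine-similarity search concrete, here is a minimal sketch of the ranking step. It is not Definable's implementation (which, per the list above, relies on numpy); it uses only the standard library, and the names cosine_similarity, top_k, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen for this sketch: a zero vector matches nothing.
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
stored = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.8, 0.6, 0.0],
    "tax": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], stored, k=2)
```

Because every stored vector is scored on each query, this brute-force approach scales linearly with the number of documents, which is one reason an in-memory store fits small datasets best.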

PgVector

Uses PostgreSQL with the pgvector extension. Suitable for production workloads with persistent storage and scalable search.
from definable.vectordb import PgVector

vector_db = PgVector(
    db_url="postgresql://user:pass@localhost:5432/mydb",
    table_name="documents",
)
Requires psycopg[binary] and pgvector. Install with:
pip install "psycopg[binary]" pgvector
Your PostgreSQL instance must have the pgvector extension enabled:
CREATE EXTENSION IF NOT EXISTS vector;
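
The db_url above embeds credentials directly in the string. As a hedged sketch, the URL can instead be assembled from environment-style settings, percent-encoding the user and password so that characters such as @ or : do not break URL parsing. The helper build_postgres_url is hypothetical; the variable names follow the standard libpq environment variables.

```python
from collections.abc import Mapping
from urllib.parse import quote


def build_postgres_url(env: Mapping[str, str]) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env["PGDATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example with an explicit mapping; a deployment would typically pass os.environ.
example_env = {"PGUSER": "app", "PGPASSWORD": "p@ss:word/1", "PGDATABASE": "mydb"}
db_url = build_postgres_url(example_env)
```

The resulting string can then be passed as the db_url argument shown earlier, for example PgVector(db_url=build_postgres_url(os.environ), table_name="documents").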

Qdrant

High-performance vector search engine.
from definable.vectordb import Qdrant

vector_db = Qdrant(
    url="localhost",
    port=6333,
    collection="my_docs",
    dimensions=1536,
)

ChromaDb

from definable.vectordb import ChromaDb

vector_db = ChromaDb(
    collection="my_docs",
    path="./chroma_data",  # Omit for in-memory mode
)

MongoDb

MongoDB Atlas vector search.
from definable.vectordb import MongoDb

vector_db = MongoDb(
    connection_string="mongodb+srv://...",
    database="mydb",
    collection="documents",
    dimensions=1536,
)

RedisDB

Redis with RediSearch for vector similarity.
from definable.vectordb import RedisDB

vector_db = RedisDB(
    url="redis://localhost:6379",
    index_name="my_docs",
    dimensions=1536,
)

PineconeDb

Pinecone managed vector database.
from definable.vectordb import PineconeDb

vector_db = PineconeDb(
    api_key="your-pinecone-api-key",
    index_name="my_docs",
    dimensions=1536,
)

Using with Knowledge

Pass any vector DB instance to Knowledge:
from definable.embedder import OpenAIEmbedder
from definable.knowledge import Knowledge
from definable.vectordb import InMemoryVectorDB

knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)

VectorDB Interface

All implementations share the same base interface from definable.vectordb.VectorDB:
| Method | Description |
| --- | --- |
| create() | Create the collection / table if it doesn’t exist |
| insert(content_hash, documents) | Insert pre-embedded documents |
| upsert(content_hash, documents) | Insert or update pre-embedded documents |
| search(query, limit, filters) | Search by text query (backend embeds internally) |
| count() -> int | Number of stored documents |
| delete_by_id(id) | Delete a document by its ID |
| delete() | Delete the entire collection / table |
| drop() | Drop the collection / table from the backend |
| ainsert(content_hash, documents) | Async insert |
| asearch(query, limit, filters) | Async search |

Creating a Custom VectorDB

Subclass VectorDB from definable.vectordb to integrate any vector store. The key abstract methods to implement are:
from definable.vectordb import VectorDB
from definable.knowledge import Document

class MyVectorDB(VectorDB):
    def create(self) -> None:
        # Create collection/table if it doesn't exist
        ...

    async def async_create(self) -> None:
        self.create()

    def insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        # Store pre-embedded documents
        ...

    async def async_insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    def upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    async def async_upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.upsert(content_hash, documents, filters)

    def search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        # Embed query and search
        ...

    async def async_search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        return self.search(query, limit, filters)

    def get_count(self) -> int:
        ...

    def delete(self) -> bool:
        # Delete the entire collection
        ...

    def delete_by_id(self, id: str) -> bool:
        ...

    def delete_by_name(self, name: str) -> bool:
        ...

    def delete_by_metadata(self, metadata: dict) -> bool:
        ...

    def delete_by_content_id(self, content_id: str) -> bool:
        ...

    def drop(self) -> None:
        ...

    async def async_drop(self) -> None:
        self.drop()

    def exists(self) -> bool:
        ...

    async def async_exists(self) -> bool:
        return self.exists()

    def name_exists(self, name: str) -> bool:
        ...

    async def async_name_exists(self, name: str) -> bool:
        return self.name_exists(name)

    def id_exists(self, id: str) -> bool:
        ...

    def content_hash_exists(self, content_hash: str) -> bool:
        ...

    def get_supported_search_types(self) -> list[str]:
        return ["vector"]
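
The skeleton above leaves the method bodies empty. The following is a rough, self-contained sketch of how a few of them might behave, using a plain dictionary as storage. It deliberately does not subclass definable's VectorDB, so it runs without the library installed. Doc, toy_embed, cosine, DictStore, and make_doc are all hypothetical stand-ins, and the rule that insert skips an already-seen content_hash is an assumption made for illustration, not documented behavior.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for definable's Document; real fields may differ."""
    id: str
    content: str
    embedding: list[float]


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: counts of the letters a-z. Not a real model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class DictStore:
    """Toy store keyed by document id. Does NOT subclass definable's VectorDB."""

    def __init__(self) -> None:
        self._docs: dict[str, Doc] = {}
        self._ids_by_hash: dict[str, set[str]] = {}

    def upsert(self, content_hash: str, documents: list[Doc], filters=None) -> None:
        # Keying by id replaces an existing document instead of duplicating it.
        ids = self._ids_by_hash.setdefault(content_hash, set())
        for doc in documents:
            self._docs[doc.id] = doc
            ids.add(doc.id)

    def insert(self, content_hash: str, documents: list[Doc], filters=None) -> None:
        # Assumption: insert is a no-op for content that is already stored.
        if not self.content_hash_exists(content_hash):
            self.upsert(content_hash, documents, filters)

    def search(self, query: str, limit: int = 5, filters=None) -> list[Doc]:
        # Embed the query, then rank every stored document (filters are ignored here).
        q = toy_embed(query)
        ranked = sorted(self._docs.values(), key=lambda d: cosine(q, d.embedding), reverse=True)
        return ranked[:limit]

    def get_count(self) -> int:
        return len(self._docs)

    def content_hash_exists(self, content_hash: str) -> bool:
        return bool(self._ids_by_hash.get(content_hash))

    def delete_by_id(self, id: str) -> bool:
        if self._docs.pop(id, None) is None:
            return False
        for ids in self._ids_by_hash.values():
            ids.discard(id)
        return True


def make_doc(doc_id: str, text: str) -> Doc:
    return Doc(id=doc_id, content=text, embedding=toy_embed(text))


store = DictStore()
store.insert("h1", [make_doc("1", "postgres database"), make_doc("2", "banana bread")])
store.insert("h1", [make_doc("3", "ignored duplicate")])  # same hash: skipped
store.upsert("h1", [make_doc("2", "banana cake")])        # replaces document 2
best = store.search("database", limit=1)[0]
```

A real backend would replace the dictionary with calls to the underlying store and the toy embedder with the configured embedder, but the split between insert (skip known content) and upsert (overwrite by id) is the part worth deciding early.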

Choosing a Vector Database

| | InMemoryVectorDB | PgVector | Qdrant | ChromaDb | MongoDb | RedisDB | PineconeDb |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Setup | None | PostgreSQL + pgvector | Qdrant server | None (in-memory) or local dir | MongoDB Atlas | Redis + RediSearch | Managed |
| Persistence | No | Yes | Yes | Optional | Yes | Yes | Yes |
| Scale | Thousands | Millions | Millions | Thousands–millions | Millions | Millions | Billions |
| Best for | Dev, testing | Existing PG infra | High performance | Local dev | Existing Mongo | Low latency | Serverless |
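
One way to sanity-check the Scale row is to estimate how much memory the raw embeddings need. The sketch below assumes 32-bit floats (4 bytes per value) and the 1536-dimension setting used in the examples on this page. It counts only the vectors themselves, not document text, metadata, or index overhead, and embedding_memory_bytes is a hypothetical helper.

```python
def embedding_memory_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the embedding matrix, excluding text, metadata, and index overhead."""
    return num_vectors * dimensions * bytes_per_value


for count in (10_000, 1_000_000):
    size_mib = embedding_memory_bytes(count, dimensions=1536) / 2**20
    print(f"{count:>9,} vectors x 1536 dims = {size_mib:,.0f} MiB")
```

Under these assumptions, ten thousand vectors take roughly 59 MiB while one million take about 5.7 GiB, which is where an in-process store starts to strain and a dedicated backend becomes the more practical choice.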