Vector databases store document embeddings and enable fast similarity search. Definable includes seven implementations in the definable.vectordb module and a base class for custom backends.
All vector DB classes are imported from definable.vectordb, not definable.knowledge. The definable.knowledge module re-exports InMemoryVectorDB for backward compatibility but will show a deprecation warning.

InMemoryVectorDB

Stores everything in memory. Great for development, testing, and small datasets.
from definable.vectordb import InMemoryVectorDB

vector_db = InMemoryVectorDB(name="my_docs")
Characteristics:
  • No external services to run (requires only numpy)
  • Uses cosine similarity for search
  • Data is lost when the process exits
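
InMemoryVectorDB ranks results by cosine similarity. As a minimal sketch of that computation in plain numpy (independent of definable, for illustration only):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.0, 1.0])
docs = {
    "doc-a": np.array([1.0, 0.0, 1.0]),  # same direction as the query
    "doc-b": np.array([0.0, 1.0, 0.0]),  # orthogonal to the query
}

# Rank document IDs by similarity to the query, highest first
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']
```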

PgVector

Uses PostgreSQL with the pgvector extension. Suitable for production workloads with persistent storage and scalable search.
from definable.vectordb import PgVector

vector_db = PgVector(
    db_url="postgresql://user:pass@localhost:5432/mydb",
    table_name="documents",
)
Requires psycopg[binary] and pgvector. Install with:
pip install "psycopg[binary]" pgvector
Your PostgreSQL instance must have the pgvector extension enabled:
CREATE EXTENSION IF NOT EXISTS vector;

Qdrant

High-performance vector search engine.
from definable.vectordb import Qdrant

vector_db = Qdrant(
    url="localhost",
    port=6333,
    collection="my_docs",
    dimensions=1536,
)

ChromaDb

Chroma embedded vector database. Runs in-process, with optional on-disk persistence.
from definable.vectordb import ChromaDb

vector_db = ChromaDb(
    collection="my_docs",
    path="./chroma_data",  # Omit for in-memory mode
)

MongoDb

MongoDB Atlas vector search.
from definable.vectordb import MongoDb

vector_db = MongoDb(
    connection_string="mongodb+srv://...",
    database="mydb",
    collection="documents",
    dimensions=1536,
)

RedisDB

Redis with RediSearch for vector similarity.
from definable.vectordb import RedisDB

vector_db = RedisDB(
    url="redis://localhost:6379",
    index_name="my_docs",
    dimensions=1536,
)

PineconeDb

Pinecone managed vector database.
from definable.vectordb import PineconeDb

vector_db = PineconeDb(
    api_key="your-pinecone-api-key",
    index_name="my_docs",
    dimensions=1536,
)

Using with Knowledge

Pass any vector DB instance to Knowledge:
from definable.embedder import OpenAIEmbedder
from definable.knowledge import Knowledge
from definable.vectordb import InMemoryVectorDB

knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)

VectorDB Interface

All implementations share the same base interface from definable.vectordb.VectorDB:
  • create() – Create the collection / table if it doesn’t exist
  • insert(content_hash, documents) – Insert pre-embedded documents
  • upsert(content_hash, documents) – Insert or update pre-embedded documents
  • search(query, limit, filters) – Search by text query (the backend embeds the query internally)
  • count() -> int – Number of stored documents
  • delete_by_id(id) – Delete a document by its ID
  • delete() – Delete the entire collection / table
  • drop() – Drop the collection / table from the backend
  • ainsert(content_hash, documents) – Async insert
  • asearch(query, limit, filters) – Async search
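
To make the contract concrete, here is a toy dict-backed store following the same method names. This is an illustrative stand-in only, not definable's code: it fakes similarity with keyword overlap instead of calling an embedder, and uses plain (id, content) tuples instead of Document objects.

```python
class ToyVectorDB:
    """Illustrative stand-in for the VectorDB interface (not the real library)."""

    def __init__(self):
        self._docs = {}  # id -> content

    def create(self) -> None:
        pass  # nothing to set up for an in-process dict

    def insert(self, content_hash: str, documents: list) -> None:
        # documents here are (id, content) pairs; definable passes Document objects
        for doc_id, content in documents:
            self._docs[doc_id] = content

    def search(self, query: str, limit: int = 5, filters=None) -> list:
        # Real backends embed the query and rank by vector similarity;
        # this toy ranks by naive keyword overlap instead.
        def score(content: str) -> int:
            return sum(word in content.lower() for word in query.lower().split())
        ranked = sorted(self._docs.values(), key=score, reverse=True)
        return ranked[:limit]

    def count(self) -> int:
        return len(self._docs)

    def delete_by_id(self, id: str) -> bool:
        return self._docs.pop(id, None) is not None

db = ToyVectorDB()
db.create()
db.insert("hash-1", [("a", "Postgres stores vectors"), ("b", "Redis is fast")])
print(db.search("postgres vectors", limit=1))  # ['Postgres stores vectors']
```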

Creating a Custom VectorDB

Subclass VectorDB from definable.vectordb to integrate any vector store. The key abstract methods to implement are:
from definable.vectordb import VectorDB
from definable.knowledge import Document

class MyVectorDB(VectorDB):
    def create(self) -> None:
        # Create collection/table if it doesn't exist
        ...

    async def async_create(self) -> None:
        self.create()

    def insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        # Store pre-embedded documents
        ...

    async def async_insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    def upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    async def async_upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.upsert(content_hash, documents, filters)

    def search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        # Embed query and search
        ...

    async def async_search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        return self.search(query, limit, filters)

    def get_count(self) -> int:
        ...

    def delete(self) -> bool:
        # Delete the entire collection
        ...

    def delete_by_id(self, id: str) -> bool:
        ...

    def delete_by_name(self, name: str) -> bool:
        ...

    def delete_by_metadata(self, metadata: dict) -> bool:
        ...

    def delete_by_content_id(self, content_id: str) -> bool:
        ...

    def drop(self) -> None:
        ...

    async def async_drop(self) -> None:
        self.drop()

    def exists(self) -> bool:
        ...

    async def async_exists(self) -> bool:
        return self.exists()

    def name_exists(self, name: str) -> bool:
        ...

    async def async_name_exists(self, name: str) -> bool:
        return self.name_exists(name)

    def id_exists(self, id: str) -> bool:
        ...

    def content_hash_exists(self, content_hash: str) -> bool:
        ...

    def get_supported_search_types(self) -> list[str]:
        return ["vector"]
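
The content_hash arguments above let a backend detect already-ingested content (see content_hash_exists). One common way to derive such a fingerprint, shown here as an assumption rather than definable's exact scheme, is a SHA-256 digest of the raw text:

```python
import hashlib

def content_hash(text: str) -> str:
    """Deterministic fingerprint: identical content always yields the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

h1 = content_hash("Vector databases store embeddings.")
h2 = content_hash("Vector databases store embeddings.")
print(h1 == h2)  # True: re-ingesting identical content can be skipped
```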

Choosing a Vector Database

  • InMemoryVectorDB – Setup: none. Persistence: no. Scale: thousands. Best for: dev and testing.
  • PgVector – Setup: PostgreSQL + pgvector. Persistence: yes. Scale: millions. Best for: existing PG infra.
  • Qdrant – Setup: Qdrant server. Persistence: yes. Scale: millions. Best for: high performance.
  • ChromaDb – Setup: none (in-memory) or local dir. Persistence: optional. Scale: thousands–millions. Best for: local dev.
  • MongoDb – Setup: MongoDB Atlas. Persistence: yes. Scale: millions. Best for: existing Mongo.
  • RedisDB – Setup: Redis + RediSearch. Persistence: yes. Scale: millions. Best for: low latency.
  • PineconeDb – Setup: managed. Persistence: yes. Scale: billions. Best for: serverless.