Vector databases store document embeddings and enable fast similarity search. Definable includes seven implementations in the definable.vectordb module and a base class for custom backends.
All vector DB classes are imported from definable.vectordb, not definable.knowledge. The definable.knowledge module re-exports InMemoryVectorDB for backward compatibility but will show a deprecation warning.
## InMemoryVectorDB

Stores everything in memory. Great for development, testing, and small datasets.

```python
from definable.vectordb import InMemoryVectorDB

vector_db = InMemoryVectorDB(name="my_docs")
```
Characteristics:
- No external dependencies beyond numpy
- Uses cosine similarity for search
- Data is lost when the process exits
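The cosine-similarity ranking described above can be illustrated with a small, self-contained sketch. This is plain Python for illustration, not Definable's internal implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank stored embeddings against a query embedding, highest similarity first.
store = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
query = [1.0, 0.1]
ranked = sorted(store, key=lambda k: cosine_similarity(query, store[k]), reverse=True)
print(ranked)  # doc_a points closest to the query direction
```

Cosine similarity compares direction rather than magnitude, which is why it works well for embeddings of different lengths and scales.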
## PgVector

Uses PostgreSQL with the pgvector extension. Suitable for production workloads with persistent storage and scalable search.

```python
from definable.vectordb import PgVector

vector_db = PgVector(
    db_url="postgresql://user:pass@localhost:5432/mydb",
    table_name="documents",
)
```
Requires `psycopg[binary]` and `pgvector`. Install with `pip install "psycopg[binary]" pgvector`.

Your PostgreSQL instance must also have the pgvector extension enabled: `CREATE EXTENSION IF NOT EXISTS vector;`
## Qdrant

High-performance vector search engine.

```python
from definable.vectordb import Qdrant

vector_db = Qdrant(
    url="localhost",
    port=6333,
    collection="my_docs",
    dimensions=1536,
)
```
## ChromaDb

Chroma vector store, running in-memory or persisted to a local directory.

```python
from definable.vectordb import ChromaDb

vector_db = ChromaDb(
    collection="my_docs",
    path="./chroma_data",  # Omit for in-memory mode
)
```
## MongoDb

MongoDB Atlas vector search.

```python
from definable.vectordb import MongoDb

vector_db = MongoDb(
    connection_string="mongodb+srv://...",
    database="mydb",
    collection="documents",
    dimensions=1536,
)
```
## RedisDB

Redis with RediSearch for vector similarity.

```python
from definable.vectordb import RedisDB

vector_db = RedisDB(
    url="redis://localhost:6379",
    index_name="my_docs",
    dimensions=1536,
)
```
## PineconeDb

Pinecone managed vector database.

```python
from definable.vectordb import PineconeDb

vector_db = PineconeDb(
    api_key="your-pinecone-api-key",
    index_name="my_docs",
    dimensions=1536,
)
```
## Using with Knowledge

Pass any vector DB instance to Knowledge:

```python
from definable.embedder import OpenAIEmbedder
from definable.knowledge import Knowledge
from definable.vectordb import InMemoryVectorDB

knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)
```
## VectorDB Interface

All implementations share the same base interface from definable.vectordb.VectorDB:

| Method | Description |
|---|---|
| `create()` | Create the collection / table if it doesn't exist |
| `insert(content_hash, documents)` | Insert pre-embedded documents |
| `upsert(content_hash, documents)` | Insert or update pre-embedded documents |
| `search(query, limit, filters)` | Search by text query (backend embeds internally) |
| `count() -> int` | Number of stored documents |
| `delete_by_id(id)` | Delete a document by its ID |
| `delete()` | Delete the entire collection / table |
| `drop()` | Drop the collection / table from the backend |
| `ainsert(content_hash, documents)` | Async insert |
| `asearch(query, limit, filters)` | Async search |
## Creating a Custom VectorDB

Subclass VectorDB from definable.vectordb to integrate any vector store. The key abstract methods to implement are:

```python
from definable.knowledge import Document
from definable.vectordb import VectorDB


class MyVectorDB(VectorDB):
    def create(self) -> None:
        # Create the collection/table if it doesn't exist
        ...

    async def async_create(self) -> None:
        self.create()

    def insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        # Store pre-embedded documents
        ...

    async def async_insert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    def upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.insert(content_hash, documents, filters)

    async def async_upsert(self, content_hash: str, documents: list[Document], filters=None) -> None:
        self.upsert(content_hash, documents, filters)

    def search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        # Embed the query and run a similarity search
        ...

    async def async_search(self, query: str, limit: int = 5, filters=None) -> list[Document]:
        return self.search(query, limit, filters)

    def get_count(self) -> int:
        ...

    def delete(self) -> bool:
        # Delete the entire collection
        ...

    def delete_by_id(self, id: str) -> bool:
        ...

    def delete_by_name(self, name: str) -> bool:
        ...

    def delete_by_metadata(self, metadata: dict) -> bool:
        ...

    def delete_by_content_id(self, content_id: str) -> bool:
        ...

    def drop(self) -> None:
        ...

    async def async_drop(self) -> None:
        self.drop()

    def exists(self) -> bool:
        ...

    async def async_exists(self) -> bool:
        return self.exists()

    def name_exists(self, name: str) -> bool:
        ...

    async def async_name_exists(self, name: str) -> bool:
        ...

    def id_exists(self, id: str) -> bool:
        ...

    def content_hash_exists(self, content_hash: str) -> bool:
        ...

    def get_supported_search_types(self) -> list[str]:
        return ["vector"]
```
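The skeleton above delegates each async method to its sync counterpart, which is the simplest correct bridge when the underlying client is synchronous. A standalone sketch of that pattern (plain Python, no Definable imports; `SyncBackedStore` is a hypothetical name for illustration):

```python
import asyncio


class SyncBackedStore:
    """Sync methods do the real work; async methods just delegate."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def insert(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    async def async_insert(self, doc_id: str, text: str) -> None:
        # Fine for fast calls; consider asyncio.to_thread(self.insert, ...)
        # if the sync call blocks the event loop for a long time.
        self.insert(doc_id, text)

    def get_count(self) -> int:
        return len(self._docs)


async def main() -> int:
    store = SyncBackedStore()
    await store.async_insert("a", "hello")
    await store.async_insert("b", "world")
    return store.get_count()

print(asyncio.run(main()))  # 2
```

Delegating keeps sync and async behavior identical with no duplicated logic; only swap in a real async client (or `asyncio.to_thread`) when calls are genuinely I/O-heavy.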
## Choosing a Vector Database

| | InMemoryVectorDB | PgVector | Qdrant | ChromaDb | MongoDb | RedisDB | PineconeDb |
|---|---|---|---|---|---|---|---|
| Setup | None | PostgreSQL + pgvector | Qdrant server | None (in-memory) or local dir | MongoDB Atlas | Redis + RediSearch | Managed |
| Persistence | No | Yes | Yes | Optional | Yes | Yes | Yes |
| Scale | Thousands | Millions | Millions | Thousands to millions | Millions | Millions | Billions |
| Best for | Dev, testing | Existing PG infra | High performance | Local dev | Existing Mongo | Low latency | Serverless |