The Knowledge module provides a complete Retrieval-Augmented Generation (RAG) pipeline. Ingest documents from any source, chunk them into manageable pieces, generate embeddings, store them in a vector database, and retrieve the most relevant context at query time.

The RAG Pipeline

Ingestion (one-time)
Step        Options
Source      file, URL, string
Reader      PDF, text, URL
Chunker     text, recursive
Embedder    OpenAI, Voyage, Google, Mistral
VectorDB    InMemory, PgVector, Qdrant, ChromaDb, MongoDb, RedisDB, PineconeDb
Retrieval (per query)
Hybrid merge, temporal decay, and MMR are optional. Without them, the pipeline is simply Vector Search → Reranker → Results. See Hybrid Search & Scoring for details.
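At its core, the retrieval step is nearest-neighbor search over embeddings, optionally followed by a second-stage reranker. A minimal, self-contained sketch of that idea in plain Python (an illustration of the concept, not the module's internals):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, docs, top_k=3, rerank=None):
    """docs: list of (vector, content) pairs. Returns contents ranked by similarity,
    optionally re-sorted by a second-stage rerank(content) -> score function."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[0]), reverse=True)[:top_k]
    if rerank:
        scored = sorted(scored, key=lambda d: rerank(d[1]), reverse=True)
    return [content for _, content in scored]

docs = [([1.0, 0.0], "python"), ([0.0, 1.0], "javascript"), ([0.9, 0.1], "cpython")]
print(retrieve([1.0, 0.0], docs, top_k=2))  # ['python', 'cpython']
```

The hybrid, decay, and MMR stages slot in between the vector search and the reranker when enabled.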

Quick Start

from definable.embedder import OpenAIEmbedder
from definable.knowledge import Knowledge
from definable.vectordb import InMemoryVectorDB

# Create a knowledge base
knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)

# Add content
knowledge.add("Python is a programming language created by Guido van Rossum.")
knowledge.add("JavaScript is the language of the web.")

# Search
results = knowledge.search("Who created Python?")
for doc in results:
    print(f"[{doc.reranking_score or 'n/a'}] {doc.content}")

Path Shorthand

For the quickest setup, pass a directory path directly to the Agent:
from definable.agent import Agent

agent = Agent(model="gpt-4o-mini", knowledge="./docs/")
This auto-configures InMemoryVectorDB + OpenAIEmbedder + RecursiveChunker and recursively loads all supported files. See Agent Integration for details.

Adding Documents

The add() method accepts strings, file paths, or URLs. The appropriate reader is selected automatically:
# Plain text
knowledge.add("Definable is an AI agent framework.")

# From a text file
knowledge.add("/path/to/document.txt")

# From a PDF
knowledge.add("/path/to/report.pdf")

# From a URL
knowledge.add("https://example.com/article")

Searching

results = knowledge.search(
    query="What is Definable?",
    top_k=5,         # Number of results
    rerank=True,     # Apply reranking (if reranker is configured)
)
Returns a list of Document objects sorted by relevance.

Async Support

Every method has an async variant:
await knowledge.aadd("New content to index.")
results = await knowledge.asearch("Search query")

Components

Each step in the pipeline is pluggable:

Documents

The core data unit — text content with metadata and embeddings.

Readers

Read text, PDF, and web content into documents.

Chunkers

Split large documents into smaller, overlapping chunks.
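The overlap keeps context from being cut mid-thought at chunk boundaries. A standalone sketch of fixed-size chunking with overlap (the sizes here are illustrative, not the module's defaults):

```python
def chunk(text, size=20, overlap=5):
    """Split text into chunks of `size` characters; each chunk shares
    its last `overlap` characters with the start of the next."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

pieces = chunk("The quick brown fox jumps over the lazy dog.", size=20, overlap=5)
# 3 overlapping chunks
```

In practice a RecursiveChunker also prefers splitting at natural boundaries (paragraphs, sentences) before falling back to raw character counts.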

Embedders

Generate vector embeddings with OpenAI, Voyage AI, Google, or Mistral.

Rerankers

Rerank search results for higher relevance with Cohere or SentenceTransformer.

Vector Databases

Store and search embeddings with seven backends: InMemory, PgVector, Qdrant, ChromaDb, MongoDb, RedisDB, and PineconeDb.

Hybrid Search

Combine vector + full-text search, MMR diversity, and temporal decay.
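The two scoring extensions are easiest to see in miniature. Below is a hedged, self-contained sketch of temporal decay and MMR (simplified; names like half_life and lambda_ are illustrative, not the module's configuration fields):

```python
def temporal_decay(score, age_days, half_life=30.0):
    """Exponentially down-weight a relevance score by document age:
    a document `half_life` days old keeps half its score."""
    return score * 0.5 ** (age_days / half_life)

def mmr(query_sim, doc_sims, k=2, lambda_=0.7):
    """Maximal Marginal Relevance: trade off query relevance against
    redundancy with already-selected documents.
    query_sim: {doc: similarity to query}
    doc_sims:  {(a, b): similarity between docs a and b}"""
    selected, candidates = [], list(query_sim)
    while candidates and len(selected) < k:
        def mmr_score(d):
            redundancy = max(
                (doc_sims.get((d, s), doc_sims.get((s, d), 0.0)) for s in selected),
                default=0.0,
            )
            return lambda_ * query_sim[d] - (1 - lambda_) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# "b" is nearly a duplicate of "a", so MMR picks the more diverse "c":
print(mmr({"a": 0.9, "b": 0.85, "c": 0.5}, {("a", "b"): 0.95}, k=2))  # ['a', 'c']
```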

Agent Integration

Connect knowledge to agents via middleware or toolkits.

Knowledge Parameters

vector_db (VectorDB): Vector database for storing and searching embeddings. Defaults to InMemoryVectorDB.
embedder (Embedder): Embedding provider for converting text to vectors.
reranker (Reranker): Optional reranker for improving search result relevance.
chunker (Chunker): Text chunker for splitting documents. When None (the default), documents are not chunked automatically. Pass a RecursiveChunker or TextChunker instance to enable chunking during add().
readers (List[Reader]): Document readers. Defaults include TextReader, PDFReader, URLReader.
auto_detect_reader (bool, default True): Automatically detect the correct reader based on the source.
fts_index (FTSIndex): Full-text search index for hybrid vector + keyword search. See Hybrid Search.
hybrid_config (HybridSearchConfig): Configuration for merging vector and full-text search results (weights, merge strategy).
temporal_decay (TemporalDecay): Exponential score decay based on document age. See Hybrid Search.
mmr (MMRConfig): Maximal Marginal Relevance for diversity reranking. See Hybrid Search.
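Putting the parameters together, a more fully configured knowledge base might look like the following sketch. The import paths for the chunker and reranker, and the CohereReranker class name, are assumptions by analogy with the embedder and vectordb imports shown above; check each component's reference page for the exact names:

```python
from definable.chunker import RecursiveChunker    # assumed module path
from definable.embedder import OpenAIEmbedder
from definable.knowledge import Knowledge
from definable.reranker import CohereReranker     # assumed module path and class name
from definable.vectordb import InMemoryVectorDB

knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
    reranker=CohereReranker(),      # enables rerank=True in search()
    chunker=RecursiveChunker(),     # enables chunking during add()
    auto_detect_reader=True,
)
```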

Managing Documents

# Add and get document IDs
doc_ids = knowledge.add("Some content")

# Remove specific documents
knowledge.remove(doc_ids)

# Clear everything
knowledge.clear()

# Count documents
print(len(knowledge))