The Knowledge module provides a complete Retrieval-Augmented Generation (RAG) pipeline. Ingest documents from strings, files, or URLs, chunk them into manageable pieces, generate embeddings, store them in a vector database, and retrieve the most relevant context at query time.

The RAG Pipeline

Ingestion (one-time)

Step      Options
Source    file, URL, string
Reader    PDF, text, URL
Chunker   text, recursive
Embedder  OpenAI, Voyage
VectorDB  memory, pgvector

Retrieval (per query)

The query is embedded, matched against the vector database, and optionally reranked.
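
To make the two phases concrete, here is a minimal, library-free sketch of the same flow: chunk, embed, store, then retrieve by similarity. It is an illustration only, not Definable's implementation. `ToyIndex`, `chunk`, `embed`, and `cosine` are hypothetical names, and the word-count "embedding" stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (real embedders return dense floats)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Hypothetical stand-in for the ingestion and retrieval phases."""

    def __init__(self):
        self.rows = []  # list of (chunk_text, vector) pairs

    def ingest(self, source_text: str) -> None:
        # Ingestion (one-time): chunk, embed, store
        for piece in chunk(source_text):
            self.rows.append((piece, embed(piece)))

    def retrieve(self, query: str, top_k: int = 1) -> list[str]:
        # Retrieval (per query): embed the query, rank stored chunks by similarity
        query_vector = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


index = ToyIndex()
index.ingest("Python is a programming language created by Guido van Rossum.")
index.ingest("JavaScript is the language of the web.")
print(index.retrieve("Who created Python?"))
```

The split between `ingest` and `retrieve` mirrors the table above: ingestion cost is paid once per document, while each query only pays for one embedding and one similarity search.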

Quick Start

from definable.knowledge import Knowledge, InMemoryVectorDB, OpenAIEmbedder

# Create a knowledge base
knowledge = Knowledge(
    vector_db=InMemoryVectorDB(),
    embedder=OpenAIEmbedder(),
)

# Add content
knowledge.add("Python is a programming language created by Guido van Rossum.")
knowledge.add("JavaScript is the language of the web.")

# Search
results = knowledge.search("Who created Python?")
for doc in results:
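    # reranking_score may be unset (no reranker is configured here);
    # test for None explicitly so a score of 0.0 still prints.
    score = "n/a" if doc.reranking_score is None else doc.reranking_score
    print(f"[{score}] {doc.content}")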

Adding Documents

The add() method accepts strings, file paths, or URLs. The appropriate reader is selected automatically:
# Plain text
knowledge.add("Definable is an AI agent framework.")

# From a text file
knowledge.add("/path/to/document.txt")

# From a PDF
knowledge.add("/path/to/report.pdf")

# From a URL
knowledge.add("https://example.com/article")
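
As a rough illustration of how automatic reader selection could work, the sketch below classifies a source string by URL scheme and file extension. `detect_source_kind` is a hypothetical helper written for this example; the library's actual detection rules are not documented here and may differ.

```python
from pathlib import Path
from urllib.parse import urlparse


def detect_source_kind(source: str) -> str:
    """Classify a source as 'url', 'pdf', 'text_file', or 'string' (hypothetical rules)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in (".txt", ".md"):
        return "text_file"
    # Anything else is treated as raw text to index directly
    return "string"


for source in (
    "Definable is an AI agent framework.",
    "/path/to/document.txt",
    "/path/to/report.pdf",
    "https://example.com/article",
):
    print(f"{detect_source_kind(source):10} {source}")
```

A heuristic like this misclassifies plain text that happens to end in a file name (for example, a sentence ending in "report.pdf"), which is one reason a pipeline might expose a switch such as auto_detect_reader.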

Searching

results = knowledge.search(
    query="What is Definable?",
    top_k=5,         # Number of results
    rerank=True,     # Apply reranking (if reranker is configured)
)
Returns a list of Document objects sorted by relevance.
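
The interaction between top_k and rerank can be sketched as a two-stage search: rank candidates by first-stage similarity, then re-sort with a reranker only if one is available. This is an assumption about the general technique, not a description of Definable's internals. `Doc`, its `score` field, and `two_stage_search` are hypothetical stand-ins; only `content` and `reranking_score` appear in the examples on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for the library's Document."""
    content: str
    score: float                              # first-stage vector similarity (assumed field)
    reranking_score: Optional[float] = None   # set only when a reranker runs


def two_stage_search(
    candidates: list[Doc],
    top_k: int = 5,
    rerank: bool = True,
    reranker: Optional[Callable[[Doc], float]] = None,
) -> list[Doc]:
    # Stage 1: keep the top_k candidates by vector similarity
    ranked = sorted(candidates, key=lambda d: d.score, reverse=True)[:top_k]
    # Stage 2: rerank only if requested AND a reranker is configured
    if rerank and reranker is not None:
        for doc in ranked:
            doc.reranking_score = reranker(doc)
        ranked.sort(key=lambda d: d.reranking_score, reverse=True)
    return ranked


docs = [Doc("a", 0.9), Doc("b", 0.8), Doc("c", 0.1)]
for doc in two_stage_search(docs, top_k=2):
    print(doc.content, doc.reranking_score)
```

With no reranker, `rerank=True` is a no-op and `reranking_score` stays `None`, which matches the "(if reranker is configured)" note in the call above.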

Async Support

Every method has an async variant prefixed with a (for example, aadd() and asearch()). Call them from inside a coroutine:
await knowledge.aadd("New content to index.")
results = await knowledge.asearch("Search query")
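
Because await is only valid inside a coroutine, a script typically wraps these calls in an async function and starts it with asyncio.run(). The sketch below shows that pattern with `StubKnowledge`, a hypothetical stand-in that exposes the same aadd/asearch method names so the example runs without the library or an API key; its keyword matching is not how real search works.

```python
import asyncio


class StubKnowledge:
    """Stand-in exposing aadd/asearch; not the real Knowledge class."""

    def __init__(self):
        self.items = []

    async def aadd(self, content: str) -> None:
        await asyncio.sleep(0)  # placeholder for embedding and storage I/O
        self.items.append(content)

    async def asearch(self, query: str) -> list[str]:
        await asyncio.sleep(0)  # placeholder for the vector search round trip
        words = set(query.lower().split())
        return [item for item in self.items if words & set(item.lower().split())]


async def main() -> list[str]:
    knowledge = StubKnowledge()
    await knowledge.aadd("New content to index.")
    await knowledge.aadd("Notes on search")
    return await knowledge.asearch("Search query")


# asyncio.run() creates the event loop, runs main(), and closes the loop
results = asyncio.run(main())
print(results)
```

Inside a framework that already runs an event loop (a notebook or an async web handler, for instance), await the methods directly instead of calling asyncio.run().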

Components

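Each step in the pipeline is pluggable: readers, chunker, embedder, vector database, and reranker can each be swapped independently through the parameters below.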

Knowledge Parameters

vector_db (VectorDB): Vector database for storing and searching embeddings. Defaults to InMemoryVectorDB.
embedder (Embedder): Embedding provider for converting text to vectors.
reranker (Reranker): Optional reranker for improving search result relevance.
chunker (Chunker): Text chunker for splitting documents. Defaults to RecursiveChunker.
readers (List[Reader]): Document readers. Defaults include TextReader, PDFReader, and URLReader.
auto_detect_reader (bool, default: True): Automatically detect the correct reader based on the source.
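
For readers unfamiliar with recursive chunking, the idea behind a chunker of that kind can be sketched as follows: split on the coarsest separator first (paragraphs), and fall back to finer ones (lines, sentences, words) only for pieces that are still too long. This is a generic sketch of the technique, not the RecursiveChunker implementation; `recursive_chunk`, `max_chars`, and the separator list are assumptions made for the example.

```python
def recursive_chunk(text, max_chars=40, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_chars:
                current = candidate  # keep packing pieces into the current chunk
                continue
            if current:
                chunks.append(current)
            if len(part) <= max_chars:
                current = part
            else:
                # Piece is still too long: recurse with the finer separators
                chunks.extend(recursive_chunk(part, max_chars, separators[i + 1:]))
                current = ""
        if current:
            chunks.append(current)
        return chunks
    # No separator applies: fall back to a hard character split
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


sample = (
    "Python is a programming language. It was created by Guido van Rossum.\n\n"
    "JavaScript is the language of the web."
)
for piece in recursive_chunk(sample):
    print(repr(piece))
```

Note that this simple version drops the separator at each chunk boundary; production chunkers often keep it or add overlap between neighboring chunks.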

Managing Documents

# Add and get document IDs
doc_ids = knowledge.add("Some content")

# Remove specific documents
knowledge.remove(doc_ids)

# Clear everything
knowledge.clear()

# Count documents
print(len(knowledge))
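
The ID-based lifecycle above can be illustrated with a small in-memory store. `ToyStore` is a hypothetical class written for this example; in particular, returning a list of IDs from add() is an assumption suggested by the doc_ids variable name, not a documented guarantee.

```python
import uuid


class ToyStore:
    """Hypothetical in-memory store mirroring add / remove / clear / len."""

    def __init__(self):
        self._docs = {}

    def add(self, content: str) -> list[str]:
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = content
        # Returned as a list, since one source could yield several chunks
        return [doc_id]

    def remove(self, doc_ids: list[str]) -> None:
        for doc_id in doc_ids:
            self._docs.pop(doc_id, None)  # silently ignore unknown IDs

    def clear(self) -> None:
        self._docs.clear()

    def __len__(self) -> int:
        return len(self._docs)


store = ToyStore()
first_ids = store.add("Some content")
store.add("Other content")
store.remove(first_ids)
print(len(store))
store.clear()
print(len(store))
```

Keeping the IDs returned by add() is what makes targeted removal possible later; without them, the only option is clear().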