The RAG Pipeline
Ingestion (one-time)

| Step | Options |
|---|---|
| Source | file, URL, string |
| Reader | PDF, text, URL |
| Chunker | text, recursive |
| Embedder | OpenAI, Voyage |
| VectorDB | memory, pgvector |
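
The ingestion steps in the table can be sketched end to end in plain Python. This is only an illustration of the data flow, not the library's implementation: `read_source`, `chunk_text`, `toy_embed`, and `ingest` are hypothetical names, the embedder is a deterministic hash-based stand-in for a real provider such as OpenAI or Voyage, and the "vector database" is an ordinary list.

```python
import hashlib
import math


def read_source(source: str) -> str:
    # Reader step: this sketch treats every source as a raw string.
    return source


def chunk_text(text: str, size: int = 40) -> list[str]:
    # Chunker step: fixed-size character windows, no overlap.
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(text: str, dims: int = 8) -> list[float]:
    # Embedder step: hash each word into a bucket, then L2-normalize.
    # A stand-in only; real embedders call a model.
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(source: str, store: list[dict]) -> int:
    # Source -> Reader -> Chunker -> Embedder -> VectorDB (a list here).
    text = read_source(source)
    chunks = chunk_text(text)
    for chunk in chunks:
        store.append({"content": chunk, "embedding": toy_embed(chunk)})
    return len(chunks)


store: list[dict] = []
count = ingest(
    "Retrieval-augmented generation grounds answers in your own documents.",
    store,
)
```

Each stage is a separate function so that any one of them could be swapped without touching the others, which mirrors the Step and Options columns above.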
Quick Start
Adding Documents
The add() method accepts strings, file paths, or URLs. The appropriate reader is selected automatically.
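
How the library decides between readers is not shown here. One plausible approach, sketched below under that assumption, is to inspect the URL scheme and the file extension. `detect_reader` and the labels it returns are hypothetical and are not part of the library's API.

```python
from pathlib import Path
from urllib.parse import urlparse


def detect_reader(source: str) -> str:
    """Return a label for the reader that would handle `source` (hypothetical)."""
    parsed = urlparse(source)
    # http(s) sources with a host are treated as web pages.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise fall back to the file extension.
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in (".txt", ".md"):
        return "text"
    # Anything else is assumed to be literal text content.
    return "string"


labels = [
    detect_reader("https://example.com/guide"),
    detect_reader("reports/q3.pdf"),
    detect_reader("notes.txt"),
    detect_reader("RAG combines retrieval with generation"),
]
```

A real implementation would likely also check whether a path exists on disk before treating it as a file; this sketch skips that to stay self-contained.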
Searching
A search returns Document objects sorted by relevance.
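
To make "sorted by relevance" concrete, the sketch below ranks stored documents by cosine similarity to a query vector. It assumes cosine similarity as the relevance measure, which the source does not specify. `Doc`, `cosine`, and `search` are hypothetical stand-ins, and the two-dimensional embeddings are made up for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical stand-in for the library's Document type.
    content: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: list[float], docs: list[Doc], limit: int = 3) -> list[Doc]:
    # Highest similarity first, truncated to `limit` results.
    ranked = sorted(
        docs,
        key=lambda d: cosine(query_embedding, d.embedding),
        reverse=True,
    )
    return ranked[:limit]


docs = [
    Doc("pricing", [1.0, 0.0]),
    Doc("refunds", [0.0, 1.0]),
    Doc("billing", [0.8, 0.6]),
]
results = search([1.0, 0.1], docs, limit=2)
```

With these vectors, "pricing" ranks first and "billing" second, because the query points almost entirely along the first axis.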
Async Support
Every method has an async variant.
Components
Each step in the pipeline is pluggable:
Documents
The core data unit — text content with metadata and embeddings.
Readers
Read text, PDF, and web content into documents.
Chunkers
Split large documents into smaller, overlapping chunks.
Embedders
Generate vector embeddings with OpenAI or Voyage AI.
Rerankers
Rerank search results for higher relevance with Cohere.
Vector Databases
Store and search embeddings in memory or PostgreSQL.
Agent Integration
Connect knowledge to agents via middleware or toolkits.
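
Because each step is pluggable, a custom component only has to match the interface the pipeline expects. The library's actual interfaces are not shown here, so the sketch below assumes a single `chunk(text)` method and uses it to illustrate overlapping chunks. `Chunker`, `OverlapChunker`, and `split_document` are hypothetical names.

```python
from typing import Protocol


class Chunker(Protocol):
    # Assumed interface: anything with a matching chunk() method qualifies.
    def chunk(self, text: str) -> list[str]: ...


class OverlapChunker:
    """Fixed-size character windows where neighbours share `overlap` characters."""

    def __init__(self, size: int = 20, overlap: int = 5) -> None:
        if not 0 <= overlap < size:
            raise ValueError("overlap must be non-negative and smaller than size")
        self.size = size
        self.overlap = overlap

    def chunk(self, text: str) -> list[str]:
        step = self.size - self.overlap
        chunks: list[str] = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + self.size])
            # Stop once this window reaches the end of the text, so no
            # trailing chunk is made up entirely of overlap.
            if start + self.size >= len(text):
                break
            start += step
        return chunks


def split_document(text: str, chunker: Chunker) -> list[str]:
    # The caller depends only on the protocol, not on a concrete class.
    return chunker.chunk(text)


chunks = split_document("abcdefghijklmnopqrstuvwxyz", OverlapChunker(size=10, overlap=3))
```

The last three characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still seen whole by at least one chunk.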
Knowledge Parameters
Vector database for storing and searching embeddings. Defaults to InMemoryVectorDB.
Embedding provider for converting text to vectors.
Optional reranker for improving search result relevance.
Text chunker for splitting documents. Defaults to RecursiveChunker.
Document readers. Defaults include TextReader, PDFReader, URLReader.
Automatically detect the correct reader based on the source.