Chunking splits large documents into smaller pieces that fit within embedding model limits and improve retrieval precision. Smaller, focused chunks tend to match queries more accurately than entire documents.
Why Chunk?
- Embedding models have token limits — most accept 512-8192 tokens
- Smaller chunks are more precise — a paragraph about “authentication” matches better than a full page with mixed topics
- Overlap preserves context — overlapping boundaries prevent information loss at chunk edges
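To make the overlap point concrete, here is a minimal sketch of fixed-size character windows that share a few characters at each boundary. The function name `sliding_chunks` and its parameters are illustrative assumptions, not part of the library's API.

```python
from typing import List


def sliding_chunks(text: str, size: int, overlap: int) -> List[str]:
    """Cut text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = sliding_chunks("abcdefghij", size=4, overlap=2)
# Each window repeats the last two characters of the previous one,
# so text that straddles a boundary appears intact in at least one chunk.
```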
TextChunker
Splits text on a single separator (e.g., double newlines for paragraphs). It accepts four settings:
- Target size for each chunk, in characters.
- Number of characters to overlap between adjacent chunks.
- The separator to split on.
- Whether to keep the separator in chunk content.
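The following is a rough sketch of how single-separator chunking with a target size and overlap could work. It is not the library's implementation: the function name `split_on_separator` and the parameter names `chunk_size`, `chunk_overlap`, and `separator` are assumptions chosen for illustration, and the keep-separator option is left out for brevity (merged pieces are simply re-joined with the separator).

```python
from typing import List


def split_on_separator(
    text: str,
    chunk_size: int = 1000,      # assumed name: target chunk size in characters
    chunk_overlap: int = 0,      # assumed name: characters carried into the next chunk
    separator: str = "\n\n",     # assumed name: the single separator to split on
) -> List[str]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    pieces = [p for p in text.split(separator) if p]  # drop empty pieces
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = current + separator + piece if current else piece
        if current and len(candidate) > chunk_size:
            # Close the current chunk and seed the next one with its tail.
            chunks.append(current)
            tail = current[-chunk_overlap:] if chunk_overlap else ""
            current = tail + separator + piece if tail else piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "alpha\n\nbeta\n\ngamma"
no_overlap = split_on_separator(text, chunk_size=12)
with_overlap = split_on_separator(text, chunk_size=12, chunk_overlap=4)
```

In this sketch the size is a target rather than a hard limit: a single piece longer than `chunk_size` is emitted as-is, and the carried-over tail can push a chunk slightly past the target.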
RecursiveChunker
Splits text using a hierarchy of separators, falling back to finer-grained splits when chunks are too large. This is the default chunker and generally produces the best results.
It takes an ordered list of separators to try. The chunker uses the first separator that produces chunks within the size limit, then recurses with finer separators for any chunks that are still too large. The separators, in order:
- `"\n\n"`: paragraph breaks (preferred)
- `"\n"`: line breaks
- `"."`: sentence endings
- `" "`: word boundaries
- `""`: character-level split (last resort)
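The fallback behavior can be sketched as a small recursive function. This is an illustrative approximation under stated assumptions, not the library's code: `recursive_split`, `chunk_size`, and `SEPARATORS` are hypothetical names, overlap is omitted, and separators that fall between two emitted chunks are dropped.

```python
from typing import List, Optional, Sequence

# Coarsest to finest; the empty string means "cut at any character".
SEPARATORS = ["\n\n", "\n", ".", " ", ""]


def recursive_split(
    text: str, chunk_size: int, separators: Optional[Sequence[str]] = None
) -> List[str]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if separators is None:
        separators = SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks: List[str] = []
    current = ""
    for piece in (p for p in text.split(sep) if p):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbouring pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too large: retry this piece with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Intro paragraph.\n\nA second paragraph that is quite a bit longer than the limit."
parts = recursive_split(sample, chunk_size=30)
```

Here the short first paragraph survives as a single chunk, while the long second paragraph falls through to word-level splitting. Every returned chunk is at most `chunk_size` characters.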
Using with Knowledge
Pass a chunker when creating a knowledge base.
Disabling Chunking
If your documents are already the right size, skip chunking.
Chunker Interface
Both chunkers implement:

| Method | Description |
|---|---|
| `chunk(document) -> List[Document]` | Chunk a single document |
| `chunk_many(documents) -> List[Document]` | Chunk multiple documents |
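As a sketch of what a custom chunker satisfying the two methods in the table might look like, the example below defines a structural interface and a toy implementation. The `Document` dataclass here is a stand-in with assumed fields (`content`, `metadata`), and `Chunker`, `ParagraphChunker`, and `count_chunks` are hypothetical names; the library's own types may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Document:
    """Stand-in for the library's document type; the fields are assumptions."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Chunker(Protocol):
    """Structural interface mirroring the two methods in the table."""

    def chunk(self, document: Document) -> List[Document]: ...

    def chunk_many(self, documents: List[Document]) -> List[Document]: ...


class ParagraphChunker:
    """Toy chunker: one chunk per non-empty paragraph."""

    def chunk(self, document: Document) -> List[Document]:
        paragraphs = [p.strip() for p in document.content.split("\n\n")]
        # Copy the parent's metadata so each chunk can be traced to its source.
        return [
            Document(content=p, metadata=dict(document.metadata))
            for p in paragraphs
            if p
        ]

    def chunk_many(self, documents: List[Document]) -> List[Document]:
        chunks: List[Document] = []
        for document in documents:
            chunks.extend(self.chunk(document))
        return chunks


def count_chunks(chunker: Chunker, documents: List[Document]) -> int:
    """Works with any object that provides chunk and chunk_many."""
    return len(chunker.chunk_many(documents))


docs = [Document("First.\n\nSecond."), Document("Only one.")]
total = count_chunks(ParagraphChunker(), docs)
```

Defining `chunk_many` in terms of `chunk` keeps the two methods consistent, so a custom chunker only needs to get the single-document case right.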