A Document represents a piece of text content along with its metadata, source information, and optional embedding vector. Documents are the fundamental unit that flows through the entire RAG pipeline.

Creating Documents

from definable.knowledge import Document

doc = Document(
    content="Definable is a Python framework for building AI agents.",
    name="intro",
    meta_data={"category": "overview", "version": "1.0"},
)

Document Fields

Field            Type         Description
content          str          The text content
id               str          Unique identifier (auto-generated UUID)
name             str          Human-readable name
meta_data        dict         Arbitrary metadata for filtering and display
embedding        List[float]  Vector embedding (set by embedder)
source           str          Where the document came from (file path, URL)
source_type      str          Type of source ("text", "pdf", "url")
chunk_index      int          Index within a chunked document
chunk_total      int          Total number of chunks from the source
reranking_score  float        Relevance score from reranking
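
To make the shape of these fields concrete, here is a minimal sketch using a standard-library dataclass. DocumentSketch is a hypothetical stand-in, not Definable's actual Document class, and the None defaults for optional fields are assumptions; only the field names, types, and the auto-generated UUID come from the table above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from uuid import uuid4


@dataclass
class DocumentSketch:
    """Illustrative stand-in, NOT definable's real Document class."""

    content: str
    # Auto-generated UUID, as described in the field table.
    id: str = field(default_factory=lambda: str(uuid4()))
    name: Optional[str] = None
    meta_data: Dict[str, Any] = field(default_factory=dict)
    # Assumed to stay None until an embedder fills it in.
    embedding: Optional[List[float]] = None
    source: Optional[str] = None
    source_type: Optional[str] = None
    chunk_index: Optional[int] = None
    chunk_total: Optional[int] = None
    reranking_score: Optional[float] = None


sketch = DocumentSketch(
    content="Definable is a Python framework for building AI agents.",
    name="intro",
    meta_data={"category": "overview", "version": "1.0"},
)
print(sketch.id)         # a fresh UUID string, different on every run
print(sketch.embedding)  # None, because no embedder has run
```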

Generating Embeddings

Embed a document manually using any embedder:
from definable.knowledge import OpenAIEmbedder

embedder = OpenAIEmbedder()
doc.embed(embedder)
print(len(doc.embedding))  # e.g. 1536; the length depends on the embedding model
When using the Knowledge class, embedding is handled automatically during add(). You only need to embed manually if you are working with documents directly.

Serialization

Convert documents to and from dictionaries for storage or transmission:
# To dictionary
data = doc.to_dict()

# From dictionary
restored = Document.from_dict(data)
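
As a rough illustration of the storage or transmission step, the sketch below round-trips a dictionary through JSON. The dictionary is a hypothetical stand-in for the output of doc.to_dict(): the real keys may differ, and it is an assumption that every value is JSON-serializable.

```python
import json

# Hypothetical shape of doc.to_dict(); the real output may contain other keys.
data = {
    "content": "Definable is a Python framework for building AI agents.",
    "name": "intro",
    "meta_data": {"category": "overview", "version": "1.0"},
}

# Serialize to a string suitable for a file, a queue, or an HTTP body.
payload = json.dumps(data)

# Parse it back into a plain dictionary on the receiving side.
loaded = json.loads(payload)

print(loaded == data)  # the round trip preserves content and metadata
```

Under these assumptions, the loaded dictionary is what would be passed to Document.from_dict().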

Metadata and Filtering

Attach metadata to documents for filtering during search:
knowledge.add(
    "Python 3.12 release notes...",
    # Attach metadata here, e.g. {"category": "release-notes"}, so the filter below can match it
)

# Filter during search (depends on vector DB support)
results = knowledge.search(
    "What's new in Python?",
    filter={"category": "release-notes"},
)
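
To show what an exact-match metadata filter does conceptually, here is a small in-memory sketch. The matches helper and the list of plain dictionaries are hypothetical and are not part of Definable; real filtering happens inside the vector database, and its semantics may differ from this simple equality check.

```python
from typing import Any, Dict, List


def matches(meta_data: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """Return True when every key in criteria is present with an equal value."""
    return all(
        key in meta_data and meta_data[key] == value
        for key, value in criteria.items()
    )


# Plain dictionaries standing in for stored documents.
docs: List[Dict[str, Any]] = [
    {"content": "Python 3.12 release notes...",
     "meta_data": {"category": "release-notes"}},
    {"content": "Definable is a Python framework for building AI agents.",
     "meta_data": {"category": "overview", "version": "1.0"}},
]

# Keep only documents whose metadata satisfies the filter.
hits = [d for d in docs if matches(d["meta_data"], {"category": "release-notes"})]
print([d["content"] for d in hits])
```

A document with no category key is excluded here rather than treated as a match, which is one reasonable choice among several.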

Chunked Documents

When a large document is chunked, each chunk is a separate Document with chunk tracking:
# After chunking a long document, you get:
# chunk.chunk_index = 0, chunk.chunk_total = 5
# chunk.chunk_index = 1, chunk.chunk_total = 5
# ...
This lets you reconstruct the original document order or display chunk context to users.
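
A minimal sketch of both uses, assuming chunks do not overlap and can be rejoined with a single space. Chunk, reassemble, and label are hypothetical names that mirror only the chunk_index and chunk_total fields; they are not part of Definable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Stand-in carrying only the fields needed for chunk tracking."""

    content: str
    chunk_index: int  # zero-based position within the source document
    chunk_total: int  # total number of chunks from the source


def reassemble(chunks: List[Chunk]) -> str:
    """Restore the original order by sorting on chunk_index."""
    ordered = sorted(chunks, key=lambda c: c.chunk_index)
    return " ".join(c.content for c in ordered)


def label(chunk: Chunk) -> str:
    """Build a human-readable position label for display."""
    return f"Part {chunk.chunk_index + 1} of {chunk.chunk_total}"


# Chunks as they might come back from a search, out of order.
chunks = [
    Chunk("third.", 2, 3),
    Chunk("first,", 0, 3),
    Chunk("second,", 1, 3),
]

print(reassemble(chunks))  # first, second, third.
print(label(chunks[0]))    # Part 3 of 3
```

If the chunker uses overlapping windows, a plain join would duplicate text, so the overlap would need to be trimmed first.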