Readers load source content into Document objects that can be chunked, embedded, and stored. Definable includes readers for plain text, PDF, and web content.
These are Knowledge readers (definable.knowledge.readers) for the RAG document ingestion pipeline. If you need to extract text from files attached to agent messages (PDF, DOCX, XLSX, audio) before LLM processing, see File Readers instead.
Auto-Detection
By default, Knowledge detects the correct reader from the source.
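One way such detection could work is a simple dispatch on the shape of the source string. The sketch below is not Definable's implementation: `detect_reader_name` and its rules (URL scheme first, then file extension) are assumptions for illustration, built only from the reader names and extensions listed on this page.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# Extensions taken from the TextReader description on this page.
TEXT_EXTENSIONS = {".txt", ".md", ".rst", ".csv", ".log"}


def detect_reader_name(source: str) -> str:
    """Hypothetical helper: guess which reader fits a source string."""
    # Web sources are recognised by their URL scheme.
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "URLReader"

    # Everything else is treated as a file path and matched by extension.
    suffix = PurePath(source).suffix.lower()
    if suffix == ".pdf":
        return "PDFReader"
    if suffix in TEXT_EXTENSIONS:
        return "TextReader"

    raise ValueError(f"No reader found for source: {source!r}")
```

When a source does not match any rule, raising an error is one reasonable choice; it makes the need for an explicit reader visible instead of silently guessing.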
TextReader
Reads plain text files (.txt, .md, .rst, .csv, .log).
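To make the behaviour concrete, here is a minimal standalone sketch of what reading a plain text file into a document might look like. The `Document` dataclass and `read_text_file` function are stand-ins defined for this example; the library's real `Document` fields and the `TextReader` internals may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List

# Extensions listed for TextReader on this page.
TEXT_EXTENSIONS = {".txt", ".md", ".rst", ".csv", ".log"}


@dataclass
class Document:
    """Stand-in for the library's Document; field names are assumptions."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def read_text_file(path: str) -> List[Document]:
    """Hypothetical helper: load one text file as a single Document."""
    file_path = Path(path)
    if file_path.suffix.lower() not in TEXT_EXTENSIONS:
        raise ValueError(f"Unsupported text extension: {file_path.suffix!r}")

    # Read the whole file as UTF-8 and record where it came from.
    text = file_path.read_text(encoding="utf-8")
    return [Document(content=text, metadata={"source": str(file_path)})]
```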
PDFReader
Reads PDF files page by page.
Requires the pypdf package. Install it with pip install pypdf.
URLReader
Fetches and extracts text content from web pages.
Requires httpx (included) and beautifulsoup4. Install with pip install beautifulsoup4.
Specifying a Reader
Override auto-detection by passing a reader explicitly.
Async Reading
All readers support asynchronous reading through aread().
Creating a Custom Reader
Subclass Reader and implement read(), and optionally can_read().
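The pattern can be sketched as follows. So that the example runs on its own, `Reader` and `Document` here are local stand-ins, not the library's classes: in real code you would subclass the base class shipped with Definable, whose constructor and `Document` fields may differ. `JSONLinesReader` and the `"text"` key it expects are hypothetical.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for the library's Document; field names are assumptions."""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class Reader(ABC):
    """Stand-in base class mirroring the read()/can_read() methods."""

    @abstractmethod
    def read(self, source: str) -> List[Document]:
        ...

    def can_read(self, source: str) -> bool:
        # Assumed default: accept anything unless a subclass narrows it.
        return True


class JSONLinesReader(Reader):
    """Hypothetical reader: one Document per JSON line with a "text" key."""

    def read(self, source: str) -> List[Document]:
        documents: List[Document] = []
        with open(source, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                documents.append(
                    Document(
                        content=str(record["text"]),
                        metadata={"source": source, "line": line_number},
                    )
                )
        return documents

    def can_read(self, source: str) -> bool:
        # Only claim files with a .jsonl extension.
        return Path(source).suffix.lower() == ".jsonl"
```

Keeping `can_read()` cheap (an extension check, no file I/O) suits a method that may be called once per candidate reader.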
Reader Interface
All readers implement:

| Method | Description |
|---|---|
| read(source) -> List[Document] | Read documents synchronously |
| aread(source) -> List[Document] | Read documents asynchronously |
| can_read(source) -> bool | Check if this reader can handle the source |
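The three methods in the table can be written down as a structural type, which is handy when type-checking code that accepts any reader. This is a sketch under assumptions: `ReaderLike`, `InlineTextReader`, `pick_reader`, and the one-field `Document` are hypothetical names defined here, and delegating `aread()` to `read()` in a worker thread is one possible approach, not necessarily what the library does.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class Document:
    """Stand-in for the library's Document; the field name is an assumption."""
    content: str


@runtime_checkable
class ReaderLike(Protocol):
    """Structural type with the three methods from the table."""

    def read(self, source: str) -> List[Document]: ...
    async def aread(self, source: str) -> List[Document]: ...
    def can_read(self, source: str) -> bool: ...


class InlineTextReader:
    """Toy reader that treats the source string itself as the text."""

    def read(self, source: str) -> List[Document]:
        return [Document(content=source)]

    async def aread(self, source: str) -> List[Document]:
        # Run the blocking read() in a worker thread (Python 3.9+)
        # so the event loop stays free.
        return await asyncio.to_thread(self.read, source)

    def can_read(self, source: str) -> bool:
        return isinstance(source, str)


def pick_reader(readers: List[ReaderLike], source: str) -> ReaderLike:
    """Return the first reader whose can_read() accepts the source."""
    for reader in readers:
        if reader.can_read(source):
            return reader
    raise ValueError(f"No reader can handle source: {source!r}")


if __name__ == "__main__":
    reader = pick_reader([InlineTextReader()], "hello")
    print(reader.read("hello"))
    print(asyncio.run(reader.aread("hello")))
```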