Definable provides unified media types that work across all providers supporting multimodal input.

Images

from definable.agent import Agent
from definable.media import Image

agent = Agent(model="gpt-4o", instructions="Describe images in detail.")

output = await agent.arun(
    "What do you see?",
    images=[Image(url="https://example.com/photo.jpg")],
)
print(output.content)
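
The snippets on this page use top-level `await`, which works in a notebook or an async REPL (`python -m asyncio`) but not in a plain script. A minimal sketch of the same image call wrapped for script use follows; the function name `main` is just a convention, and the Definable imports sit inside the coroutine only so the sketch can be loaded without the package installed.

```python
import asyncio


async def main() -> str:
    # Imported here only so this sketch loads without Definable installed;
    # a real script would normally import at the top of the file.
    from definable.agent import Agent
    from definable.media import Image

    agent = Agent(model="gpt-4o", instructions="Describe images in detail.")
    output = await agent.arun(
        "What do you see?",
        images=[Image(url="https://example.com/photo.jpg")],
    )
    return output.content


# Entry point for a plain script: asyncio.run creates the event loop,
# runs the coroutine to completion, and closes the loop afterwards.
# Uncomment to run (needs Definable installed and provider credentials):
# print(asyncio.run(main()))
```

The later Audio, Files, and Video snippets can be wrapped the same way.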

Image Sources

image = Image(url="https://example.com/photo.jpg")

Audio

from definable.media import Audio

output = await agent.arun(
    "Transcribe this audio.",
    audio=[Audio(filepath="/path/to/audio.mp3")],
)

Most models do not accept raw audio input. To handle this, create the agent with audio_transcriber=True, which transcribes the audio to text before the model sees it.
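
Putting the two pieces together, a hedged sketch of an audio call on an agent created with the transcriber enabled might look like the following. It uses only the `Agent`, `Audio`, and `arun` calls shown on this page; the helper name `summarize_recording` and the prompt text are hypothetical.

```python
import asyncio


async def summarize_recording(path: str) -> str:
    """Hypothetical helper: send one audio file through a transcribing agent."""
    # Imported here only so this sketch loads without Definable installed.
    from definable.agent import Agent
    from definable.media import Audio

    # audio_transcriber=True asks the agent to transcribe the audio to text
    # before the model sees it, as described above.
    agent = Agent(model="gpt-4o", audio_transcriber=True)
    output = await agent.arun(
        "Summarize this recording.",
        audio=[Audio(filepath=path)],
    )
    return output.content


# Example call from a script (needs Definable and provider credentials):
# print(asyncio.run(summarize_recording("/path/to/audio.mp3")))
```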

Files

from definable.media import File

output = await agent.arun(
    "Summarize this document.",
    files=[File(filepath="/path/to/report.pdf")],
)

When readers are enabled on the agent, file content is automatically extracted and injected into the prompt.

Video

from definable.media import Video

output = await agent.arun(
    "Describe what happens in this video.",
    videos=[Video(url="https://example.com/video.mp4")],
)
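
With all four media arguments introduced (`images`, `audio`, `files`, `videos`), an application that receives mixed attachments needs some way to decide which argument each one belongs to. The sketch below is one possible approach using only Python's standard `mimetypes` module; it is not part of Definable, and the names `KIND_BY_MIME_PREFIX` and `group_by_media_kind` are hypothetical.

```python
import mimetypes
from collections import defaultdict

# Maps the top-level MIME type to the arun() keyword used on this page.
# Anything else (PDFs, text, unknown extensions) falls back to "files".
KIND_BY_MIME_PREFIX = {"image": "images", "audio": "audio", "video": "videos"}


def group_by_media_kind(paths):
    """Group file paths by the arun() keyword they would be passed under."""
    groups = defaultdict(list)
    for path in paths:
        # guess_type looks only at the file extension, not the file contents.
        mime, _ = mimetypes.guess_type(path)
        prefix = mime.split("/", 1)[0] if mime else ""
        groups[KIND_BY_MIME_PREFIX.get(prefix, "files")].append(path)
    return dict(groups)


groups = group_by_media_kind(["photo.jpg", "memo.mp3", "report.pdf", "clip.mp4"])
print(groups)
```

Each group would then be wrapped in the matching media class before the call, for example `Audio(filepath=p)` or `File(filepath=p)` as shown above. This page only shows `Image` and `Video` constructed from a `url`, so check the media reference before passing local paths to them.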

Voice Note Transcription

For Telegram or Discord voice messages, enable the audio transcriber:

agent = Agent(model="gpt-4o", audio_transcriber=True)  # Uses OpenAI Whisper

See Agent configuration for details.