Definable provides unified media types that work across all providers supporting multimodal input.

Images

Pass images from URLs, local files, or raw bytes:
from definable.model import OpenAIChat
from definable.model.message import Message
from definable.media import Image

model = OpenAIChat(id="gpt-4o")

# From a URL
response = model.invoke(
    messages=[Message(role="user", content="Describe what you see.", images=[Image(url="https://example.com/photo.jpg")])],
    assistant_message=Message(role="assistant", content=""),
)
print(response.content)

Image Sources

# From a URL
image = Image(url="https://example.com/photo.jpg")

# From a local file
image = Image(filepath="/path/to/photo.jpg")

Detail Level

Control the resolution used for analysis:
image = Image(url="https://example.com/chart.png", detail="high")
# Options: "low", "high", "auto" (default)

Audio

Send audio input and receive audio output from supported models.

Audio Input

from definable.model.message import Message
from definable.media import Audio

response = model.invoke(
    messages=[Message(role="user", content="Transcribe this audio.", audio=[Audio(filepath="/path/to/recording.mp3")])],
    assistant_message=Message(role="assistant", content=""),
)
Only gpt-4o-audio-preview supports raw input_audio blocks. Other models (GPT-4o-mini, Claude, DeepSeek) will reject them. Use audio_transcriber=True on the Agent to automatically transcribe voice to text before it reaches the model — see Voice Note Transcription below.

Audio Output

from definable.model.message import Message

model = OpenAIChat(
    id="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
)

response = model.invoke(
    messages=[Message(role="user", content="Read this sentence aloud: Hello world!")],
    assistant_message=Message(role="assistant", content=""),
)

# Access the audio output
print(response.audio.transcript)
audio_bytes = response.audio.get_content_bytes()
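
Once you have the raw bytes, persisting them is plain Python. The `audio_bytes` value below is a stand-in for the bytes returned by `response.audio.get_content_bytes()` above:

```python
from pathlib import Path

# Stand-in for response.audio.get_content_bytes()
audio_bytes = b"RIFF....WAVEfmt "  # real WAV data in practice

# Write the model's spoken reply to a playable file
out_path = Path("reply.wav")
out_path.write_bytes(audio_bytes)
print(f"wrote {out_path.stat().st_size} bytes to {out_path}")
```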

Voice Note Transcription

When your agent receives voice messages from Telegram, Discord, or other interfaces, you need to transcribe the audio to text before sending it to the model. The audio_transcriber parameter handles this automatically.
from definable.agent import Agent

# Transcribe voice notes using OpenAI Whisper (default)
agent = Agent(
    model="openai/gpt-4o-mini",
    audio_transcriber=True,
)
When audio_transcriber is set:
  1. Voice messages arrive as Audio objects on the message
  2. Each audio clip is sent to the Whisper API for transcription
  3. The transcript text is appended to the message content
  4. The audio field is cleared so non-audio models don’t receive raw audio blocks
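
The four steps above can be sketched in plain Python. `AudioClip`, `VoiceMessage`, and `fake_transcribe` are hypothetical stand-ins, not part of definable; the real transcriber calls the Whisper API instead of returning a canned string:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AudioClip:
    """Stand-in for a received voice clip."""
    data: bytes

@dataclass
class VoiceMessage:
    """Stand-in for an incoming user message carrying audio."""
    content: str
    audio: List[AudioClip] = field(default_factory=list)

def fake_transcribe(clip: AudioClip) -> str:
    # Placeholder for the Whisper API call
    return "hello from a voice note"

def apply_transcriber(msg: VoiceMessage) -> VoiceMessage:
    # Steps 1-2: send each audio clip off for transcription
    transcripts = [fake_transcribe(clip) for clip in msg.audio]
    # Step 3: append the transcript text to the message content
    if transcripts:
        msg.content = (msg.content + "\n" + "\n".join(transcripts)).strip()
    # Step 4: clear the audio field so non-audio models never see raw blocks
    msg.audio = []
    return msg

msg = apply_transcriber(VoiceMessage(content="", audio=[AudioClip(b"...")]))
print(msg.content)  # "hello from a voice note"
print(msg.audio)    # []
```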
audio_transcriber works at the agent level, so all interfaces (Telegram, Discord, Desktop) and direct arun() callers benefit automatically. No per-interface setup required.

Format Normalization

Telegram sends voice notes as .oga (OGG Opus), which isn’t accepted by all APIs. The reader.audio module provides format normalization:
from definable.reader.audio import normalize_audio_format

# Automatically converts oga/ogg/opus → wav via ffmpeg
out_bytes, out_fmt = normalize_audio_format(raw_bytes, "oga")
# out_fmt == "wav"
This requires ffmpeg installed on your system. The audio_transcriber handles this transparently — you don’t need to call normalize_audio_format directly when using it.
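
A minimal sketch of what such a conversion looks like under the hood, assuming ffmpeg is on PATH. This is an illustration of the technique, not definable's actual implementation:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def build_ffmpeg_cmd(src: str, dst: str) -> list:
    # -y overwrites the output; ffmpeg infers formats from file extensions
    return ["ffmpeg", "-y", "-i", src, dst]

def oga_to_wav(raw: bytes) -> bytes:
    """Convert OGG Opus bytes to WAV by shelling out to ffmpeg."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in.oga", Path(tmp) / "out.wav"
        src.write_bytes(raw)
        subprocess.run(build_ffmpeg_cmd(str(src), str(dst)),
                       check=True, capture_output=True)
        return dst.read_bytes()

print(build_ffmpeg_cmd("in.oga", "out.wav"))
```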

Video

Pass video files for analysis:
from definable.model.message import Message
from definable.media import Video

response = model.invoke(
    messages=[Message(role="user", content="Summarize what happens in this video.", videos=[Video(filepath="/path/to/clip.mp4")])],
    assistant_message=Message(role="assistant", content=""),
)

Files

Send documents and other files:
from definable.model.message import Message
from definable.media import File

response = model.invoke(
    messages=[Message(role="user", content="Summarize this document.", files=[File(filepath="/path/to/report.pdf", mime_type="application/pdf")])],
    assistant_message=Message(role="assistant", content=""),
)
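
If you don't know a document's MIME type ahead of time, Python's standard library can usually infer it from the filename. This helper is independent of definable:

```python
import mimetypes

def guess_mime(path: str, default: str = "application/octet-stream") -> str:
    """Infer a MIME type from a file extension, with a safe fallback."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or default

print(guess_mime("/path/to/report.pdf"))    # application/pdf
print(guess_mime("/path/to/data.json"))     # application/json
print(guess_mime("/path/to/blob.unknown"))  # application/octet-stream
```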

Using Media with Agents

Agents accept media directly in the run() call:
from definable.agent import Agent
from definable.media import Image
from definable.model import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a helpful visual assistant.",
)

output = agent.run(
    "What's in this image?",
    images=[Image(url="https://example.com/photo.jpg")],
)
print(output.content)

Supported Formats

Type    Supported Formats
Image   JPEG, PNG, GIF, WebP
Audio   MP3, WAV, FLAC, OGG, M4A
Video   MP4, WebM
File    PDF, JSON, TXT, Python, Markdown, and more
Not all providers support all media types. Check your provider’s documentation for specific format and size limits.