Definable provides unified media types that work across all providers supporting multimodal input.

Images

Pass images from URLs, local files, or raw bytes:
from definable.models import OpenAIChat
from definable.media import Image

model = OpenAIChat(id="gpt-4o")

# From a URL
response = model.invoke(messages=[{
    "role": "user",
    "content": "Describe what you see.",
    "images": [Image(url="https://example.com/photo.jpg")],
}])
print(response.content)

Image Sources

image = Image(url="https://example.com/photo.jpg")

Detail Level

Control the resolution used for analysis:
image = Image(url="https://example.com/chart.png", detail="high")
# Options: "low", "high", "auto" (default)

Audio

Send audio input to, and receive audio output from, models that support it.

Audio Input

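from definable.models import OpenAIChat
from definable.media import Audio

# Audio input needs an audio-capable model. With OpenAI, gpt-4o does not
# accept audio, so use an audio model such as gpt-4o-audio-preview.
audio_model = OpenAIChat(id="gpt-4o-audio-preview")

response = audio_model.invoke(messages=[{
    "role": "user",
    "content": "Transcribe this audio.",
    "audio": [Audio(filepath="/path/to/recording.mp3")],
}])
print(response.content)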

Audio Output

model = OpenAIChat(
    id="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
)

response = model.invoke(
    messages=[{"role": "user", "content": "Read this sentence aloud: Hello world!"}]
)

# Access the audio output
print(response.audio.transcript)
audio_bytes = response.audio.get_content_bytes()
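
The bytes returned by get_content_bytes() can be written straight to disk. The following is a minimal sketch, assuming the bytes form a complete file in the requested format ("wav" in the example above). `save_audio` and the output path are hypothetical names used for illustration, not part of Definable.

```python
from pathlib import Path


def save_audio(audio_bytes: bytes, path: str) -> Path:
    """Write raw audio bytes to `path`, creating parent directories as needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(audio_bytes)
    return target


# Hypothetical usage with the response from the example above:
# save_audio(response.audio.get_content_bytes(), "output/hello.wav")
```

Keeping the file extension in line with the `format` requested from the model avoids players misreading the file.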

Video

Pass video files for analysis:
from definable.media import Video

response = model.invoke(messages=[{
    "role": "user",
    "content": "Summarize what happens in this video.",
    "videos": [Video(filepath="/path/to/clip.mp4")],
}])

Files

Send documents and other files:
from definable.media import File

response = model.invoke(messages=[{
    "role": "user",
    "content": "Summarize this document.",
    "files": [File(filepath="/path/to/report.pdf", mime_type="application/pdf")],
}])
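
When the file type is not known in advance, the `mime_type` argument can be derived from the file extension with Python's standard `mimetypes` module. This is a sketch only: `guess_mime_type` is a hypothetical helper, and whether Definable infers the MIME type itself when `mime_type` is omitted is not stated here.

```python
import mimetypes


def guess_mime_type(filepath: str, default: str = "application/octet-stream") -> str:
    """Return the MIME type implied by the file extension, or `default` if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filepath)
    return mime_type or default


# Hypothetical usage with the File class shown above:
# path = "/path/to/report.pdf"
# File(filepath=path, mime_type=guess_mime_type(path))
```

The guess is based on the extension alone, so a mislabelled file will produce a wrong MIME type.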

Using Media with Agents

Agents accept media directly in the run() call:
from definable.agents import Agent
from definable.media import Image
from definable.models import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a helpful visual assistant.",
)

output = agent.run(
    "What's in this image?",
    images=[Image(url="https://example.com/photo.jpg")],
)
print(output.content)

Supported Formats

| Type | Supported Formats |
|-------|-------------------|
| Image | JPEG, PNG, GIF, WebP |
| Audio | MP3, WAV, FLAC, OGG, M4A |
| Video | MP4, WebM |
| File | PDF, JSON, TXT, Python, Markdown, and more |

Not all providers support all media types. Check your provider’s documentation for specific format and size limits.
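
A quick pre-flight check on the file extension can catch unsupported inputs before a request is sent. The sketch below maps extensions to the media types in the table above. The extension sets and the `media_kind` helper are assumptions made for illustration. Because the File row is open-ended, anything not recognised as image, audio, or video falls through to "file".

```python
from pathlib import Path

# Extensions derived from the Supported Formats table (assumed spellings).
EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".flac", ".ogg", ".m4a"},
    "video": {".mp4", ".webm"},
}


def media_kind(filepath: str) -> str:
    """Classify a path as "image", "audio", "video", or "file" by its extension."""
    suffix = Path(filepath).suffix.lower()
    for kind, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return kind
    return "file"


print(media_kind("/path/to/clip.mp4"))
```

This only inspects the file name. Provider-specific size limits and codec support still need to be checked separately.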