```python
from definable.model import OpenAIChat
from definable.model.message import Message
from definable.media import Image

model = OpenAIChat(id="gpt-4o")

# From a URL
response = model.invoke(
    messages=[Message(role="user", content="Describe what you see.", images=[Image(url="https://example.com/photo.jpg")])],
    assistant_message=Message(role="assistant", content=""),
)
print(response.content)
```
```python
from definable.model.message import Message
from definable.media import Audio

response = model.invoke(
    messages=[Message(role="user", content="Transcribe this audio.", audio=[Audio(filepath="/path/to/recording.mp3")])],
    assistant_message=Message(role="assistant", content=""),
)
```
Only gpt-4o-audio-preview supports raw input_audio blocks. Other models (GPT-4o-mini, Claude, DeepSeek) will reject them. Use audio_transcriber=True on the Agent to automatically transcribe voice to text before it reaches the model — see Voice Note Transcription below.
When your agent receives voice messages from Telegram, Discord, or other interfaces, you need to transcribe the audio to text before sending it to the model. The audio_transcriber parameter handles this automatically.
```python
from definable.agent import Agent

# Transcribe voice notes using OpenAI Whisper (default)
agent = Agent(
    model="openai/gpt-4o-mini",
    audio_transcriber=True,
)
```
When `audio_transcriber` is set:

- Voice messages arrive as `Audio` objects on the message
- Each audio clip is sent to the Whisper API for transcription
- The transcript text is appended to the message content
- The `audio` field is cleared so non-audio models don't receive raw audio blocks
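Taken together, the steps above amount to a small message transformation: transcribe, append, clear. A stdlib-only sketch of that flow (the `Message`/`Audio` dataclasses, the `[voice transcript]` prefix, and the `transcribe` callback here are illustrative stand-ins, not definable's real types or its Whisper call):

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Audio:
    filepath: str


@dataclass
class Message:
    role: str
    content: str
    audio: List[Audio] = field(default_factory=list)


def apply_transcriber(msg: Message, transcribe: Callable[[Audio], str]) -> Message:
    """Mirror the audio_transcriber steps: transcribe each clip,
    append the text to the message content, then clear the audio field."""
    if not msg.audio:
        return msg
    transcripts = [transcribe(clip) for clip in msg.audio]
    extra = "\n".join(f"[voice transcript] {t}" for t in transcripts)
    msg.content = f"{msg.content}\n{extra}" if msg.content else extra
    msg.audio = []  # non-audio models never see raw audio blocks
    return msg


# Stub transcriber standing in for the Whisper API:
msg = Message(role="user", content="", audio=[Audio("/tmp/note.oga")])
out = apply_transcriber(msg, lambda clip: "hello from a voice note")
print(out.content)
print(out.audio)
```

The key design point is the last step: clearing `audio` is what lets the same message be sent safely to models that reject raw `input_audio` blocks.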
```python
from definable.agent import Agent

agent = Agent(model="openai/gpt-4o-mini", audio_transcriber=True)
```
audio_transcriber works at the agent level, so all interfaces (Telegram, Discord, Desktop) and direct arun() callers benefit automatically. No per-interface setup required.
Telegram sends voice notes as .oga (OGG Opus), which isn’t accepted by all APIs. The reader.audio module provides format normalization:
```python
from definable.reader.audio import normalize_audio_format

# Automatically converts oga/ogg/opus → wav via ffmpeg
out_bytes, out_fmt = normalize_audio_format(raw_bytes, "oga")
# out_fmt == "wav"
```
This requires ffmpeg to be installed on your system. The audio_transcriber handles normalization transparently, so you don't need to call normalize_audio_format directly when using it.
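If you ever need to check a payload yourself before deciding whether normalization is required, container formats can be told apart by their magic bytes. A stdlib-only sketch (the helper name is ours, not part of definable; the byte strings are fake headers for illustration):

```python
def is_ogg(raw: bytes) -> bool:
    """OGG containers (e.g. Telegram .oga voice notes) begin with the
    'OggS' capture pattern; WAV files begin with a 'RIFF' chunk header."""
    return raw[:4] == b"OggS"


voice_note = b"OggS" + b"\x00" * 24  # fake OGG header
wav_clip = b"RIFF" + b"\x00" * 24    # fake WAV header

print(is_ogg(voice_note))  # OGG: would need ffmpeg conversion
print(is_ogg(wav_clip))    # WAV: can be sent as-is
```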
```python
from definable.model.message import Message
from definable.media import Video

response = model.invoke(
    messages=[Message(role="user", content="Summarize what happens in this video.", videos=[Video(filepath="/path/to/clip.mp4")])],
    assistant_message=Message(role="assistant", content=""),
)
```
```python
from definable.agent import Agent
from definable.model import OpenAIChat
from definable.media import Image

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a helpful visual assistant.",
)
output = agent.run(
    "What's in this image?",
    images=[Image(url="https://example.com/photo.jpg")],
)
print(output.content)
```