The Call interface connects your agent to voice phone calls. It supports multiple telephony providers (Twilio, Plivo) and three pipeline strategies for different latency/flexibility trade-offs.

Architecture

Three pipeline modes handle the voice-to-text-to-voice conversion differently:
Mode       | How It Works                               | Latency      | Provider Lock
Managed    | Twilio ConversationRelay handles STT/TTS   | ~500ms       | Twilio only
Cascading  | Your STT → Agent → Your TTS                | ~800-1200ms  | Any provider
Realtime   | OpenAI speech-to-speech (no text step)     | ~200-300ms   | OpenAI model

Setup

Install

pip install 'definable[call]'
This installs the websockets dependency needed for real-time audio streaming.

Twilio Setup

  1. Create a Twilio account and get a phone number
  2. Set environment variables:
export TWILIO_ACCOUNT_SID="AC..."
export TWILIO_AUTH_TOKEN="..."
export OPENAI_API_KEY="sk-..."
  3. Configure your Twilio phone number’s webhook:
    • Go to Phone Numbers → Manage → Active Numbers
    • Set "A call comes in" to your server URL: https://your-domain.com/call/incoming (POST)

Plivo Setup

  1. Create a Plivo account and get a phone number
  2. Set environment variables:
export PLIVO_AUTH_ID="MA..."
export PLIVO_AUTH_TOKEN="..."
export OPENAI_API_KEY="sk-..."
  3. Create a Plivo Application:
    • Go to Voice → Applications → New Application
    • Set Answer URL to: https://your-domain.com/call/incoming (POST)
    • Assign your Plivo phone number to this application
For local development, use ngrok to expose your local server: ngrok http 8000
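Before starting the server, it can help to confirm that the variables from the setup steps above are actually set. The following is a minimal sketch using only the standard library; `REQUIRED_ENV` and `missing_env` are illustrative names and are not part of definable.

```python
import os

# Variables named in the setup steps above, grouped by telephony provider.
REQUIRED_ENV = {
    "twilio": ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "OPENAI_API_KEY"],
    "plivo": ["PLIVO_AUTH_ID", "PLIVO_AUTH_TOKEN", "OPENAI_API_KEY"],
}


def missing_env(provider, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env("twilio")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Twilio environment looks complete.")
```

Passing a plain dict as `environ` makes the helper easy to exercise without touching the real environment.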

Quick Start — Managed Mode (Twilio)

The simplest mode. Twilio handles STT and TTS via ConversationRelay — you just provide text.
import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface
from definable.agent.runtime import AgentRuntime

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions="You are a helpful phone agent. Keep responses concise.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",
    phone_number="+15551234567",
    pipeline="managed",
    welcome_message="Hello! How can I help you today?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())
Call your Twilio number and start talking.

Quick Start — Cascading Mode

Full control over STT and TTS providers. Works with both Twilio and Plivo.
import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface
from definable.agent.interface.call.stt.deepgram import DeepgramSTT
from definable.agent.interface.call.tts.cartesia import CartesiaTTS
from definable.agent.runtime import AgentRuntime

stt = DeepgramSTT(model="nova-3", language="en-US", endpointing=300)
tts = CartesiaTTS(model="sonic-2", voice_id="a0e99841-438c-4a64-b679-ae501e7d6091")

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions="You are a helpful phone agent. Keep responses concise.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",  # or "plivo"
    phone_number="+15551234567",
    pipeline="cascading",
    stt=stt,
    tts=tts,
    welcome_message="Hello! How can I help you today?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())

Quick Start — Realtime Mode

Lowest latency using OpenAI’s speech-to-speech Realtime API. Audio flows directly to the model with no intermediate text step.
import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface, OpenAIRealtimeProvider
from definable.agent.runtime import AgentRuntime

realtime = OpenAIRealtimeProvider(
    model="gpt-4o-realtime-preview",
    voice="alloy",
)

agent = Agent(
    model="openai/gpt-4o-mini",  # used for non-audio tasks
    instructions="You are a helpful phone agent.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",
    phone_number="+15551234567",
    pipeline="realtime",
    realtime=realtime,
    welcome_message="Hello! How can I help?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())

CallInterface Parameters

Telephony

provider
str
default:"twilio"
Telephony provider: "twilio" or "plivo".
phone_number
str
required
Phone number to receive calls on (E.164 format, e.g. "+15551234567").
account_sid
str
Twilio Account SID. Falls back to TWILIO_ACCOUNT_SID env var.
auth_token
str
Twilio or Plivo auth token. Falls back to TWILIO_AUTH_TOKEN or PLIVO_AUTH_TOKEN env var.
auth_id
str
Plivo Auth ID. Falls back to PLIVO_AUTH_ID env var.
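Since phone_number is expected in E.164 format, a quick check before constructing the interface can catch formatting mistakes early. This is a sketch based on the general E.164 shape (a leading "+", a non-zero first digit, at most 15 digits in total); `is_e164` is a hypothetical helper, and whether definable performs its own validation is not stated here.

```python
import re

# E.164: "+", then a non-zero digit, then up to 14 more digits (15 digits max).
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number):
    """Return True if `number` has the shape of an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None


if __name__ == "__main__":
    for candidate in ["+15551234567", "15551234567", "+1 555 123 4567"]:
        print(candidate, is_e164(candidate))
```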

Pipeline

pipeline
str
default:"managed"
Voice pipeline mode: "managed", "cascading", or "realtime".
stt
STTProvider
Speech-to-text provider for cascading mode. Required when pipeline="cascading".
tts
TTSProvider
Text-to-speech provider for cascading mode. Required when pipeline="cascading".
realtime
RealtimeProvider
Realtime provider for speech-to-speech mode. Required when pipeline="realtime".

Voice Settings

welcome_message
str
Greeting spoken when a call connects.
voice
str
default:"en-US-Standard-A"
Voice name or ID for TTS synthesis.
language
str
default:"en-US"
BCP-47 language code.
interruptible
str
default:"any"
When the caller can interrupt: "none", "dtmf", "speech", or "any".
interrupt_sensitivity
str
default:"medium"
Barge-in sensitivity: "low", "medium", or "high".

Managed Mode Settings

stt_provider
str
default:"deepgram"
STT provider name for managed mode (Twilio ConversationRelay).
tts_provider
str
default:"google"
TTS provider name for managed mode (Twilio ConversationRelay).

Server Paths

webhook_path
str
default:"/call/incoming"
URL path for the incoming call webhook.
stream_path
str
default:"/call/stream"
URL path for WebSocket audio streams.

Call Settings

max_call_duration_seconds
int
default:3600
Maximum call duration before automatic hangup (1 hour default).
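To illustrate what a duration cap does, the sketch below wraps a stand-in call handler in `asyncio.wait_for`, which cancels the handler once the limit elapses. It shows the general technique only; `run_with_max_duration` and `fake_call` are hypothetical names, and this is not a description of how definable enforces max_call_duration_seconds internally.

```python
import asyncio


async def run_with_max_duration(call_coro, max_call_duration_seconds):
    """Await a call handler, cancelling it once the limit is reached."""
    try:
        return await asyncio.wait_for(call_coro, timeout=max_call_duration_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the handler at this point.
        return "hangup: max duration reached"


async def fake_call(seconds):
    """Stand-in for a live call that lasts `seconds`."""
    await asyncio.sleep(seconds)
    return "completed"


if __name__ == "__main__":
    print(asyncio.run(run_with_max_duration(fake_call(0.01), 1)))
    print(asyncio.run(run_with_max_duration(fake_call(1), 0.01)))
```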

Pipeline Modes

Managed (Twilio Only)

Twilio handles STT and TTS through ConversationRelay. Your agent only sees text.
Caller speaks → Twilio STT → text → Agent → text → Twilio TTS → Caller hears
  • Simplest to set up — no STT/TTS provider configuration needed
  • ~500ms latency
  • Limited to Twilio (uses ConversationRelay)
  • Voice and model selection depends on the STT/TTS providers available through ConversationRelay
Plivo does not support managed mode — it has no ConversationRelay equivalent. Use pipeline="cascading" or pipeline="realtime" with Plivo.
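The provider and pipeline combinations described above can be checked up front in your own configuration code. The sketch below encodes only what this page states; `SUPPORTED_PIPELINES` and `check_pipeline` are illustrative names, not definable APIs.

```python
# Provider/pipeline support as described in this section.
SUPPORTED_PIPELINES = {
    "twilio": {"managed", "cascading", "realtime"},
    "plivo": {"cascading", "realtime"},
}


def check_pipeline(provider, pipeline):
    """Raise ValueError for a combination this page describes as unsupported."""
    if provider not in SUPPORTED_PIPELINES:
        raise ValueError(f"unknown provider: {provider!r}")
    if pipeline not in SUPPORTED_PIPELINES[provider]:
        allowed = ", ".join(sorted(SUPPORTED_PIPELINES[provider]))
        raise ValueError(
            f"{provider} does not support pipeline={pipeline!r}; use one of: {allowed}"
        )


if __name__ == "__main__":
    check_pipeline("twilio", "managed")  # passes silently
    try:
        check_pipeline("plivo", "managed")
    except ValueError as exc:
        print(exc)
```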

Cascading

Raw audio flows through your own STT and TTS providers. Full control over every component.
Caller speaks → Twilio/Plivo → raw audio → STT → text
→ Agent → text → TTS → audio → Twilio/Plivo → Caller hears
  • Works with both Twilio and Plivo
  • Pluggable STT (DeepgramSTT) and TTS (CartesiaTTS)
  • Automatic barge-in detection (speech during playback)
  • ~800-1200ms latency
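The cascading flow above can be pictured as three awaited stages per conversational turn. The sketch below uses stand-in functions (`fake_stt`, `fake_agent`, `fake_tts`, all hypothetical) rather than real providers, purely to show the order in which data moves. Because the stages run one after another in this simplified picture, their latencies add up, which is consistent with cascading being the slowest of the three modes.

```python
import asyncio


async def fake_stt(audio: bytes) -> str:
    """Stand-in STT: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


async def fake_agent(text: str) -> str:
    """Stand-in agent: echo the caller's words back."""
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    """Stand-in TTS: pretend UTF-8 bytes are synthesized audio."""
    return text.encode("utf-8")


async def cascade_turn(audio_in: bytes) -> bytes:
    """One conversational turn: audio -> text -> reply text -> audio."""
    transcript = await fake_stt(audio_in)
    reply = await fake_agent(transcript)
    return await fake_tts(reply)


if __name__ == "__main__":
    print(asyncio.run(cascade_turn(b"hello")))
```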

Realtime (OpenAI)

Audio flows directly to OpenAI’s Realtime API for speech-to-speech processing. No intermediate text conversion step.
Caller speaks → Twilio/Plivo → raw audio → OpenAI Realtime API → audio → Caller hears
  • Lowest latency (~200-300ms)
  • Native function calling (tools work without text intermediary)
  • Server-side VAD (voice activity detection)
  • Locked to OpenAI Realtime models (gpt-4o-realtime-preview)

Telephony Providers

Twilio

Supports all three pipeline modes. Uses Media Streams for cascading/realtime and ConversationRelay for managed mode.
call = CallInterface(
    provider="twilio",
    account_sid="AC...",    # or TWILIO_ACCOUNT_SID env var
    auth_token="...",       # or TWILIO_AUTH_TOKEN env var
    phone_number="+15551234567",
    pipeline="managed",     # or "cascading" or "realtime"
)
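In the cascading and realtime modes, Twilio Media Streams carries call audio as base64-encoded 8 kHz mu-law (G.711) bytes, while many speech services work with linear PCM. As a rough illustration of that conversion, the sketch below decodes mu-law to 16-bit PCM with the standard G.711 expansion. `decode_media_payload` is a hypothetical helper; definable presumably handles this for you, and its internal implementation may differ.

```python
import base64
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_payload(payload_b64: str) -> bytes:
    """Convert a base64 mu-law payload to little-endian 16-bit PCM bytes."""
    ulaw = base64.b64decode(payload_b64)
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    silence = base64.b64encode(bytes([0xFF] * 4)).decode("ascii")
    print(decode_media_payload(silence))  # four zero-valued samples
```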

Plivo

Supports cascading and realtime modes only. Uses bidirectional Audio Streaming over WebSocket.
call = CallInterface(
    provider="plivo",
    auth_id="MA...",        # or PLIVO_AUTH_ID env var
    auth_token="...",       # or PLIVO_AUTH_TOKEN env var
    phone_number="+15551234567",
    pipeline="cascading",   # or "realtime" (NOT "managed")
    stt=DeepgramSTT(...),
    tts=CartesiaTTS(...),
)
Key differences from Twilio:
  • No managed mode (no ConversationRelay equivalent)
  • Supports 16kHz PCM natively (Twilio only supports 8kHz mu-law)
  • Uses HMAC-SHA256 V3 for webhook signatures (Twilio uses HMAC-SHA1)
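To make the signature difference concrete, here is a sketch of the Twilio side. Twilio's documented scheme signs the full request URL followed by the POST parameters sorted by name (each name immediately followed by its value) using HMAC-SHA1 keyed with the auth token, then base64-encodes the digest for the X-Twilio-Signature header. The function names are illustrative, and this is not definable's own code. Plivo's V3 scheme uses HMAC-SHA256 and signs different inputs, so consult Plivo's documentation rather than adapting this directly.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Compute the expected X-Twilio-Signature value for a form POST."""
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), data.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(auth_token, url, params, header_signature):
    """Compare against the header value in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)


if __name__ == "__main__":
    url = "https://your-domain.com/call/incoming"
    params = {"To": "+15551234567", "CallSid": "CA123"}  # sample values only
    signature = twilio_signature("example-token", url, params)
    print(is_valid_twilio_request("example-token", url, params, signature))
```

The URL must match exactly what Twilio requested, so behind a TLS-terminating reverse proxy you would sign the public https URL rather than the internal one.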

STT Providers

DeepgramSTT

Real-time streaming transcription via Deepgram’s WebSocket API.
from definable.agent.interface.call.stt.deepgram import DeepgramSTT

stt = DeepgramSTT(
    api_key="...",            # or DEEPGRAM_API_KEY env var
    model="nova-3",           # Deepgram model
    language="en-US",
    endpointing=300,          # silence detection (ms)
    smart_format=True,        # format numbers, dates, etc.
    utterance_end_ms=1000,    # utterance boundary detection
)

TTS Providers

CartesiaTTS

Ultra-low latency streaming TTS via Cartesia’s WebSocket API (40-90ms TTFB).
from definable.agent.interface.call.tts.cartesia import CartesiaTTS

tts = CartesiaTTS(
    api_key="...",            # or CARTESIA_API_KEY env var
    model="sonic-2",
    voice_id="a0e99841-...",  # Cartesia voice ID
    language="en",
)

Agent with Tools

Give your phone agent capabilities:
from definable.agent import Agent
from definable.tool.decorator import tool

@tool
def check_order(order_id: str) -> str:
    """Look up an order by ID."""
    return f"Order {order_id}: Shipped, arriving March 15."

@tool
def transfer_call(department: str) -> str:
    """Transfer the caller to a department."""
    return f"Transferring to {department}. Please hold."

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions=(
        "You are a customer service agent for Acme Corp. "
        "Keep responses concise — you're on a phone call."
    ),
    tools=[check_order, transfer_call],
)

Deployment

CallInterface integrates with AgentRuntime, which provides a shared FastAPI server for webhooks and WebSocket connections.
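import asyncio
from definable.agent.runtime import AgentRuntime

# `agent` and `call` are the Agent and CallInterface built in the examples above.
runtime = AgentRuntime(
    agent,
    interfaces=[call],
    host="0.0.0.0",
    port=8000,
)
asyncio.run(runtime.start())  # start() is awaited, so run it on an event loop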
For production:
  • Use a reverse proxy (nginx, Caddy) with TLS termination
  • Point your telephony provider’s webhook to https://your-domain.com/call/incoming
  • The WebSocket endpoint is at wss://your-domain.com/call/stream
  • Set max_call_duration_seconds to prevent runaway calls
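The two public endpoints in the checklist above follow directly from your domain and the webhook_path and stream_path settings. A small sketch for deriving them, assuming the default paths from the parameter reference (`endpoint_urls` is a hypothetical helper, not part of definable):

```python
def endpoint_urls(domain, webhook_path="/call/incoming", stream_path="/call/stream"):
    """Return the public webhook (https) and audio stream (wss) URLs."""
    host = domain.strip("/")
    return {
        "webhook": f"https://{host}{webhook_path}",
        "stream": f"wss://{host}{stream_path}",
    }


if __name__ == "__main__":
    for name, url in endpoint_urls("your-domain.com").items():
        print(name, url)
```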