Call (Voice) - Definable AI

The Call interface connects your agent to voice phone calls. It supports multiple telephony providers (Twilio, Plivo) and three pipeline strategies for different latency/flexibility trade-offs.

Architecture

Three pipeline modes turn the caller's speech into the agent's spoken reply in different ways:

Mode	How It Works	Latency	Provider Lock
Managed	Twilio ConversationRelay handles STT/TTS	~500ms	Twilio only
Cascading	Your STT → Agent → Your TTS	~800-1200ms	Any provider
Realtime	OpenAI speech-to-speech (no text step)	~200-300ms	OpenAI model
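
As a rough illustration of how the table above can drive a choice of mode, the sketch below filters modes by provider support and a latency budget. The `candidate_pipelines` helper and the `MODES` list are hypothetical and not part of definable; the latency figures are the upper bounds quoted in the table.

```python
# Upper-bound latencies and provider support copied from the table above.
# MODES and candidate_pipelines are illustrative names, not definable APIs.
MODES = [
    # (mode, upper-bound latency in ms, telephony providers that support it)
    ("realtime", 300, {"twilio", "plivo"}),
    ("managed", 500, {"twilio"}),
    ("cascading", 1200, {"twilio", "plivo"}),
]


def candidate_pipelines(provider: str, max_latency_ms: int) -> list[str]:
    """Return the modes a provider supports within the latency budget."""
    return [
        mode
        for mode, latency_ms, providers in MODES
        if provider in providers and latency_ms <= max_latency_ms
    ]


print(candidate_pipelines("twilio", 600))
print(candidate_pipelines("plivo", 1200))
```

Latency is only one axis: cascading mode is the slowest, but it is the only one that lets you pick both the STT and the TTS provider.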

Setup

Install

pip install 'definable[call]'

This installs the websockets dependency needed for real-time audio streaming.

Twilio Setup

Create a Twilio account and get a phone number
Set environment variables:

export TWILIO_ACCOUNT_SID="AC..."
export TWILIO_AUTH_TOKEN="..."
export OPENAI_API_KEY="sk-..."

Configure your Twilio phone number’s webhook:
- Go to Phone Numbers → Manage → Active Numbers
- Set "A call comes in" to your server URL: https://your-domain.com/call/incoming (POST)

Plivo Setup

Create a Plivo account and get a phone number
Set environment variables:

export PLIVO_AUTH_ID="MA..."
export PLIVO_AUTH_TOKEN="..."
export OPENAI_API_KEY="sk-..."

Create a Plivo Application:
- Go to Voice → Applications → New Application
- Set Answer URL to: https://your-domain.com/call/incoming (POST)
- Assign your Plivo phone number to this application

For local development, use ngrok to expose your local server: ngrok http 8000
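
Before starting the server, it can help to confirm that the environment variables from the Twilio and Plivo steps are actually set. The following is a minimal sketch using only the standard library; `missing_env` and `REQUIRED_ENV` are hypothetical names, and the variable lists simply mirror the `export` lines above.

```python
import os

# Variable names taken from the setup steps above; the helper is illustrative.
REQUIRED_ENV = {
    "twilio": ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN"),
    "plivo": ("PLIVO_AUTH_ID", "PLIVO_AUTH_TOKEN"),
}


def missing_env(provider, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    names = REQUIRED_ENV[provider] + ("OPENAI_API_KEY",)
    return [name for name in names if not environ.get(name)]


missing = missing_env("twilio")
if missing:
    print("Missing environment variables:", ", ".join(missing))
```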

Quick Start — Managed Mode (Twilio)

The simplest mode. Twilio handles STT and TTS via ConversationRelay — you just provide text.

import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface
from definable.agent.runtime import AgentRuntime

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions="You are a helpful phone agent. Keep responses concise.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",
    phone_number="+15551234567",
    pipeline="managed",
    welcome_message="Hello! How can I help you today?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())

Call your Twilio number and start talking.

Quick Start — Cascading Mode

Full control over STT and TTS providers. Works with both Twilio and Plivo.

import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface
from definable.agent.interface.call.stt.deepgram import DeepgramSTT
from definable.agent.interface.call.tts.cartesia import CartesiaTTS
from definable.agent.runtime import AgentRuntime

stt = DeepgramSTT(model="nova-3", language="en-US", endpointing=300)
tts = CartesiaTTS(model="sonic-2", voice_id="a0e99841-438c-4a64-b679-ae501e7d6091")

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions="You are a helpful phone agent. Keep responses concise.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",  # or "plivo"
    phone_number="+15551234567",
    pipeline="cascading",
    stt=stt,
    tts=tts,
    welcome_message="Hello! How can I help you today?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())

Quick Start — Realtime Mode

Lowest latency using OpenAI’s speech-to-speech Realtime API. Audio flows directly to the model with no intermediate text step.

import asyncio
from definable.agent import Agent
from definable.agent.interface.call import CallInterface, OpenAIRealtimeProvider
from definable.agent.runtime import AgentRuntime

realtime = OpenAIRealtimeProvider(
    model="gpt-4o-realtime-preview",
    voice="alloy",
)

agent = Agent(
    model="openai/gpt-4o-mini",  # used for non-audio tasks
    instructions="You are a helpful phone agent.",
)

call = CallInterface(
    agent=agent,
    provider="twilio",
    phone_number="+15551234567",
    pipeline="realtime",
    realtime=realtime,
    welcome_message="Hello! How can I help?",
)

runtime = AgentRuntime(agent, interfaces=[call], host="0.0.0.0", port=8000)
asyncio.run(runtime.start())

CallInterface Parameters

Telephony

provider

str

default:"twilio"

Telephony provider: "twilio" or "plivo".

phone_number

str

required

Phone number to receive calls on (E.164 format, e.g. "+15551234567").

account_sid

str

Twilio Account SID. Falls back to TWILIO_ACCOUNT_SID env var.

auth_token

str

Twilio or Plivo auth token. Falls back to TWILIO_AUTH_TOKEN or PLIVO_AUTH_TOKEN env var.

auth_id

str

Plivo Auth ID. Falls back to PLIVO_AUTH_ID env var.
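
Since `phone_number` must be in E.164 format, a quick check before constructing the interface can catch formatting mistakes early. This is a sketch only: `is_e164` is a hypothetical helper, and whether `CallInterface` performs its own validation is not stated here.

```python
import re

# E.164: a leading "+", a non-zero first digit, and at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string looks like an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None


print(is_e164("+15551234567"))
print(is_e164("555-123-4567"))
```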

Pipeline

pipeline

str

default:"managed"

Voice pipeline mode: "managed", "cascading", or "realtime".

stt

STTProvider

Speech-to-text provider for cascading mode. Required when pipeline="cascading".

tts

TTSProvider

Text-to-speech provider for cascading mode. Required when pipeline="cascading".

realtime

RealtimeProvider

Realtime provider for speech-to-speech mode. Required when pipeline="realtime".
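
The requirements above (plus the Twilio-only restriction on managed mode) can be summarized as a small pre-flight check. The function below is a hypothetical sketch for illustration; it does not reproduce the validation that `CallInterface` itself may perform.

```python
def pipeline_config_errors(provider, pipeline, stt=None, tts=None, realtime=None):
    """Return human-readable problems with a pipeline configuration."""
    if pipeline not in ("managed", "cascading", "realtime"):
        return [f"unknown pipeline: {pipeline!r}"]

    errors = []
    if pipeline == "managed" and provider != "twilio":
        errors.append("managed mode requires provider='twilio'")
    if pipeline == "cascading":
        if stt is None:
            errors.append("cascading mode requires stt")
        if tts is None:
            errors.append("cascading mode requires tts")
    if pipeline == "realtime" and realtime is None:
        errors.append("realtime mode requires realtime")
    return errors


print(pipeline_config_errors("plivo", "managed"))
print(pipeline_config_errors("twilio", "cascading", stt=object()))
```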

Voice Settings

welcome_message

str

Greeting spoken when a call connects.

voice

str

default:"en-US-Standard-A"

Voice name or ID for TTS synthesis.

language

str

default:"en-US"

BCP-47 language code.

interruptible

str

default:"any"

When the caller can interrupt: "none", "dtmf", "speech", or "any".

interrupt_sensitivity

str

default:"medium"

Barge-in sensitivity: "low", "medium", or "high".

Managed Mode Settings

stt_provider

str

default:"deepgram"

STT provider name for managed mode (Twilio ConversationRelay).

tts_provider

str

default:"google"

TTS provider name for managed mode (Twilio ConversationRelay).

Server Paths

webhook_path

str

default:"/call/incoming"

URL path for the incoming call webhook.

stream_path

str

default:"/call/stream"

URL path for WebSocket audio streams.
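
To see how the two paths map onto public URLs, the sketch below joins a domain with `webhook_path` and `stream_path`. `public_endpoints` is a hypothetical helper, and it assumes TLS is terminated in front of the server so the schemes are `https` and `wss`.

```python
def public_endpoints(domain, webhook_path="/call/incoming", stream_path="/call/stream"):
    """Build the public webhook and WebSocket URLs for a domain."""
    for path in (webhook_path, stream_path):
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
    return {
        "webhook": f"https://{domain}{webhook_path}",  # set this in the provider console
        "stream": f"wss://{domain}{stream_path}",      # audio WebSocket endpoint
    }


print(public_endpoints("your-domain.com"))
```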

Call Settings

max_call_duration_seconds

int

default:3600

Maximum call duration before automatic hangup (1 hour default).
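
The idea behind a maximum call duration can be sketched with `asyncio.wait_for`, which cancels a coroutine once a timeout expires. How definable enforces the limit internally is not documented here; `run_with_limit` and the `handle_call` argument are hypothetical stand-ins.

```python
import asyncio


async def run_with_limit(handle_call, max_call_duration_seconds: float) -> str:
    """Run a call handler, cancelling it once the duration limit is reached."""
    try:
        await asyncio.wait_for(handle_call(), timeout=max_call_duration_seconds)
        return "completed"
    except asyncio.TimeoutError:
        # A real implementation would also ask the telephony provider to hang up.
        return "hung_up"


async def short_call():
    await asyncio.sleep(0)  # stands in for a call that ends quickly


print(asyncio.run(run_with_limit(short_call, 1.0)))
```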

Pipeline Modes

Managed (Twilio Only)

The telephony provider handles STT and TTS natively. Your agent only sees text.

Caller speaks → Twilio STT → text → Agent → text → Twilio TTS → Caller hears

Simplest to set up — no STT/TTS provider configuration needed
~500ms latency
Limited to Twilio (uses ConversationRelay)
Available voices and speech models depend on what Twilio ConversationRelay offers

Plivo does not support managed mode — it has no ConversationRelay equivalent. Use pipeline="cascading" or pipeline="realtime" with Plivo.

Cascading

Raw audio flows through your own STT and TTS providers. Full control over every component.

Caller speaks → Twilio/Plivo → raw audio → STT → text
→ Agent → text → TTS → audio → Twilio/Plivo → Caller hears

Works with both Twilio and Plivo
Pluggable STT (DeepgramSTT) and TTS (CartesiaTTS)
Automatic barge-in detection (speech during playback)
~800-1200ms latency

Realtime (OpenAI)

Audio flows directly to OpenAI’s Realtime API for speech-to-speech processing. No intermediate text conversion step.

Caller speaks → Twilio/Plivo → raw audio → OpenAI Realtime API → audio → Caller hears

Lowest latency (~200-300ms)
Native function calling (tools work without text intermediary)
Server-side VAD (voice activity detection)
Locked to OpenAI Realtime models (gpt-4o-realtime-preview)

Telephony Providers

Twilio

Supports all three pipeline modes. Uses Media Streams for cascading/realtime and ConversationRelay for managed mode.

call = CallInterface(
    provider="twilio",
    account_sid="AC...",    # or TWILIO_ACCOUNT_SID env var
    auth_token="...",       # or TWILIO_AUTH_TOKEN env var
    phone_number="+15551234567",
    pipeline="managed",     # or "cascading" or "realtime"
)
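
In the cascading and realtime modes, Twilio Media Streams deliver audio as base64-encoded 8 kHz mu-law frames. The sketch below shows what decoding one such payload to 16-bit linear PCM involves, using the standard G.711 expansion. The function names are hypothetical, and definable presumably handles this conversion for you.

```python
import base64
import struct

BIAS = 0x84  # G.711 mu-law bias (132)


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Expand one mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                      # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def decode_media_payload(payload_b64: str) -> bytes:
    """Decode a base64 mu-law payload into little-endian 16-bit PCM bytes."""
    raw = base64.b64decode(payload_b64)
    samples = [ulaw_byte_to_pcm16(b) for b in raw]
    return struct.pack(f"<{len(samples)}h", *samples)


silence = base64.b64encode(bytes([0xFF] * 4)).decode()
print(decode_media_payload(silence))
```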

Plivo

Supports cascading and realtime modes only. Uses bidirectional Audio Streaming over WebSocket.

call = CallInterface(
    provider="plivo",
    auth_id="MA...",        # or PLIVO_AUTH_ID env var
    auth_token="...",       # or PLIVO_AUTH_TOKEN env var
    phone_number="+15551234567",
    pipeline="cascading",   # or "realtime" (NOT "managed")
    stt=DeepgramSTT(...),
    tts=CartesiaTTS(...),
)

Key differences from Twilio:

No managed mode (no ConversationRelay equivalent)
Supports 16kHz PCM natively (Twilio only supports 8kHz mu-law)
Uses HMAC-SHA256 V3 for webhook signatures (Twilio uses HMAC-SHA1)
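
For reference, Twilio's HMAC-SHA1 scheme signs the full webhook URL followed by the POST parameters sorted by name, and sends the base64 digest in the `X-Twilio-Signature` header. The sketch below shows that check with the standard library only; the function names are hypothetical, Plivo's V3 scheme is not shown, and whether definable verifies signatures for you is not covered here.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the signature Twilio would send for a POST webhook."""
    # Append each parameter name and value, sorted by name, to the full URL.
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), data.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_twilio_request(auth_token, url, params, header_value) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_value)


params = {"CallSid": "CA123", "From": "+15551234567"}
url = "https://your-domain.com/call/incoming"
header = twilio_signature("example-token", url, params)
print(is_valid_twilio_request("example-token", url, params, header))
```

The URL must match exactly what Twilio requested, including scheme and any query string, so a reverse proxy that rewrites the host or scheme can cause valid requests to fail this check.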

STT Providers

DeepgramSTT

Real-time streaming transcription via Deepgram’s WebSocket API.

from definable.agent.interface.call.stt.deepgram import DeepgramSTT

stt = DeepgramSTT(
    api_key="...",            # or DEEPGRAM_API_KEY env var
    model="nova-3",           # Deepgram model
    language="en-US",
    endpointing=300,          # silence detection (ms)
    smart_format=True,        # format numbers, dates, etc.
    utterance_end_ms=1000,    # utterance boundary detection
)
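
The constructor arguments above correspond to query parameters on Deepgram's streaming endpoint. As a sketch of that mapping, the helper below builds such a URL. It is an assumption that `DeepgramSTT` does something similar internally; `deepgram_listen_url` is a hypothetical name, and the `encoding` and `sample_rate` values assume Twilio's 8 kHz mu-law audio.

```python
from urllib.parse import urlencode


def deepgram_listen_url(
    model="nova-3",
    language="en-US",
    endpointing=300,
    smart_format=True,
    utterance_end_ms=1000,
):
    """Build a Deepgram streaming URL from the settings shown above."""
    query = urlencode({
        "model": model,
        "language": language,
        "endpointing": endpointing,            # ms of silence before finalizing
        "smart_format": str(smart_format).lower(),
        "utterance_end_ms": utterance_end_ms,
        "interim_results": "true",             # utterance-end events rely on interim results
        "encoding": "mulaw",                   # assumption: Twilio-style audio
        "sample_rate": 8000,
    })
    return f"wss://api.deepgram.com/v1/listen?{query}"


print(deepgram_listen_url())
```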

TTS Providers

CartesiaTTS

Ultra-low latency streaming TTS via Cartesia’s WebSocket API (40-90ms TTFB).

from definable.agent.interface.call.tts.cartesia import CartesiaTTS

tts = CartesiaTTS(
    api_key="...",            # or CARTESIA_API_KEY env var
    model="sonic-2",
    voice_id="a0e99841-...",  # Cartesia voice ID
    language="en",
)

Agent with Tools

Give your phone agent capabilities:

from definable.agent import Agent
from definable.tool.decorator import tool

@tool
def check_order(order_id: str) -> str:
    """Look up an order by ID."""
    return f"Order {order_id}: Shipped, arriving March 15."

@tool
def transfer_call(department: str) -> str:
    """Transfer the caller to a department."""
    return f"Transferring to {department}. Please hold."

agent = Agent(
    model="openai/gpt-4o-mini",
    instructions=(
        "You are a customer service agent for Acme Corp. "
        "Keep responses concise — you're on a phone call."
    ),
    tools=[check_order, transfer_call],
)
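
Because the tool's return value is read aloud, it helps to handle missing records with a sentence the caller can act on. The sketch below is a variant of `check_order` written without the decorator so it can be tested on its own. The `ORDERS` dictionary is hypothetical sample data, and wrapping the function with `@tool` as above is assumed to work the same way.

```python
# Hypothetical in-memory data; a real agent would query an order system.
ORDERS = {
    "A100": "shipped, arriving March 15",
    "A200": "being processed",
}


def check_order(order_id: str) -> str:
    """Look up an order by ID and return a sentence suitable for speech."""
    normalized = order_id.strip().upper()
    status = ORDERS.get(normalized)
    if status is None:
        return f"I could not find order {normalized}. Could you repeat the number?"
    return f"Order {normalized} is {status}."


print(check_order(" a100 "))
print(check_order("Z9"))
```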

Deployment

CallInterface integrates with AgentRuntime, which provides a shared FastAPI server for webhooks and WebSocket connections.

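import asyncio
from definable.agent.runtime import AgentRuntime

# `agent` and `call` are the objects built in the Quick Start examples.
runtime = AgentRuntime(
    agent,
    interfaces=[call],
    host="0.0.0.0",
    port=8000,
)
asyncio.run(runtime.start())  # start() is a coroutine, so run it in an event loop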

For production:

Use a reverse proxy (nginx, Caddy) with TLS termination
Point your telephony provider’s webhook to https://your-domain.com/call/incoming
The WebSocket endpoint is at wss://your-domain.com/call/stream
Set max_call_duration_seconds to prevent runaway calls
