Streaming displays tokens as they are generated instead of waiting for the full response. This dramatically improves perceived latency.

Agent Streaming

for event in agent.run_stream("Tell me a story."):
    if event.event == "RunContent" and event.content:
        print(event.content, end="", flush=True)

Event Types

| Event | Description |
| --- | --- |
| RunStarted | Agent execution began |
| RunContent | A chunk of the agent’s text response |
| RunContentCompleted | Content generation done |
| ToolCallStarted | A tool call is about to execute |
| ToolCallCompleted | A tool call finished |
| ToolCallError | A tool call failed |
| ReasoningStarted | Thinking phase began |
| ReasoningStep | A reasoning step |
| RunCompleted | Entire run finished (includes final RunOutput) |
| RunError | Run failed |

Full Event Handling

for event in agent.run_stream("Research quantum computing."):
    match event.event:
        case "RunContent":
            print(event.content, end="", flush=True)
        case "ToolCallStarted":
            print(f"\n> Calling {event.tool.tool_name}...")
        case "ToolCallCompleted":
            print(f"  Done: {event.content[:80]}")
        case "RunCompleted":
            print(f"\n\nTokens: {event.metrics.total_tokens}")

Model Streaming

Stream directly from a model:
from definable.model.openai import OpenAIChat
from definable.model.message import Message

model = OpenAIChat(id="gpt-4o")

for chunk in model.invoke_stream(
    messages=[Message(role="user", content="Explain DNS.")],
    assistant_message=Message(role="assistant", content=""),
):
    if chunk.content:
        print(chunk.content, end="", flush=True)
The RunCompleted event in agent streaming contains the full RunOutput object in event.output, giving you access to aggregated metrics and the complete message history.