## Quick Start

With `thinking=True`, the agent makes two model calls:
- Thinking call — analyzes the request and produces a compact plan
- Main call — generates the response guided by the plan
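A minimal sketch of this two-call flow, using a hypothetical `call_model` stub in place of a real model provider (this is not the library's API, only an illustration of the sequence):

```python
# Sketch of the thinking=True flow; `call_model` is a hypothetical stub,
# not the library's API.
def call_model(system: str, user: str) -> str:
    # Canned responses stand in for real model output.
    if "planning assistant" in system:
        return "<analysis>1. Parse request. 2. Pick a tool. 3. Answer.</analysis>"
    return "Final answer guided by the plan."

def run_with_thinking(query: str) -> str:
    # Call 1: thinking call analyzes the request and produces a compact plan.
    plan = call_model("You are a planning assistant.", query)
    # Call 2: main call, with the plan injected into the system prompt.
    return call_model(f"You are a helpful agent.\n{plan}", query)
```

Calling `run_with_thinking("Summarize my unread email")` runs the thinking call first, then the main call with the plan in its system prompt.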
## How It Works

When thinking is enabled, the agent:

- Builds a context-aware thinking prompt that includes:
  - A summary of the agent's instructions (first 500 characters)
  - A catalog of available tools (name + one-line description)
  - Flags for whether knowledge base or memory context is available
- Calls the model with this prompt and a compact structured output schema (`ThinkingOutput`)
- Injects a brief `<analysis>` tag (~20-50 tokens) into the system prompt before the knowledge and memory context
- Runs the main model call with the plan guiding how it uses the available context
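The prompt-building step above can be sketched as follows (the function and argument names are illustrative, not the library's actual API):

```python
# Illustrative builder for the context-aware thinking prompt.
def build_thinking_prompt(instructions: str, tools: dict,
                          has_knowledge: bool, has_memory: bool) -> str:
    lines = [
        "Plan how to handle the user's request.",
        # Summary: first 500 characters of the agent's instructions.
        f"Agent instructions (summary): {instructions[:500]}",
        "Available tools:",
        # Tool catalog: name + one-line description.
        *(f"- {name}: {desc}" for name, desc in tools.items()),
        # Availability flags for knowledge base and memory context.
        f"Knowledge base available: {has_knowledge}",
        f"Memory available: {has_memory}",
    ]
    return "\n".join(lines)
```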
## System Prompt Order
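A sketch of the assembly order implied above, where the `<analysis>` tag lands before the knowledge and memory context (the section contents here are invented for illustration):

```python
# Illustrative system prompt assembly; empty sections are skipped.
def assemble_system_prompt(instructions: str, analysis: str,
                           knowledge: str, memory: str) -> str:
    sections = [instructions, analysis, knowledge, memory]
    return "\n\n".join(s for s in sections if s)
```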
## Configuration

### Default (Recommended)

### Custom Model
Use a separate (potentially cheaper or faster) model for the thinking phase.

### Custom Instructions

Override the context-aware prompt with a fully custom thinking prompt.

### Disable
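Taken together, the knobs in this section can be modeled roughly as the dataclass below. All field names here are hypothetical stand-ins, not the library's real parameters; only the `"always"`/`"auto"`/`"never"` activation behavior is taken from the reference:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the thinking configuration; field names are
# illustrative, not the library's real parameters.
@dataclass
class ThinkingConfig:
    enabled: bool = True                # whether thinking is active
    model: Optional[str] = None         # None -> reuse the agent's model
    instructions: Optional[str] = None  # None -> context-aware prompt builder
    mode: str = "always"                # "always" | "auto" | "never"

    def should_think(self, query_is_complex: bool) -> bool:
        # "never" wins even when thinking is otherwise configured.
        if not self.enabled or self.mode == "never":
            return False
        # "always" runs every call; "auto" only for complex queries.
        return self.mode == "always" or query_is_complex
```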
## Thinking Reference

- Whether thinking is active. Always `True` when instantiated directly.
- Model to use for the thinking phase. If `None`, uses the agent's model.
- Custom thinking prompt. If `None`, uses the context-aware prompt builder, which includes the tool catalog, agent instructions summary, and context availability flags.
- When to activate thinking: `"always"` runs on every call; `"auto"` does a lightweight model pre-check and only activates thinking when the query is judged complex; `"never"` disables it even if configured.
- Description shown in the layer guide injected into the system prompt. If `None`, uses the default description for the thinking layer.

## Output
The thinking phase populates three fields on `RunOutput`:
| Field | Type | Description |
|---|---|---|
| `reasoning_steps` | `List[ReasoningStep]` | Structured reasoning steps (mapped from the thinking output) |
| `reasoning_content` | `str` | XML-formatted reasoning for observability/debugging |
| `reasoning_messages` | `List[Message]` | The full thinking conversation (system prompt + model response) |
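The `reasoning_content` field can be pictured as the reasoning steps serialized to XML; the exact tag names below are assumptions, not the library's actual format:

```python
# Illustrative serialization of reasoning steps to XML-formatted content.
def to_reasoning_content(steps: list) -> str:
    inner = "\n".join(f"  <step>{s}</step>" for s in steps)
    return f"<reasoning>\n{inner}\n</reasoning>"
```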
## Streaming

The thinking phase streams in real time, emitting events in a defined order.

## Context-Aware Thinking vs Model-Native Reasoning

Definable supports two types of reasoning that can coexist:

| | Agent Thinking Layer | Model-Native Reasoning |
|---|---|---|
| Trigger | `Agent(thinking=True)` | Model capability (e.g., DeepSeek Reasoner, OpenAI o1) |
| Control | Full — custom prompt, model, instructions | None — model decides internally |
| Tool awareness | Yes — sees tool catalog | No — reasons without tool knowledge |
| Output fields | `reasoning_steps`, `reasoning_messages` | `reasoning_content` (via `ModelResponse`) |
| Cost | Extra model call | Built into model pricing |
| Best for | Complex tool-using agents | Math/logic reasoning tasks |
The agent thinking layer populates `reasoning_steps` and `reasoning_messages`, while model-native reasoning populates `reasoning_content`.
## Testing

Use `MockModel` with `structured_responses` to test the thinking phase without API calls:
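A minimal stand-in showing the idea (this is not the library's actual `MockModel` implementation; the `respond` method and response shape are hypothetical):

```python
# Hypothetical MockModel stand-in: returns canned structured responses
# in order instead of calling a model API.
class MockModel:
    def __init__(self, structured_responses=None):
        self.structured_responses = list(structured_responses or [])

    def respond(self, prompt, schema=None):
        # Pop the next canned structured response.
        return self.structured_responses.pop(0)

model = MockModel(structured_responses=[
    {"steps": ["identify intent", "choose a tool"], "plan": "search, then answer"},
])
plan = model.respond("thinking prompt", schema="ThinkingOutput")
```

Assertions against `plan` then exercise the thinking phase's output handling without ever touching a real provider.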