The thinking layer adds an explicit reasoning phase before the agent produces its final response. The agent first analyzes the user’s request, plans its approach, and identifies which tools to use — then executes with that plan guiding the response.

Quick Start

from definable.agent import Agent
from definable.model import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a research assistant.",
    thinking=True,
)

output = agent.run("Compare microservices vs monolith architectures.")
print(output.content)
With thinking=True, the agent makes two model calls:
  1. Thinking call — analyzes the request and produces a compact plan
  2. Main call — generates the response guided by the plan

How It Works

When thinking is enabled, the agent:
  1. Builds a context-aware thinking prompt that includes:
    • A summary of the agent’s instructions (first 500 characters)
    • A catalog of available tools (name + one-line description)
    • Flags for whether knowledge base or memory context is available
  2. Calls the model with this prompt and a compact structured output schema (ThinkingOutput)
  3. Injects a brief <analysis> tag (~20-50 tokens) into the system prompt before knowledge and memory context
  4. Runs the main model call with the plan guiding how it uses available context
The thinking plan is positioned before knowledge and memory in the system prompt, so it frames how the model uses retrieval content rather than competing with it.

System Prompt Order

Agent instructions
  → Skill instructions
    → <analysis>Plan from thinking phase</analysis>
      → Knowledge context (RAG results)
        → Memory context (conversation history)

Configuration

agent = Agent(
    model=model,
    thinking=True,  # Uses agent's model, builds context-aware prompt
)

Custom Model

Use a separate (potentially cheaper or faster) model for the thinking phase:
from definable.agent.reasoning import Thinking
from definable.model import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    thinking=Thinking(
        model=OpenAIChat(id="gpt-4o-mini"),  # Cheaper model for thinking
    ),
)

Custom Instructions

Override the context-aware prompt with a fully custom thinking prompt:
agent = Agent(
    model=model,
    thinking=Thinking(
        instructions="Focus on identifying edge cases and potential failures in the user's approach.",
    ),
)
Custom instructions bypass the context-aware prompt building entirely. The thinking model will not receive tool names, agent instructions, or context availability flags.

Disable

agent = Agent(
    model=model,
    thinking=Thinking(enabled=False),  # Explicitly disabled
)

Thinking Reference

enabled (bool, default: True)
  Whether thinking is active. Always True when instantiated directly.

model (Model)
  Model to use for the thinking phase. If None, uses the agent's model.

instructions (str)
  Custom thinking prompt. If None, uses the context-aware prompt builder, which includes the tool catalog, a summary of the agent's instructions, and context availability flags.

trigger (str, default: "always")
  When to activate thinking. "always" runs on every call; "auto" performs a lightweight model pre-check and activates thinking only when the query is judged complex; "never" disables thinking even if configured.

description (str)
  Description shown in the layer guide injected into the system prompt. If None, uses the default description for the thinking layer.
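Putting several of these parameters together, a sketch of a configuration that combines a cheaper thinking model with the "auto" trigger (parameter names as documented above):

```python
from definable.agent import Agent
from definable.agent.reasoning import Thinking
from definable.model import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    thinking=Thinking(
        model=OpenAIChat(id="gpt-4o-mini"),  # cheaper model for the thinking phase
        trigger="auto",  # pre-check first; think only on complex queries
        description="Plans research steps before answering.",
    ),
)
```

With trigger="auto", simple queries skip the thinking call entirely, so the extra cost is paid only where a plan is likely to help.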

Output

The thinking phase populates three fields on RunOutput:
Field               Type                  Description
reasoning_steps     List[ReasoningStep]   Structured reasoning steps (mapped from the thinking output)
reasoning_content   str                   XML-formatted reasoning for observability/debugging
reasoning_messages  List[Message]         The full thinking conversation (system prompt + model response)
output = agent.run("Plan a database migration.")

if output.reasoning_steps:
    for step in output.reasoning_steps:
        print(f"[{step.title}] {step.reasoning}")

# Access the raw thinking conversation
if output.reasoning_messages:
    print(output.reasoning_messages[-1].content)  # Model's structured response

Streaming

The thinking phase streams in real time. Events are emitted in this order:
ReasoningStartedEvent
  → ReasoningContentDeltaEvent (N chunks)
    → ReasoningStepEvent (1-2 steps)
      → ReasoningCompletedEvent
        → RunStartedEvent
          → RunContentEvent (main response)
            → RunCompletedEvent
async for event in agent.arun_stream("Complex question"):
    match event.event:
        case "ReasoningContentDelta":
            print(event.reasoning_content, end="")  # Stream thinking tokens
        case "ReasoningStep":
            print(f"\n[step] {event.reasoning_content}")
        case "ReasoningCompleted":
            print("\n--- Thinking complete ---\n")
        case "RunContent":
            print(event.content, end="")  # Stream main response

Context-Aware Thinking vs Model-Native Reasoning

Definable supports two types of reasoning that can coexist:
                  Agent Thinking Layer                        Model-Native Reasoning
Trigger           Agent(thinking=True)                        Model capability (e.g., DeepSeek Reasoner, OpenAI o1)
Control           Full — custom prompt, model, instructions   None — model decides internally
Tool awareness    Yes — sees tool catalog                     No — reasons without tool knowledge
Output fields     reasoning_steps, reasoning_messages         reasoning_content (via ModelResponse)
Cost              Extra model call                            Built into model pricing
Best for          Complex tool-using agents                   Math/logic reasoning tasks
Both can be active simultaneously. The thinking layer populates reasoning_steps and reasoning_messages, while model-native reasoning populates reasoning_content.

Testing

Use MockModel with structured_responses to test the thinking phase without API calls:
import json
from definable.agent import Agent
from definable.agent.testing import MockModel
from definable.agent.tracing import Tracing

thinking_json = json.dumps({
    "analysis": "The user wants to compare two architectures.",
    "approach": "Outline tradeoffs for scalability, complexity, and team structure.",
    "tool_plan": None,
})

model = MockModel(
    responses=["Here is the comparison..."],       # Main response
    structured_responses=[thinking_json],           # Thinking response
)

agent = Agent(
    model=model,
    thinking=True,
    tracing=Tracing(enabled=False),
)

output = agent.run("Compare microservices vs monoliths")
assert model.call_count == 2  # Thinking + main
assert output.reasoning_steps is not None
When tool_plan is provided in the thinking output, the reasoning steps include a “Tool Plan” step listing the planned tool sequence. This is useful for testing that the thinking phase correctly identifies which tools to use.
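Building on the example above, a tool_plan can be supplied in the mocked thinking output to check that the "Tool Plan" step appears. The string shape of tool_plan is an assumption here; adjust it to the actual ThinkingOutput schema if it differs:

```python
import json
from definable.agent import Agent
from definable.agent.testing import MockModel
from definable.agent.tracing import Tracing

# "tool_plan" as a plain string is an assumption for illustration.
thinking_json = json.dumps({
    "analysis": "The user wants current pricing data.",
    "approach": "Search first, then summarize the results.",
    "tool_plan": "web_search -> summarize",
})

model = MockModel(
    responses=["Here are the latest prices..."],   # Main response
    structured_responses=[thinking_json],          # Thinking response
)

agent = Agent(model=model, thinking=True, tracing=Tracing(enabled=False))

output = agent.run("What do the top cloud providers charge?")
titles = [step.title for step in (output.reasoning_steps or [])]
assert "Tool Plan" in titles  # planned tool sequence surfaces as a step
```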