The deep research layer conducts automated multi-wave web research before the agent responds. It decomposes queries into sub-questions, searches the web, reads pages, compresses them into Compressed Knowledge Units (CKUs), accumulates knowledge with deduplication and contradiction detection, and synthesizes the results into context for the agent’s system prompt.
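A CKU can be pictured as a small structured record distilled from a page: one claim plus enough provenance to cite, score, and deduplicate it. The shape below is only an illustration — the field names are assumptions for this sketch, not definable's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CKU:
    """Illustrative Compressed Knowledge Unit (field names are
    assumptions, not definable's real schema)."""
    claim: str        # the compressed fact extracted from a page
    source_url: str   # where the claim was found, for citations
    relevance: float  # score used to filter low-value units

cku = CKU(
    claim="Vue 3 uses a proxy-based reactivity system.",
    source_url="https://example.com/vue-3-reactivity",
    relevance=0.9,
)
```

Because the record is frozen and hashable, accumulating CKUs in a set gives deduplication for free, which matches the pipeline's dedup step.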
## Quick Start

```python
from definable.agent import Agent
from definable.model.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a research assistant.",
    deep_research=True,
)

output = await agent.arun("Compare React and Vue frameworks in 2025.")
print(output.content)  # Response informed by live web research
```
With `deep_research=True`, the agent automatically:
- Breaks the question into sub-questions
- Searches the web for each sub-question
- Reads and compresses relevant pages
- Accumulates facts and detects contradictions
- Injects the research context into the system prompt
- Generates a response grounded in the research
## How It Works

```
User Query
     │
     ▼
┌─────────────┐
│  Decompose  │  Break into sub-questions
└──────┬──────┘
       │
 ┌─────▼──────┐
 │   Wave N   │ ◄── Repeat until coverage is sufficient
 │            │
 │   Search   │  Parallel web searches
 │     ▼      │
 │    Read    │  Fetch + extract page content
 │     ▼      │
 │  Compress  │  Extract CKUs via a cheap model
 │     ▼      │
 │ Accumulate │  Knowledge graph + dedup + contradiction detection
 │     ▼      │
 │ Gap Check  │  Identify remaining knowledge gaps
 └─────┬──────┘
       │
 ┌─────▼──────┐
 │ Synthesize │  Format context for the system prompt
 └────────────┘
```
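The wave loop above can be sketched in plain Python. This is a minimal illustration, not definable's internals: `fetch_facts` stands in for the whole search → read → compress chain, and the novelty-ratio stopping rule is a simplification of the real gap check.

```python
from typing import Callable

def run_waves(
    sub_questions: list[str],
    fetch_facts: Callable[[str, int], set[str]],
    max_waves: int = 3,
    novelty_threshold: float = 0.2,
) -> set[str]:
    """Sketch of the multi-wave loop: each wave gathers facts for every
    sub-question, deduplicates them against accumulated knowledge, and
    stops early when the share of genuinely new facts drops too low."""
    knowledge: set[str] = set()
    for wave in range(max_waves):
        wave_facts: set[str] = set()
        for q in sub_questions:
            wave_facts |= fetch_facts(q, wave)  # search + read + compress
        new = wave_facts - knowledge            # dedup against accumulator
        novelty = len(new) / max(len(wave_facts), 1)
        knowledge |= new
        if novelty < novelty_threshold:         # early termination
            break
    return knowledge

# Toy run: wave 2 only repeats wave 1's facts, so novelty hits 0
# and the loop terminates early with four unique facts.
facts = run_waves(
    ["react", "vue"],
    lambda q, wave: {f"{q}-{min(wave, 1)}"},
)
```

The real engine additionally tracks contradictions and per-topic gaps, but the dedup-then-check-novelty shape is the core of "repeat until coverage sufficient".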
## Configuration

### Simple Enable

```python
# Uses standard depth (3 waves, 15 sources, DuckDuckGo)
agent = Agent(model=model, deep_research=True)
```
### Custom Configuration

```python
from definable.agent.research import DeepResearchConfig

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        depth="deep",                  # 5 waves, 30 sources
        search_provider="duckduckgo",  # Free, no API key
        include_citations=True,
        include_contradictions=True,
        context_format="xml",
        max_context_tokens=4000,
    ),
)
```
### Via a `DeepResearch` Engine

You can also pass a pre-built `DeepResearch` engine instance directly:

```python
from definable.agent.research import DeepResearch, DeepResearchConfig
from definable.agent.research.search import create_search_provider

researcher = DeepResearch(
    model=model,
    search_provider=create_search_provider("duckduckgo"),
    config=DeepResearchConfig(depth="deep"),
)

agent = Agent(model=model, deep_research=researcher)
```
## Depth Presets

| Preset | Waves | Max Sources | Parallel Searches | Best For |
|---|---|---|---|---|
| `"quick"` | 1 | 8 | 3 | Fast lookups, simple questions |
| `"standard"` | 3 | 15 | 5 | Balanced research (default) |
| `"deep"` | 5 | 30 | 8 | Thorough investigation, complex topics |
```python
# Quick — single wave, fast
agent = Agent(model=model, deep_research=DeepResearchConfig(depth="quick"))

# Deep — thorough multi-wave research
agent = Agent(model=model, deep_research=DeepResearchConfig(depth="deep"))
```
## Search Providers

### DuckDuckGo (Default)

Free, no API key required. Works out of the box.

```python
agent = Agent(model=model, deep_research=True)  # Uses DuckDuckGo by default
```
### Google Custom Search Engine

Requires a Google API key and a Custom Search Engine ID.

```python
from definable.agent.research import DeepResearchConfig

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        search_provider="google",
        search_provider_config={
            "api_key": "your-google-api-key",
            "cse_id": "your-cse-id",
        },
    ),
)
```
### SerpAPI

Requires a SerpAPI key.

```python
agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        search_provider="serpapi",
        search_provider_config={"api_key": "your-serpapi-key"},
    ),
)
```
### Custom Search Function

Provide any async callable that returns search results:

```python
from definable.agent.research.search.base import SearchResult

async def my_search(query: str, max_results: int = 10) -> list[SearchResult]:
    # Your custom search logic
    return [SearchResult(url="...", title="...", snippet="...")]

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(search_fn=my_search),
)
```
## Trigger Modes

Control when research runs:

| Mode | Description |
|---|---|
| `"always"` | Run research on every `arun()` call (default) |
| `"auto"` | Model decides whether the query needs research |
| `"tool"` | Research only runs when explicitly invoked as a tool |

```python
agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(trigger="auto"),
)
```
## Standalone Usage

Use `DeepResearch` directly without an agent:

```python
from definable.model.openai import OpenAIChat
from definable.agent.research import DeepResearch, DeepResearchConfig
from definable.agent.research.search import create_search_provider

model = OpenAIChat(id="gpt-4o-mini")

researcher = DeepResearch(
    model=model,
    search_provider=create_search_provider("duckduckgo"),
    config=DeepResearchConfig(depth="deep"),
)

result = await researcher.arun("What are the latest AI safety developments?")

print(result.context)         # Formatted context string
print(result.report)          # Standalone report
print(result.sources)         # List of SourceInfo
print(result.facts)           # Extracted facts
print(result.contradictions)  # Contradictions found
print(result.metrics)         # ResearchMetrics
```
## Events

When streaming, the research pipeline emits progress events:

```python
async for event in agent.arun_stream("Compare React and Vue"):
    match event.event:
        case "DeepResearchStarted":
            print(f"Research started: {event.query}")
        case "DeepResearchProgress":
            print(f"Wave {event.wave}: {event.sources_read} sources, "
                  f"{event.facts_extracted} facts, {event.gaps_remaining} gaps")
        case "DeepResearchCompleted":
            print(f"Done: {event.sources_used} sources, "
                  f"{event.facts_extracted} facts in {event.duration_ms:.0f}ms")
        case "RunContent":
            print(event.content, end="", flush=True)
```
| Event | `event.event` value | Key Fields |
|---|---|---|
| `DeepResearchStartedEvent` | `"DeepResearchStarted"` | `query`, `depth` |
| `DeepResearchProgressEvent` | `"DeepResearchProgress"` | `wave`, `sources_read`, `facts_extracted`, `gaps_remaining`, `message` |
| `DeepResearchCompletedEvent` | `"DeepResearchCompleted"` | `sources_used`, `facts_extracted`, `waves_executed`, `duration_ms`, `contradictions_found` |
## Output Types

### ResearchResult

| Field | Type | Description |
|---|---|---|
| `context` | `str` | Formatted context for the system prompt |
| `report` | `str` | Standalone research report |
| `sources` | `List[SourceInfo]` | Sources consulted |
| `facts` | `List[Fact]` | Extracted facts |
| `gaps` | `List[TopicGap]` | Remaining knowledge gaps |
| `contradictions` | `List[Contradiction]` | Contradictions between sources |
| `sub_questions` | `List[str]` | Decomposed sub-questions |
| `metrics` | `ResearchMetrics` | Performance metrics |
## Configuration Reference

- `depth`: research depth preset: `"quick"`, `"standard"`, or `"deep"`.
- `search_provider`: search backend: `"duckduckgo"`, `"google"`, or `"serpapi"`.
- `search_provider_config`: backend-specific config (API keys, CSE ID, etc.).
- `search_fn`: custom async search callable; overrides `search_provider`.
- Compression model: model used for CKU extraction; defaults to the agent's model.
- Max sources: maximum unique sources across all waves.
- Max waves: maximum number of research waves.
- Parallel searches: concurrent search queries per wave.
- Min relevance: minimum relevance score for CKU inclusion.
- `include_citations`: include source citations in the research context.
- `include_contradictions`: surface contradictions between sources.
- `context_format`: format for the injected context: `"xml"` or `"markdown"`.
- `max_context_tokens`: approximate token budget for the context block.
- `early_termination_threshold`: stop when the novelty ratio between waves drops below this value.
- `trigger`: when to run: `"always"`, `"auto"`, or `"tool"`.
- Layer-guide description: text shown in the layer guide injected into the system prompt; if `None`, the default description is used.
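Putting several of these options together in one config (the specific values here are illustrative, not recommended defaults):

```python
from definable.agent.research import DeepResearchConfig

config = DeepResearchConfig(
    depth="standard",                # 3 waves, 15 sources
    search_provider="duckduckgo",    # free backend, no API key
    include_citations=True,          # cite sources in the context block
    context_format="markdown",       # inject context as markdown, not XML
    max_context_tokens=3000,         # cap the context budget
    early_termination_threshold=0.2, # stop when <20% of facts are new
    trigger="auto",                  # let the model decide when to research
)
```

Tightening `early_termination_threshold` trades thoroughness for latency: a higher value stops waves sooner once results start repeating.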
## Installation

Deep research requires the `research` extra:

```shell
pip install 'definable[research]'
```

This installs `duckduckgo-search` and `curl-cffi` for TLS-impersonated web reading.