The deep research layer conducts automated multi-wave web research before the agent responds. It decomposes queries into sub-questions, searches the web, reads pages, compresses them into Compressed Knowledge Units (CKUs), accumulates knowledge with deduplication and contradiction detection, and synthesizes the results into context for the agent’s system prompt.
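A CKU can be pictured as a small structured record distilled from a page: one claim plus enough provenance to cite, score, and deduplicate it. The shape below is only an illustration — the field names are assumptions for this sketch, not definable's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CKU:
    """Illustrative Compressed Knowledge Unit (field names are
    assumptions, not definable's real schema)."""
    claim: str        # the compressed fact extracted from a page
    source_url: str   # where the claim was found, for citations
    relevance: float  # score used to filter low-value units

cku = CKU(
    claim="Vue 3 uses a proxy-based reactivity system.",
    source_url="https://example.com/vue-3-reactivity",
    relevance=0.9,
)
```

Because the record is frozen and hashable, accumulating CKUs in a set gives deduplication for free, which matches the pipeline's dedup step.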
## Quick Start

```python
from definable.agent import Agent
from definable.model.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    instructions="You are a research assistant.",
    deep_research=True,
)

output = await agent.arun("Compare React and Vue frameworks in 2025.")
print(output.content)  # Response informed by live web research
```
With `deep_research=True`, the agent automatically:
- Breaks the question into sub-questions
- Searches the web for each sub-question
- Reads and compresses relevant pages
- Accumulates facts and detects contradictions
- Injects the research context into the system prompt
- Generates a response grounded in the research
## How It Works

```
User Query
     │
     ▼
┌─────────────┐
│  Decompose  │  Break into sub-questions
└──────┬──────┘
       │
 ┌─────▼──────┐
 │   Wave N   │ ◄── Repeat until coverage is sufficient
 │            │
 │   Search   │  Parallel web searches
 │     ▼      │
 │    Read    │  Fetch + extract page content
 │     ▼      │
 │  Compress  │  Extract CKUs via a cheap model
 │     ▼      │
 │ Accumulate │  Knowledge graph + dedup + contradiction detection
 │     ▼      │
 │ Gap Check  │  Identify remaining knowledge gaps
 └─────┬──────┘
       │
 ┌─────▼──────┐
 │ Synthesize │  Format context for the system prompt
 └────────────┘
```
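The wave loop above can be sketched in plain Python. This is a minimal illustration, not definable's internals: `fetch_facts` stands in for the whole search → read → compress chain, and the novelty-ratio stopping rule is a simplification of the real gap check.

```python
from typing import Callable

def run_waves(
    sub_questions: list[str],
    fetch_facts: Callable[[str, int], set[str]],
    max_waves: int = 3,
    novelty_threshold: float = 0.2,
) -> set[str]:
    """Sketch of the multi-wave loop: each wave gathers facts for every
    sub-question, deduplicates them against accumulated knowledge, and
    stops early when the share of genuinely new facts drops too low."""
    knowledge: set[str] = set()
    for wave in range(max_waves):
        wave_facts: set[str] = set()
        for q in sub_questions:
            wave_facts |= fetch_facts(q, wave)  # search + read + compress
        new = wave_facts - knowledge            # dedup against accumulator
        novelty = len(new) / max(len(wave_facts), 1)
        knowledge |= new
        if novelty < novelty_threshold:         # early termination
            break
    return knowledge

# Toy run: wave 2 only repeats wave 1's facts, so novelty hits 0
# and the loop terminates early with four unique facts.
facts = run_waves(
    ["react", "vue"],
    lambda q, wave: {f"{q}-{min(wave, 1)}"},
)
```

The real engine additionally tracks contradictions and per-topic gaps, but the dedup-then-check-novelty shape is the core of "repeat until coverage sufficient".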
## Configuration

### Simple Enable

```python
# Uses standard depth (3 waves, 15 sources, DuckDuckGo)
agent = Agent(model=model, deep_research=True)
```
### Custom Configuration

```python
from definable.agent.research import DeepResearchConfig

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        depth="deep",                  # 5 waves, 30 sources
        search_provider="duckduckgo",  # Free, no API key
        include_citations=True,
        include_contradictions=True,
        context_format="xml",
        max_context_tokens=4000,
    ),
)
```
### Via a `DeepResearch` Engine

You can also pass a pre-built `DeepResearch` engine instance directly:

```python
from definable.agent.research import DeepResearch, DeepResearchConfig
from definable.agent.research.search import create_search_provider

researcher = DeepResearch(
    model=model,
    search_provider=create_search_provider("duckduckgo"),
    config=DeepResearchConfig(depth="deep"),
)

agent = Agent(model=model, deep_research=researcher)
```
## Depth Presets

| Preset | Waves | Max Sources | Parallel Searches | Best For |
|---|---|---|---|---|
| `"quick"` | 1 | 8 | 3 | Fast lookups, simple questions |
| `"standard"` | 3 | 15 | 5 | Balanced research (default) |
| `"deep"` | 5 | 30 | 8 | Thorough investigation, complex topics |
```python
# Quick — single wave, fast
agent = Agent(model=model, deep_research=DeepResearchConfig(depth="quick"))

# Deep — thorough multi-wave research
agent = Agent(model=model, deep_research=DeepResearchConfig(depth="deep"))
```
## Search Providers

### DuckDuckGo (Default)

Free, no API key required. Works out of the box.

```python
agent = Agent(model=model, deep_research=True)  # Uses DuckDuckGo by default
```
### Google Custom Search Engine

Requires a Google API key and a Custom Search Engine ID.

```python
from definable.agent.research import DeepResearchConfig

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        search_provider="google",
        search_provider_config={
            "api_key": "your-google-api-key",
            "cse_id": "your-cse-id",
        },
    ),
)
```
### SerpAPI

Requires a SerpAPI key.

```python
agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(
        search_provider="serpapi",
        search_provider_config={"api_key": "your-serpapi-key"},
    ),
)
```
### Custom Search Function

Provide any async callable that returns search results:

```python
from definable.agent.research.search.base import SearchResult

async def my_search(query: str, max_results: int = 10) -> list[SearchResult]:
    # Your custom search logic
    return [SearchResult(url="...", title="...", snippet="...")]

agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(search_fn=my_search),
)
```
## Trigger Modes

Control when research runs:

| Mode | Description |
|---|---|
| `"always"` | Run research on every `arun()` call (default) |
| `"auto"` | Model decides whether the query needs research |
| `"tool"` | Research only runs when explicitly invoked as a tool |

```python
agent = Agent(
    model=model,
    deep_research=DeepResearchConfig(trigger="auto"),
)
```
## Standalone Usage

Use `DeepResearch` directly without an agent:

```python
from definable.model.openai import OpenAIChat
from definable.agent.research import DeepResearch, DeepResearchConfig
from definable.agent.research.search import create_search_provider

model = OpenAIChat(id="gpt-4o-mini")

researcher = DeepResearch(
    model=model,
    search_provider=create_search_provider("duckduckgo"),
    config=DeepResearchConfig(depth="deep"),
)

result = await researcher.arun("What are the latest AI safety developments?")

print(result.context)         # Formatted context string
print(result.report)          # Standalone report
print(result.sources)         # List of SourceInfo
print(result.facts)           # Extracted facts
print(result.contradictions)  # Contradictions found
print(result.metrics)         # ResearchMetrics
```
## Events

When streaming, the research pipeline emits progress events:

```python
async for event in agent.arun_stream("Compare React and Vue"):
    match event.event:
        case "DeepResearchStarted":
            print(f"Research started: {event.query}")
        case "DeepResearchProgress":
            print(f"Wave {event.wave}: {event.sources_read} sources, "
                  f"{event.facts_extracted} facts, {event.gaps_remaining} gaps")
        case "DeepResearchCompleted":
            print(f"Done: {event.sources_used} sources, "
                  f"{event.facts_extracted} facts in {event.duration_ms:.0f}ms")
        case "RunContent":
            print(event.content, end="", flush=True)
```
| Event | `event.event` value | Key Fields |
|---|---|---|
| `DeepResearchStartedEvent` | `"DeepResearchStarted"` | `query`, `depth` |
| `DeepResearchProgressEvent` | `"DeepResearchProgress"` | `wave`, `sources_read`, `facts_extracted`, `gaps_remaining`, `message` |
| `DeepResearchCompletedEvent` | `"DeepResearchCompleted"` | `sources_used`, `facts_extracted`, `waves_executed`, `duration_ms`, `contradictions_found` |
## Output Types

### ResearchResult

| Field | Type | Description |
|---|---|---|
| `context` | `str` | Formatted context for the system prompt |
| `report` | `str` | Standalone research report |
| `sources` | `List[SourceInfo]` | Sources consulted |
| `facts` | `List[Fact]` | Extracted facts |
| `gaps` | `List[TopicGap]` | Remaining knowledge gaps |
| `contradictions` | `List[Contradiction]` | Contradictions between sources |
| `sub_questions` | `List[str]` | Decomposed sub-questions |
| `metrics` | `ResearchMetrics` | Performance metrics |
## Configuration Reference

- `depth`: research depth preset: `"quick"`, `"standard"`, or `"deep"`.
- `search_provider`: search backend: `"duckduckgo"`, `"google"`, or `"serpapi"`.
- `search_provider_config`: backend-specific config (API keys, CSE ID, etc.).
- `search_fn`: custom async search callable; overrides `search_provider`.
- Compression model: model used for CKU extraction; defaults to the agent's model.
- Max sources: maximum unique sources across all waves.
- Max waves: maximum number of research waves.
- Parallel searches: concurrent search queries per wave.
- Min relevance: minimum relevance score for CKU inclusion.
- `include_citations`: include source citations in the research context.
- `include_contradictions`: surface contradictions between sources.
- `context_format`: format for the injected context: `"xml"` or `"markdown"`.
- `max_context_tokens`: approximate token budget for the context block.
- `early_termination_threshold`: stop when the novelty ratio between waves drops below this value.
- `trigger`: when to run: `"always"`, `"auto"`, or `"tool"`.
- Layer-guide description: text shown in the layer guide injected into the system prompt; if `None`, the default description is used.
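Putting several of these options together in one config (the specific values here are illustrative, not recommended defaults):

```python
from definable.agent.research import DeepResearchConfig

config = DeepResearchConfig(
    depth="standard",                # 3 waves, 15 sources
    search_provider="duckduckgo",    # free backend, no API key
    include_citations=True,          # cite sources in the context block
    context_format="markdown",       # inject context as markdown, not XML
    max_context_tokens=3000,         # cap the context budget
    early_termination_threshold=0.2, # stop when <20% of facts are new
    trigger="auto",                  # let the model decide when to research
)
```

Tightening `early_termination_threshold` trades thoroughness for latency: a higher value stops waves sooner once results start repeating.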
## Installation

Deep research requires the `research` extra:

```shell
pip install 'definable[research]'
```

This installs `duckduckgo-search` and `curl-cffi` for TLS-impersonated web reading.