Definable automatically tracks token usage and calculates costs for every model call. This data flows through the entire stack — from individual model invocations to aggregated agent runs.

Per-Call Metrics

Every model call returns usage metrics:
from definable.model import OpenAIChat
from definable.model.message import Message

model = OpenAIChat(id="gpt-4o")
response = model.invoke(
    messages=[Message(role="user", content="Hello!")],
    assistant_message=Message(role="assistant", content=""),
)

metrics = response.response_usage
print(f"Input:  {metrics.input_tokens} tokens")
print(f"Output: {metrics.output_tokens} tokens")
print(f"Total:  {metrics.total_tokens} tokens")
print(f"Cost:   ${metrics.cost:.6f}")

Agent Run Metrics

Agent runs aggregate metrics across all model calls in the run (including tool execution loops):
from definable.agent import Agent

agent = Agent(model=model, tools=[my_tool])
output = agent.run("Analyze this data and create a summary.")

print(f"Total tokens: {output.metrics.total_tokens}")
print(f"Total cost:   ${output.metrics.cost:.4f}")
print(f"Duration:     {output.metrics.duration:.2f}s")
The simplest way to track costs across runs is the built-in usage tracker. Enable it by passing usage=True to the Agent constructor:
import asyncio

from definable.agent import Agent

agent = Agent(model="openai/gpt-4o-mini", usage=True)

async def main():
    # arun() is a coroutine, so it has to be awaited inside an event loop
    await agent.arun("What is 2+2?")
    await agent.arun("What is the capital of France?")

    tracker = agent.usage_tracker
    print(tracker.session_total)   # Usage(350 tokens, $0.0012, 2 runs)
    print(tracker.last_run)        # Most recent run only
    print(tracker.run_count)       # 2

asyncio.run(main())
The UsageSnapshot provides:
| Property | Type | Description |
| --- | --- | --- |
| input_tokens | int | Total input tokens |
| output_tokens | int | Total output tokens |
| total_tokens | int | Combined tokens |
| estimated_cost | float | Estimated cost in USD |
| runs | int | Number of runs in this snapshot |
Snapshots support addition (a + b) and serialization (to_dict()).

Tracking Across Multiple Runs (Manual)

For more control, use Metrics addition to aggregate costs across a session or batch:
from definable.model.metrics import Metrics

session_metrics = Metrics()

for question in customer_questions:
    output = agent.run(question)
    session_metrics = session_metrics + output.metrics

print("Session total:")
print(f"  Tokens: {session_metrics.total_tokens}")
print(f"  Cost:   ${session_metrics.cost:.4f}")
Or use Python’s sum():
all_metrics = [agent.run(q).metrics for q in questions]
total = sum(all_metrics)
print(f"Batch cost: ${total.cost:.4f}")
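
Both forms depend on the metrics object supporting `+`. For `sum()` to work without an explicit start value, the object must also accept the integer `0` that `sum()` starts from. The sketch below shows that pattern with a hypothetical stand-in class, `RunMetrics`. It is not Definable's `Metrics`, and whether `Metrics` is implemented this way is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical stand-in for a metrics object; not Definable's Metrics class."""

    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def __add__(self, other):
        if not isinstance(other, RunMetrics):
            return NotImplemented
        return RunMetrics(
            input_tokens=self.input_tokens + other.input_tokens,
            output_tokens=self.output_tokens + other.output_tokens,
            cost=self.cost + other.cost,
        )

    def __radd__(self, other):
        # sum() begins with the integer 0, so "0 + RunMetrics" must be handled.
        if other == 0:
            return self
        return NotImplemented


runs = [
    RunMetrics(input_tokens=100, output_tokens=40, cost=0.001),
    RunMetrics(input_tokens=50, output_tokens=10, cost=0.0005),
]
total = sum(runs)
print(total.total_tokens, round(total.cost, 6))
```

Passing an explicit start value, as in `sum(runs, RunMetrics())`, removes the dependency on `__radd__` entirely.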

MetricsMiddleware

The MetricsMiddleware tracks aggregate stats across all runs for an agent:
from definable.agent import Agent, MetricsMiddleware

metrics_mw = MetricsMiddleware()
agent = Agent(model=model).use(metrics_mw)

# Run the agent multiple times
for q in questions:
    agent.run(q)

print(f"Total runs:       {metrics_mw.run_count}")
print(f"Error count:      {metrics_mw.error_count}")
print(f"Avg latency (ms): {metrics_mw.average_latency_ms:.0f}")
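
To make these counters concrete, here is a minimal sketch of how a middleware of this kind could collect them. It assumes the async `(context, next_handler)` callable signature used by the cost-budget example on this page. The class name `TimingMiddleware`, its attributes, and the stub handler are hypothetical and do not describe how `MetricsMiddleware` is actually implemented.

```python
import asyncio
import time


class TimingMiddleware:
    """Hypothetical middleware that counts runs, errors, and latency."""

    def __init__(self):
        self.run_count = 0
        self.error_count = 0
        self.total_latency_ms = 0.0

    @property
    def average_latency_ms(self) -> float:
        return self.total_latency_ms / self.run_count if self.run_count else 0.0

    async def __call__(self, context, next_handler):
        start = time.perf_counter()
        try:
            return await next_handler(context)
        except Exception:
            self.error_count += 1
            raise
        finally:
            # Record every attempt, including the ones that failed.
            self.run_count += 1
            self.total_latency_ms += (time.perf_counter() - start) * 1000


async def fake_handler(context):
    # Stand-in for the agent's run pipeline (assumption for this sketch).
    if context == "fail":
        raise ValueError("simulated failure")
    await asyncio.sleep(0.01)
    return f"answer to {context}"


async def demo(mw):
    for question in ["a", "b", "fail"]:
        try:
            await mw(question, fake_handler)
        except ValueError:
            pass


timing = TimingMiddleware()
asyncio.run(demo(timing))
print(timing.run_count, timing.error_count)
```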

Cost Breakdown

The Metrics class tracks all cost dimensions:
| Field | Description |
| --- | --- |
| input_tokens | Tokens in the prompt |
| output_tokens | Tokens generated |
| cache_read_tokens | Tokens served from provider cache |
| cache_write_tokens | Tokens written to provider cache |
| reasoning_tokens | Tokens used for chain-of-thought |
| audio_input_tokens | Audio input tokens |
| audio_output_tokens | Audio output tokens |
| cost | Total estimated cost in USD |

Pricing Registry

Definable includes a built-in pricing registry (model_pricing.json) with rates for all supported models. Cost is calculated automatically based on the model and token counts. Prices are defined per million tokens:
{
  "openai": {
    "gpt-4o": {
      "input_per_million": 2.50,
      "output_per_million": 10.00,
      "cached_input_per_million": 1.25
    }
  }
}
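
As a worked example of per-million pricing, the sketch below applies the rates shown above to a call with 150 input tokens and 87 output tokens. The function `estimate_cost` is hypothetical, and treating cached tokens as a subset of the input tokens is an assumption. Definable's own calculation may differ.

```python
# Rates copied from the registry excerpt above (USD per million tokens).
PRICING = {
    "openai": {
        "gpt-4o": {
            "input_per_million": 2.50,
            "output_per_million": 10.00,
            "cached_input_per_million": 1.25,
        }
    }
}


def estimate_cost(provider, model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate USD cost. Assumes cached_input_tokens is a subset of input_tokens."""
    rates = PRICING[provider][model]
    uncached_tokens = input_tokens - cached_input_tokens
    return (
        uncached_tokens * rates["input_per_million"]
        + cached_input_tokens * rates["cached_input_per_million"]
        + output_tokens * rates["output_per_million"]
    ) / 1_000_000


cost = estimate_cost("openai", "gpt-4o", input_tokens=150, output_tokens=87)
print(f"${cost:.6f}")  # $0.001245
```

The arithmetic is (150 × 2.50 + 87 × 10.00) / 1,000,000 = 0.001245 USD.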

Serializing Metrics

Export metrics for logging, dashboards, or billing systems:
metrics_dict = output.metrics.to_dict()
# {
#   'input_tokens': 150,
#   'output_tokens': 87,
#   'total_tokens': 237,
#   'cost': 0.001245,
#   'duration': 1.83
# }
Zero values and None fields are excluded automatically for clean output.
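
The filtering rule can be sketched with a plain dataclass. `ExportableMetrics` is a hypothetical stand-in that illustrates the "drop zeros and None" behavior described above. It is not the library's `Metrics` class, and its field list is abbreviated.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExportableMetrics:
    """Hypothetical stand-in used to illustrate sparse serialization."""

    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    reasoning_tokens: int = 0
    cost: Optional[float] = None
    duration: Optional[float] = None

    def to_dict(self) -> dict:
        # Keep only fields that carry information: drop None and zero values.
        return {k: v for k, v in asdict(self).items() if v is not None and v != 0}


m = ExportableMetrics(
    input_tokens=150, output_tokens=87, total_tokens=237, cost=0.001245, duration=1.83
)
exported = m.to_dict()
print(json.dumps(exported))  # ready for a log line or a billing record
```

One consequence of this style of export is that consumers should read fields with a default, for example `exported.get("reasoning_tokens", 0)`, because absent keys mean zero or unknown.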

Cost Budgets

Implement a simple cost guard using middleware. The check happens after each run completes, so the run that crosses the limit has already been billed by the time the exception is raised:
class CostBudgetMiddleware:
    def __init__(self, max_cost: float):
        self.max_cost = max_cost    # Budget in USD
        self.total_cost = 0.0       # Cumulative spend across all runs

    async def __call__(self, context, next_handler):
        # Let the run execute first; its cost is only known afterwards
        result = await next_handler(context)

        if result.metrics and result.metrics.cost:
            self.total_cost += result.metrics.cost
            if self.total_cost > self.max_cost:
                raise RuntimeError(
                    f"Cost budget exceeded: ${self.total_cost:.4f} > ${self.max_cost:.4f}"
                )

        return result

# Limit spending to $1.00
agent.use(CostBudgetMiddleware(max_cost=1.00))
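
Because the guard above only learns a run's cost after the run finishes, it still executes each call and raises afterwards. One possible variant adds a pre-check that refuses to start a run once earlier runs have used up the budget. The sketch below is an illustration only: `BudgetGate`, `BudgetExceededError`, and the stub handler are hypothetical names, and the `(context, next_handler)` signature is taken from the example above.

```python
import asyncio
from types import SimpleNamespace


class BudgetExceededError(RuntimeError):
    """Hypothetical exception raised when the budget is already spent."""


class BudgetGate:
    """Hypothetical guard that blocks new runs once the budget is used up."""

    def __init__(self, max_cost: float):
        self.max_cost = max_cost
        self.total_cost = 0.0

    async def __call__(self, context, next_handler):
        # Pre-check: do not start another run if the budget is already gone.
        if self.total_cost >= self.max_cost:
            raise BudgetExceededError(
                f"Budget of ${self.max_cost:.4f} already spent (${self.total_cost:.4f})"
            )
        result = await next_handler(context)
        metrics = getattr(result, "metrics", None)
        if metrics is not None and metrics.cost:
            self.total_cost += metrics.cost
        return result


async def fake_handler(context):
    # Stand-in for the agent pipeline: every run costs $0.40 (assumption).
    return SimpleNamespace(metrics=SimpleNamespace(cost=0.40))


async def demo(gate):
    completed = 0
    try:
        for _ in range(5):
            await gate("question", fake_handler)
            completed += 1
    except BudgetExceededError:
        pass
    return completed


gate = BudgetGate(max_cost=1.00)
completed_runs = asyncio.run(demo(gate))
print(completed_runs)  # 3
```

With a $1.00 budget and $0.40 per run, three runs complete (total $1.20) and the fourth is rejected before any model call is made. The overshoot on the third run is unavoidable without estimating a run's cost in advance.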