# Cost Tracking

> Track token usage and costs across models, agents, and sessions.

Definable automatically tracks token usage and calculates costs for every model call. This data flows through the entire stack, from individual model invocations to aggregated agent runs.

## Per-Call Metrics

Every model call returns usage metrics:

```python
from definable.model import OpenAIChat
from definable.model.message import Message

model = OpenAIChat(id="gpt-4o")
response = model.invoke(
    messages=[Message(role="user", content="Hello!")],
    assistant_message=Message(role="assistant", content=""),
)

metrics = response.response_usage
print(f"Input:  {metrics.input_tokens} tokens")
print(f"Output: {metrics.output_tokens} tokens")
print(f"Total:  {metrics.total_tokens} tokens")
print(f"Cost:   ${metrics.cost:.6f}")
```

## Agent Run Metrics

Agent runs aggregate metrics across all model calls in the run (including tool execution loops):

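```python
from definable.agent import Agent

# `model` is the OpenAIChat instance from the previous example;
# `my_tool` is a placeholder for any tool you have defined.
agent = Agent(model=model, tools=[my_tool])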
output = agent.run("Analyze this data and create a summary.")

print(f"Total tokens: {output.metrics.total_tokens}")
print(f"Total cost:   ${output.metrics.cost:.4f}")
print(f"Duration:     {output.metrics.duration:.2f}s")
```

## UsageTracker (Recommended)

`UsageTracker` is the simplest way to track costs across runs. Enable it by passing `usage=True` to the `Agent` constructor:

```python
import asyncio

from definable.agent import Agent

agent = Agent(model="openai/gpt-4o-mini", usage=True)


async def main():
    # `await` needs an async context outside of a notebook or REPL.
    await agent.arun("What is 2+2?")
    await agent.arun("What is the capital of France?")


asyncio.run(main())

tracker = agent.usage_tracker
print(tracker.session_total)   # e.g. Usage(350 tokens, $0.0012, 2 runs)
print(tracker.last_run)        # Most recent run only
print(tracker.run_count)       # 2
```

The `UsageSnapshot` provides:

| Property         | Type    | Description                     |
| ---------------- | ------- | ------------------------------- |
| `input_tokens`   | `int`   | Total input tokens              |
| `output_tokens`  | `int`   | Total output tokens             |
| `total_tokens`   | `int`   | Combined tokens                 |
| `estimated_cost` | `float` | Estimated cost in USD           |
| `runs`           | `int`   | Number of runs in this snapshot |

Snapshots support addition (`a + b`) and serialization (`to_dict()`).

## Tracking Across Multiple Runs (Manual)

For more control, use `Metrics` addition to aggregate costs across a session or batch:

```python
from definable.model.metrics import Metrics

session_metrics = Metrics()

# `customer_questions` is any iterable of prompt strings.
for question in customer_questions:
    output = agent.run(question)
    session_metrics = session_metrics + output.metrics

print("Session total:")
print(f"  Tokens: {session_metrics.total_tokens}")
print(f"  Cost:   ${session_metrics.cost:.4f}")
```

Or use Python's `sum()`:

```python
all_metrics = [agent.run(q).metrics for q in questions]
total = sum(all_metrics)
print(f"Batch cost: ${total.cost:.4f}")
```
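
`sum()` starts from the integer `0`, so it only works on objects that can handle `0 + obj`. The sketch below uses a hypothetical stand-in class, `RunMetrics` (not Definable's `Metrics`), to show one way addition and `sum()` support can be implemented; the real class may be built differently.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical stand-in for a metrics object; not Definable's Metrics class."""

    input_tokens: int = 0
    output_tokens: int = 0
    cost: float = 0.0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def __add__(self, other: "RunMetrics") -> "RunMetrics":
        if not isinstance(other, RunMetrics):
            return NotImplemented
        return RunMetrics(
            input_tokens=self.input_tokens + other.input_tokens,
            output_tokens=self.output_tokens + other.output_tokens,
            cost=self.cost + other.cost,
        )

    def __radd__(self, other):
        # sum() begins with 0 + first_item, so treat 0 as the identity.
        if other == 0:
            return self
        return NotImplemented


runs = [RunMetrics(100, 40, 0.0010), RunMetrics(50, 47, 0.0005)]
total = sum(runs)
print(total.total_tokens, f"{total.cost:.4f}")
```

If a metrics class defines only `__add__`, passing an explicit start value, as in `sum(all_metrics, Metrics())`, avoids the `0 + obj` step entirely.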

## MetricsMiddleware

The `MetricsMiddleware` tracks aggregate stats across all runs for an agent:

```python
from definable.agent import Agent, MetricsMiddleware

metrics_mw = MetricsMiddleware()
agent = Agent(model=model).use(metrics_mw)

# Run the agent multiple times
for q in questions:
    agent.run(q)

print(f"Total runs:       {metrics_mw.run_count}")
print(f"Error count:      {metrics_mw.error_count}")
print(f"Avg latency (ms): {metrics_mw.average_latency_ms:.0f}")
```
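
For intuition, run counts and average latency can be gathered with the same `__call__(context, next_handler)` shape used by the custom middleware in the Cost Budgets section. `TimingMiddleware`, `ok_handler`, and `failing_handler` below are hypothetical stand-ins for illustration, not the source of `MetricsMiddleware`.

```python
import asyncio
import time


class TimingMiddleware:
    """Hypothetical stand-in; not Definable's MetricsMiddleware."""

    def __init__(self):
        self.run_count = 0
        self.error_count = 0
        self._total_latency_ms = 0.0

    @property
    def average_latency_ms(self) -> float:
        return self._total_latency_ms / self.run_count if self.run_count else 0.0

    async def __call__(self, context, next_handler):
        start = time.perf_counter()
        try:
            return await next_handler(context)
        except Exception:
            self.error_count += 1
            raise
        finally:
            # Count every run, successful or not, and record its wall-clock time.
            self.run_count += 1
            self._total_latency_ms += (time.perf_counter() - start) * 1000


async def ok_handler(context):
    await asyncio.sleep(0.01)  # pretend the agent took about 10 ms
    return "done"


async def failing_handler(context):
    raise ValueError("simulated failure")


async def main():
    mw = TimingMiddleware()
    await mw(None, ok_handler)
    try:
        await mw(None, failing_handler)
    except ValueError:
        pass
    return mw


mw = asyncio.run(main())
print(mw.run_count, mw.error_count, f"{mw.average_latency_ms:.0f} ms")
```

This sketch counts failed runs toward both `run_count` and the latency average; whether the built-in middleware does the same is not stated here.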

## Cost Breakdown

The `Metrics` class tracks all cost dimensions:

| Field                 | Description                       |
| --------------------- | --------------------------------- |
| `input_tokens`        | Tokens in the prompt              |
| `output_tokens`       | Tokens generated                  |
| `cache_read_tokens`   | Tokens served from provider cache |
| `cache_write_tokens`  | Tokens written to provider cache  |
| `reasoning_tokens`    | Tokens used for chain-of-thought  |
| `audio_input_tokens`  | Audio input tokens                |
| `audio_output_tokens` | Audio output tokens               |
| `cost`                | Total estimated cost in USD       |

## Pricing Registry

Definable includes a built-in pricing registry (`model_pricing.json`) with rates for all supported models. Cost is calculated automatically based on the model and token counts.

Prices are defined per million tokens:

```json
{
  "openai": {
    "gpt-4o": {
      "input_per_million": 2.50,
      "output_per_million": 10.00,
      "cached_input_per_million": 1.25
    }
  }
}
```
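
As a worked example, per-million rates translate into a per-call cost roughly as follows. The helper name `estimate_cost` is hypothetical, and so is the assumption that cached input tokens are billed at the cached rate instead of the regular input rate; Definable's internal calculation may treat caching differently.

```python
# Rates copied from the gpt-4o entry above (USD per million tokens).
GPT_4O_RATES = {
    "input_per_million": 2.50,
    "output_per_million": 10.00,
    "cached_input_per_million": 1.25,
}


def estimate_cost(rates: dict, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Hypothetical helper: estimate the USD cost of one call.

    Assumes cached_input_tokens is a subset of input_tokens and is billed
    at the cached rate instead of the regular input rate.
    """
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * rates["input_per_million"]
        + cached_input_tokens * rates["cached_input_per_million"]
        + output_tokens * rates["output_per_million"]
    ) / 1_000_000


# 150 * 2.50 / 1e6 + 87 * 10.00 / 1e6
cost = estimate_cost(GPT_4O_RATES, input_tokens=150, output_tokens=87)
print(f"${cost:.6f}")
```

With 150 input tokens and 87 output tokens and no cache hits, this comes to $0.001245.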

## Serializing Metrics

Export metrics for logging, dashboards, or billing systems:

```python
metrics_dict = output.metrics.to_dict()
# {
#   'input_tokens': 150,
#   'output_tokens': 87,
#   'total_tokens': 237,
#   'cost': 0.001245,
#   'duration': 1.83
# }
```

Zero values and `None` fields are excluded automatically for clean output.
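
A dictionary like this can be written straight to a structured log. The sketch below emits one JSON line per run using only the standard library; the `run_id` field and the `log_metrics` helper are hypothetical, and the sample dictionary reuses the values shown above in place of a live `to_dict()` result.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("cost_tracking")


def log_metrics(run_id: str, metrics_dict: dict) -> str:
    """Hypothetical helper: serialize one run's metrics as a JSON line."""
    line = json.dumps({"run_id": run_id, **metrics_dict}, sort_keys=True)
    logger.info(line)
    return line


# Sample values copied from the example above, standing in for output.metrics.to_dict().
sample = {
    "input_tokens": 150,
    "output_tokens": 87,
    "total_tokens": 237,
    "cost": 0.001245,
    "duration": 1.83,
}
line = log_metrics("run-001", sample)
```

Because zero and `None` fields are omitted, downstream consumers should read optional fields with a default, for example `record.get("reasoning_tokens", 0)`.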

## Cost Budgets

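Implement a simple cost guard using middleware. The check happens after each run completes, so the run that crosses the limit has already been billed by the time the error is raised:

```python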
class CostBudgetMiddleware:
    def __init__(self, max_cost: float):
        self.max_cost = max_cost
        self.total_cost = 0.0

    async def __call__(self, context, next_handler):
        result = await next_handler(context)

        if result.metrics and result.metrics.cost:
            self.total_cost += result.metrics.cost
            if self.total_cost > self.max_cost:
                raise RuntimeError(
                    f"Cost budget exceeded: ${self.total_cost:.4f} > ${self.max_cost:.4f}"
                )

        return result

# Limit spending to $1.00
agent.use(CostBudgetMiddleware(max_cost=1.00))
```
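
To see when the guard fires, the middleware can be exercised without a real model. The sketch below repeats the class, raising `RuntimeError`, and drives it with hypothetical stand-ins: `FakeMetrics`, `FakeResult`, and `fake_handler` are not part of Definable, and each fake run is assumed to cost $0.40.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeMetrics:
    cost: float


@dataclass
class FakeResult:
    metrics: FakeMetrics


class CostBudgetMiddleware:
    def __init__(self, max_cost: float):
        self.max_cost = max_cost
        self.total_cost = 0.0

    async def __call__(self, context, next_handler):
        result = await next_handler(context)
        if result.metrics and result.metrics.cost:
            self.total_cost += result.metrics.cost
            if self.total_cost > self.max_cost:
                raise RuntimeError(
                    f"Cost budget exceeded: ${self.total_cost:.4f} > ${self.max_cost:.4f}"
                )
        return result


async def fake_handler(context):
    # Stand-in for the rest of the pipeline: every run costs $0.40.
    return FakeResult(metrics=FakeMetrics(cost=0.40))


async def main():
    guard = CostBudgetMiddleware(max_cost=1.00)
    completed = 0
    try:
        for _ in range(5):
            await guard(context=None, next_handler=fake_handler)
            completed += 1
    except RuntimeError as exc:
        print(f"Stopped after {completed} completed runs: {exc}")
    return completed, guard.total_cost


completed, spent = asyncio.run(main())
```

With these numbers the error is raised during the third run, once the running total reaches about $1.20. To stop before overspending, compare `self.total_cost` with the limit before calling `next_handler`.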
