Guardrails let you enforce content policies on every agent run — blocking dangerous input, redacting PII from output, and restricting which tools the model can call.

Quick Example

from definable.agent import Agent
from definable.agent.guardrail import Guardrails, max_tokens, pii_filter, tool_blocklist
from definable.model.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    guardrails=Guardrails(
        input=[max_tokens(500)],
        output=[pii_filter()],
        tool=[tool_blocklist({"delete_all"})],
    ),
)

output = agent.run("What's my account balance?")

How It Works

Three checkpoints run automatically on every arun() / arun_stream() call:
  1. Input — after memory recall, before the model call. Can block or modify the user message.
  2. Tool — inside the tool call loop, before each tool execution. Blocked tools send an error result back to the model.
  3. Output — after the model response, before memory store. Can block, modify, or redact the response.
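The ordering of the three checkpoints can be sketched in plain Python. This is a conceptual illustration only, not definable's actual internals; all function and variable names here are hypothetical:

```python
# Conceptual sketch of the three guardrail checkpoints on one run.
# All names are hypothetical; this is not definable's implementation.

def apply_checks(text, checks):
    """Run checks in order; return (allowed, reason) for the first block."""
    for check in checks:
        ok, reason = check(text)
        if not ok:
            return False, reason
    return True, None

# Toy checks: each returns (allowed, reason).
max_len = lambda t: (len(t) <= 50, "input too long")
no_banned_tool = lambda name: (name != "delete_all", "tool blocked")

# 1. Input checkpoint (after memory recall, before the model call)
ok, reason = apply_checks("What's my balance?", [max_len])
assert ok

# 2. Tool checkpoint (before each tool execution).
# A block here is reported back to the model as an error result.
ok, reason = apply_checks("delete_all", [no_banned_tool])
assert not ok and reason == "tool blocked"

# 3. Output checkpoint (after the model response, before memory store)
ok, reason = apply_checks("Your balance is $42.", [max_len])
assert ok
```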

Guardrails Constructor

from definable.agent.guardrail import Guardrails

guardrails = Guardrails(
    input=[...],
    output=[...],
    tool=[...],
    mode="fail_fast",
    on_block="raise",
)
input (List[InputGuardrail], default: [])
Guardrails that check the user message before the LLM call.

output (List[OutputGuardrail], default: [])
Guardrails that check the model response after the LLM call.

tool (List[ToolGuardrail], default: [])
Guardrails that check each tool call before execution.

mode (str, default: "fail_fast")
"fail_fast" stops at the first block. "run_all" runs every guardrail and collects all results.

on_block (str, default: "raise")
"raise" raises InputCheckError or OutputCheckError. "return_message" returns a RunOutput with status=RunStatus.blocked.
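The difference between the two modes can be sketched in plain Python. This is an illustrative reduction, assuming each check returns an (allowed, reason) tuple; real guardrails are async and return richer results:

```python
def fail_fast(checks, text):
    """Stop at the first guardrail that blocks."""
    for check in checks:
        ok, reason = check(text)
        if not ok:
            return [(False, reason)]  # first block wins; later checks never run
    return [(True, None)]

def run_all(checks, text):
    """Run every guardrail and collect every result."""
    return [check(text) for check in checks]

# Toy checks: (allowed, reason) tuples.
too_long = lambda t: (len(t) <= 5, "too long")
no_digits = lambda t: (not any(c.isdigit() for c in t), "digits found")

print(fail_fast([too_long, no_digits], "abc123"))
# [(False, 'too long')]
print(run_all([too_long, no_digits], "abc123"))
# [(False, 'too long'), (False, 'digits found')]
```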

Built-in Guardrails

Input Guardrails

from definable.agent.guardrail import max_tokens, block_topics, regex_filter

max_tokens(n, model_id='gpt-4o') -> InputGuardrail
Blocks input that exceeds n tokens, counted with the specified model's tokenizer.

block_topics(topics) -> InputGuardrail
Blocks input containing any keyword from the topics list (case-insensitive substring match).

regex_filter(patterns, action='block') -> InputGuardrail
Blocks or redacts input matching any of the given regex patterns. Set action="modify" to redact matches instead of blocking.
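What action="modify" does can be approximated with re.sub. This is a conceptual sketch only; the built-in guardrail's exact patterns and replacement tokens may differ:

```python
import re

def redact(text, patterns, token="[REDACTED]"):
    """Replace every match of each regex pattern with a redaction token."""
    for pattern in patterns:
        text = re.sub(pattern, token, text)
    return text

print(redact("my api key is sk-abc123", [r"sk-\w+"]))
# my api key is [REDACTED]
```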

Output Guardrails

from definable.agent.guardrail import pii_filter, max_output_tokens

pii_filter(action='modify') -> OutputGuardrail
Detects PII (credit cards, SSNs, emails, phone numbers) and redacts it with tokens like [CREDIT_CARD], [SSN], [EMAIL], [PHONE]. Set action="block" to block the entire response instead.

max_output_tokens(n, model_id='gpt-4o') -> OutputGuardrail
Blocks output that exceeds n tokens.
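Conceptually, PII redaction is pattern substitution. A minimal sketch with deliberately simple regexes (the library's actual detection is presumably more robust than this):

```python
import re

# Illustrative patterns only; real PII detection needs far more care.
PII_PATTERNS = {
    "[EMAIL]": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "[SSN]": r"\b\d{3}-\d{2}-\d{4}\b",
    "[PHONE]": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b",
}

def redact_pii(text):
    """Replace each PII match with its redaction token."""
    for token, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, token, text)
    return text

print(redact_pii("Email bob@example.com or call 555-867-5309"))
# Email [EMAIL] or call [PHONE]
```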

Tool Guardrails

from definable.agent.guardrail import tool_allowlist, tool_blocklist

tool_allowlist(allowed) -> ToolGuardrail
Allows only tools whose names appear in the allowed set; all others are blocked.

tool_blocklist(blocked) -> ToolGuardrail
Blocks tools whose names appear in the blocked set; all others are allowed.
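Both reduce to set membership on the tool name; conceptually:

```python
def allowlist_check(tool_name, allowed):
    """Allow only tools named in the set; block everything else."""
    return tool_name in allowed

def blocklist_check(tool_name, blocked):
    """Block tools named in the set; allow everything else."""
    return tool_name not in blocked

print(allowlist_check("search", {"search", "calculator"}))      # True
print(allowlist_check("delete_all", {"search", "calculator"}))  # False
print(blocklist_check("delete_all", {"delete_all"}))            # False
```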

Custom Guardrails

Using Decorators

The fastest way to create a custom guardrail:
from definable.agent.guardrail import input_guardrail, GuardrailResult

@input_guardrail
async def no_profanity(text: str, context) -> GuardrailResult:
    banned = ["badword", "offensive"]
    if any(word in text.lower() for word in banned):
        return GuardrailResult.block("Profanity detected")
    return GuardrailResult.allow()

@input_guardrail(name="length_check")
async def check_length(text: str, context) -> GuardrailResult:
    if len(text) > 10000:
        return GuardrailResult.block("Input too long")
    return GuardrailResult.allow()
Also available: @output_guardrail and @tool_guardrail.

Class-Based

Implement the protocol directly for more control:
from definable.agent.guardrail import GuardrailResult

class SentimentGuardrail:
    name = "sentiment_check"

    async def check(self, text: str, context) -> GuardrailResult:
        # Your custom logic here
        if is_toxic(text):
            return GuardrailResult.block("Toxic content detected")
        return GuardrailResult.allow()

Modify Action

Guardrails can rewrite content instead of blocking:
@output_guardrail
async def redact_names(text: str, context) -> GuardrailResult:
    cleaned = text.replace("Alice", "[REDACTED]")
    if cleaned != text:
        return GuardrailResult.modify(cleaned, reason="Names redacted")
    return GuardrailResult.allow()

Composable Guardrails

Combine guardrails with logic operators:
from definable.agent.guardrail import ALL, ANY, NOT, when, max_tokens, block_topics

# ALL — every guardrail must allow
strict_input = ALL(
    max_tokens(1000),
    block_topics(["violence", "exploit"]),
    name="strict_input",
)

# ANY — at least one must allow
flexible_check = ANY(
    max_tokens(5000),
    max_tokens(10000),
    name="flexible_check",
)

# NOT — invert a guardrail (allow ↔ block)
must_mention_topic = NOT(
    block_topics(["support"]),
    name="must_mention_support",
)

# when — conditional execution
admin_limit = when(
    condition=lambda ctx: ctx.user_id != "admin",
    guardrail=max_tokens(500),
    name="non_admin_limit",
)
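The combinators follow boolean semantics. A simplified sketch that treats each guardrail as a synchronous predicate returning True for allow, ignoring names, async, and modify actions (the lowercase names below are illustrative stand-ins, not the library's API):

```python
# Simplified combinator semantics; the real ALL/ANY/NOT wrap async
# guardrails and carry names and structured results.

def all_of(*checks):
    return lambda text: all(c(text) for c in checks)  # every check must allow

def any_of(*checks):
    return lambda text: any(c(text) for c in checks)  # at least one must allow

def not_(check):
    return lambda text: not check(text)               # invert allow/block

short = lambda t: len(t) <= 10
no_digits = lambda t: not any(ch.isdigit() for ch in t)

strict = all_of(short, no_digits)
print(strict("hello"))        # True
print(strict("hello123456"))  # False: too long and contains digits
```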

Block Handling

Raise Exceptions (Default)

from definable.exceptions import InputCheckError, OutputCheckError

agent = Agent(
    model=model,
    guardrails=Guardrails(
        input=[max_tokens(100)],
        on_block="raise",  # default
    ),
)

try:
    output = agent.run("a very long message...")
except InputCheckError as e:
    print(f"Blocked: {e.message}")
except OutputCheckError as e:
    print(f"Output blocked: {e.message}")

Return Blocked Status

from definable.agent.run import RunStatus

agent = Agent(
    model=model,
    guardrails=Guardrails(
        input=[max_tokens(100)],
        on_block="return_message",
    ),
)

output = agent.run("a very long message...")
if output.status == RunStatus.blocked:
    print(f"Request was blocked: {output.content}")

Tracing Events

Guardrail activity is captured in the agent’s trace stream:
GuardrailCheckedEvent
Fields: guardrail_name, guardrail_type, action, message, duration_ms
Emitted after each check completes.

GuardrailBlockedEvent
Fields: guardrail_name, guardrail_type, reason
Emitted when a guardrail blocks.

What’s Next

Agents Overview

Learn how agents orchestrate models, tools, and guardrails.

Middleware

Add request/response transforms alongside guardrails.

Testing

Use MockModel to test guardrail behavior without API calls.

Error Handling

Handle InputCheckError, OutputCheckError, and other exceptions.