Browser Toolkit - Definable AI

The BrowserToolkit gives agents full browser automation via Playwright CDP mode. It drives Chrome directly through the Chrome DevTools Protocol — native async, role-based element refs (e1, e2, e3), AI-friendly errors, and self-healing connections.

Installation

pip install 'definable[browser]'
playwright install chromium

Quick Start

import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig

async def main():
  config = BrowserConfig(headless=False)  # show the browser window
  # The context manager starts the browser and shuts it down on exit
  async with BrowserToolkit(config=config) as toolkit:
    agent = Agent(model="openai/gpt-4o", toolkits=[toolkit])
    result = await agent.arun("Go to news.ycombinator.com and list the top 3 stories")
    print(result.content)

asyncio.run(main())

The toolkit exposes 55 tools that the agent can call to navigate, read, interact with, and extract data from web pages.

Connection Modes

BrowserToolkit supports three connection modes, selected by the BrowserConfig you pass in.
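As a rough mental model, the choice between the three modes can be pictured as a precedence check on two config fields. The sketch below uses a hypothetical stand-in dataclass (`ConfigSketch`), not the real BrowserConfig, and the precedence of `cdp_url` over `user_data_dir` is an assumption inferred from the reference entry for `cdp_url` ("no new browser is launched"). The toolkit's actual selection logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigSketch:
  """Hypothetical stand-in holding only the two fields that pick a mode."""
  cdp_url: Optional[str] = None
  user_data_dir: Optional[str] = None


def connection_mode(config: ConfigSketch) -> str:
  """Return which of the three documented modes a config would select."""
  if config.cdp_url:        # assumed to win: attaching never launches a browser
    return "cdp_attach"
  if config.user_data_dir:  # launch Chrome on a persistent profile directory
    return "persistent_profile"
  return "fresh_chrome"     # default: ephemeral instance


print(connection_mode(ConfigSketch()))
print(connection_mode(ConfigSketch(user_data_dir="/tmp/my-profile")))
print(connection_mode(ConfigSketch(cdp_url="http://127.0.0.1:9222")))
```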

Fresh Chrome (Default)

Launches a new ephemeral Chrome instance. Cookies and storage are discarded when the toolkit shuts down.

config = BrowserConfig(headless=False)

Persistent Profile

Launch Chrome with a persistent user data directory. Cookies, localStorage, and logged-in sessions survive between runs.

config = BrowserConfig(user_data_dir="/tmp/my-profile")

CDP Attach

Attach to an already-running Chrome instance via its remote debugging port. No new browser window is opened — the agent controls your existing browser.

# First, launch Chrome with remote debugging enabled (macOS path shown):
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --no-first-run

# Then, in Python, point BrowserConfig at that port:
config = BrowserConfig(cdp_url="http://127.0.0.1:9222")
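Before attaching, it can help to confirm that Chrome is actually listening on the debugging port. Chrome's remote debugging HTTP interface serves a `/json/version` document that includes a `webSocketDebuggerUrl` field. The sketch below uses only the standard library; the helper names (`version_endpoint`, `websocket_url`, `probe`) are illustrative and not part of the toolkit.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def version_endpoint(cdp_url: str) -> str:
  """Build the /json/version URL for an HTTP CDP address."""
  parts = urlsplit(cdp_url)
  return urlunsplit((parts.scheme, parts.netloc, "/json/version", "", ""))


def websocket_url(payload: str) -> str:
  """Extract the browser-level WebSocket URL from a /json/version response."""
  return json.loads(payload)["webSocketDebuggerUrl"]


def probe(cdp_url: str, timeout: float = 2.0) -> str:
  """Fetch /json/version and return the WebSocket debugger URL.

  Raises URLError (a subclass of OSError) if nothing is listening.
  """
  with urlopen(version_endpoint(cdp_url), timeout=timeout) as response:
    return websocket_url(response.read().decode("utf-8"))


# Usage (requires Chrome started with --remote-debugging-port=9222):
#   print(probe("http://127.0.0.1:9222"))
```

If `probe` raises, Chrome was probably started without the `--remote-debugging-port` flag or is using a different port.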

BrowserConfig Reference

Parameter	Type	Default	Description
`cdp_url`	str	-	WebSocket or HTTP URL of an existing Chrome CDP endpoint. When set, no new browser is launched.
`headless`	bool	`False`	Run Chrome without a visible window.
`user_data_dir`	str	-	Path to a Chrome user data directory for session persistence.
`stealth`	bool	`True`	Enable anti-detection flags (`--disable-blink-features=AutomationControlled`).
`no_sandbox`	bool	`False`	Disable Chrome sandbox. Required in Docker/CI environments.
`proxy`	str	-	Proxy server in "host:port" or "user:pass@host:port" format.
`user_agent`	str	-	Override the browser’s User-Agent string.
`locale`	str	`"en-US"`	Browser locale code, e.g. "en-US", "fr", "zh-CN".
`timezone`	str	-	Browser timezone override, e.g. "America/New_York".
`viewport_width`	int	`1280`	Browser viewport width in pixels.
`viewport_height`	int	`720`	Browser viewport height in pixels.
`timeout`	float	`30.0`	Default per-operation timeout in seconds.
`executable_path`	str	-	Path to Chrome/Brave/Edge binary. Auto-detected if not set.
`extra_args`	tuple[str, ...]	-	Additional Chrome CLI flags.
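The two `proxy` formats are easy to mistype, so it may be worth validating the string before passing it to BrowserConfig. The following is a minimal sketch using only the standard library; `parse_proxy` and `ProxyParts` are hypothetical helper names, not part of the toolkit, and the toolkit's own parsing may accept other forms.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ProxyParts(NamedTuple):
  host: str
  port: int
  username: Optional[str]
  password: Optional[str]


def parse_proxy(value: str) -> ProxyParts:
  """Split "host:port" or "user:pass@host:port" into its components."""
  # Prefixing "//" makes urlsplit treat the whole string as a network location.
  parts = urlsplit("//" + value)
  if parts.hostname is None or parts.port is None:
    raise ValueError(f"expected host:port, got {value!r}")
  return ProxyParts(parts.hostname, parts.port, parts.username, parts.password)


print(parse_proxy("proxy.example.com:8080"))
print(parse_proxy("user:pass@proxy.example.com:8080"))
```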

Element Refs — The Key Innovation

Agents interact with semantic element refs instead of brittle CSS selectors. After calling browser_snapshot(), every interactive element gets a ref like e1, e2, e3:

- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]

Then the agent uses refs for all interactions:

browser_type("e1", "user@example.com")  → Typed into e1
browser_type("e2", "password123")       → Typed into e2
browser_click("e3")                     → Clicked: e3
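The snapshot text shown above is regular enough to post-process, for example to log which refs a page exposed. The sketch below assumes every interactive line follows the `- role "name" [ref=eN]` shape from the sample; real snapshots may contain nesting or extra attributes this pattern does not cover. `parse_refs` is an illustrative helper, not a toolkit function.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:  - textbox "Email" [ref=e1]
REF_LINE = re.compile(r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*\[ref=(?P<ref>e\d+)\]')


def parse_refs(snapshot: str) -> Dict[str, Tuple[str, str]]:
  """Map each ref to (role, accessible name); lines without a ref are skipped."""
  refs = {}
  for line in snapshot.splitlines():
    match = REF_LINE.match(line)
    if match:
      refs[match["ref"]] = (match["role"], match["name"])
  return refs


SNAPSHOT = '''\
- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]
'''

for ref, (role, name) in parse_refs(SNAPSHOT).items():
  print(ref, role, name)
```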

CSS selectors still work everywhere — browser_click("button.submit") is also valid. The ref system auto-detects which you’re using.
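One plausible way to tell the two apart is a whole-string pattern match: anything of the form `e` followed by digits is treated as a ref, and everything else as a CSS selector. This is only a sketch of the idea, assuming refs always look like `e1`, `e2`, and so on; the toolkit's real detection rule is not documented here and may differ.

```python
import re

REF_PATTERN = re.compile(r"e\d+")


def is_ref(target: str) -> bool:
  """True when the whole string looks like a snapshot ref (e1, e2, ...)."""
  return REF_PATTERN.fullmatch(target) is not None


for target in ("e3", "button.submit", "em", "#e1"):
  kind = "ref" if is_ref(target) else "CSS selector"
  print(f"{target!r}: {kind}")
```

Requiring at least one digit keeps real tag selectors such as `em` from being mistaken for refs.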

Tools Reference

All 55 tools are grouped by category below. Every tool name is prefixed with browser_.

Navigation (4 tools)

Tool	Description
`browser_navigate`	Navigate to a URL (must include scheme, e.g. `https://`)
`browser_go_back`	Navigate to the previous page in history
`browser_go_forward`	Navigate forward in history
`browser_refresh`	Reload the current page

Page State (7 tools)

Tool	Description
`browser_get_url`	Return the current page URL
`browser_get_title`	Return the current page title
`browser_get_text`	Return visible text of an element (default: `body`)
`browser_get_source`	Return page HTML source (capped at 20,000 chars)
`browser_get_attribute`	Return an HTML attribute value on an element
`browser_is_visible`	Check if an element is currently visible
`browser_get_page_info`	Situational snapshot: URL, title, scroll position, element counts

Perception (2 tools)

Tool	Description
`browser_snapshot`	Accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element
`browser_screenshot`	Take a screenshot and save to a file

Interaction (15 tools)

Tool	Description
`browser_click`	Click an element by ref or CSS selector
`browser_click_if_visible`	Click only if the element is visible (safe for popups)
`browser_click_by_text`	Click the first element whose visible text contains the given string
`browser_type`	Clear an input and type text into it
`browser_type_slowly`	Type with human-like 75ms delays (avoids bot detection)
`browser_press_keys`	Send keystrokes to a specific element (requires selector)
`browser_press_key`	Press a key on the focused element (no selector needed)
`browser_clear_input`	Clear an input field or textarea
`browser_select_option`	Select an option from a `<select>` dropdown by visible text
`browser_hover`	Hover over an element (reveals dropdowns, tooltips)
`browser_drag`	Drag one element to another via Playwright native drag
`browser_fill_form`	Batch fill a form: `[{ref, type, value}, ...]`
`browser_set_value`	Set an element’s value directly (works for sliders, range inputs)
`browser_set_input_files`	Set files on a file input element
`browser_execute_js`	Execute JavaScript in the page context
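For `browser_fill_form`, the `[{ref, type, value}, ...]` argument can be assembled from a plain mapping of refs to values. The sketch below only builds that list; the key names come from the table above, while the `"textbox"` value for `type` is an assumption borrowed from the role names in the snapshot output, and `build_fill_payload` is a hypothetical helper.

```python
import json
from typing import Dict, List


def build_fill_payload(values_by_ref: Dict[str, str], field_type: str = "textbox") -> List[dict]:
  """Turn {ref: value} into the [{ref, type, value}, ...] shape."""
  return [
    {"ref": ref, "type": field_type, "value": value}  # "type" value is assumed
    for ref, value in values_by_ref.items()
  ]


payload = build_fill_payload({"e1": "user@example.com", "e2": "password123"})
print(json.dumps(payload, indent=2))
```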

Scrolling (3 tools)

Tool	Description
`browser_scroll_down`	Scroll down by N screen-heights (default 3)
`browser_scroll_up`	Scroll up by N screen-heights (default 3)
`browser_scroll_to`	Scroll until an element is in view

Waiting (4 tools)

Tool	Description
`browser_wait`	Pause for N seconds (use after page loads)
`browser_wait_for_element`	Wait up to N seconds for an element to appear
`browser_wait_for_text`	Wait up to N seconds for text to appear inside a selector
`browser_wait_for`	Unified wait: text, text_gone, selector, url, load_state, or JS function
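The "wait up to N seconds" behavior of these tools amounts to a bounded wait: succeed as soon as a condition holds, give up once the deadline passes. The generic polling loop below illustrates that contract with plain asyncio. It is not the toolkit's implementation (Playwright has its own waiting machinery), and `wait_until` is a hypothetical name.

```python
import asyncio
import time
from typing import Callable


async def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
  """Poll `condition` until it returns True or `timeout` seconds elapse."""
  deadline = time.monotonic() + timeout
  while True:
    if condition():
      return True
    if time.monotonic() >= deadline:
      return False
    await asyncio.sleep(interval)


async def demo() -> None:
  started = time.monotonic()
  # Condition becomes true after roughly 0.3 seconds, well inside the timeout.
  appeared = await wait_until(lambda: time.monotonic() - started > 0.3, timeout=2.0)
  # Condition never becomes true, so the wait gives up after about 0.2 seconds.
  missing = await wait_until(lambda: False, timeout=0.2)
  print(appeared, missing)


asyncio.run(demo())
```

Returning a boolean instead of raising lets the caller decide whether a timeout is an error, which is one common design for agent-facing tools.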

DOM Manipulation (2 tools)

Tool	Description
`browser_highlight`	Flash a gold border around an element for 2 seconds
`browser_remove_elements`	Remove all elements matching a selector from the DOM

Forms & Checkboxes (3 tools)

Tool	Description
`browser_is_checked`	Check if a checkbox or radio is checked
`browser_check`	Check a checkbox or radio (idempotent)
`browser_uncheck`	Uncheck a checkbox (idempotent)

Cookies (3 tools)

Tool	Description
`browser_get_cookies`	Return all cookies as a JSON array
`browser_set_cookie`	Set a cookie on the current domain
`browser_clear_cookies`	Delete all cookies for the session

Storage (2 tools)

Tool	Description
`browser_get_storage`	Get a value from localStorage or sessionStorage
`browser_set_storage`	Set a key/value pair in localStorage or sessionStorage

Tabs (4 tools)

Tool	Description
`browser_open_tab`	Open a new tab, optionally navigating to a URL
`browser_close_tab`	Close the currently active tab
`browser_get_tabs`	Return the number of open tabs
`browser_switch_to_tab`	Switch to a tab by zero-based index

Output (1 tool)

Tool	Description
`browser_print_to_pdf`	Save the current page as PDF

Dialogs (1 tool)

Tool	Description
`browser_handle_dialog`	Accept or dismiss a browser dialog (alert/confirm/prompt)

Browser State (1 tool)

Tool	Description
`browser_set_geolocation`	Override GPS coordinates via CDP

Diagnostics (3 tools)

Tool	Description
`browser_get_console`	Return captured browser console messages (with level filter)
`browser_get_errors`	Return captured browser page errors
`browser_get_network`	Return captured network requests (with URL filter)

Usage with Agent

The toolkit follows the AsyncLifecycleToolkit protocol. Use async with for automatic startup and shutdown:

import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.model.openai import OpenAIChat

async def run():
  config = BrowserConfig(headless=False)
  # Entering the context starts the browser; leaving it shuts the browser down
  async with BrowserToolkit(config=config) as toolkit:
    agent = Agent(
      model=OpenAIChat(id="gpt-4o"),
      toolkits=[toolkit],
      instructions="You are a web research assistant. Use browser_snapshot before interacting with any page.",
    )
    result = await agent.arun("Go to example.com and tell me the page title")
    print(result.content)

asyncio.run(run())

Use browser_snapshot before interacting with a page. It returns an accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element, which is more useful than browser_get_source for understanding page structure.

Combining with Other Toolkits

BrowserToolkit can be used alongside other toolkits:

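from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.mcp import MCPToolkit, MCPConfig

async def run():
  config = BrowserConfig(headless=False)
  # Nest the context managers so both toolkits are running before the agent is built
  async with BrowserToolkit(config=config) as browser:
    async with MCPToolkit(config=MCPConfig(...)) as mcp:
      agent = Agent(
        model="openai/gpt-4o",
        toolkits=[browser, mcp],
      )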

Testing

Inject a mock browser to test without launching Chrome:

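import asyncio
from unittest.mock import AsyncMock
from definable.browser import BrowserToolkit

async def test_toolkit_registers_all_tools():
  # A mock stands in for the real browser, so Chrome is never launched
  mock_browser = AsyncMock()
  mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

  toolkit = BrowserToolkit(browser=mock_browser)
  await toolkit.initialize()

  assert len(toolkit.tools) == 55

asyncio.run(test_toolkit_registers_all_tools())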
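For readers unfamiliar with `AsyncMock`, the standalone snippet below shows the two behaviors the test above relies on: a child mock returns its configured `return_value` when called and awaited, and it records each await so the call can be asserted afterwards. It uses only the standard library and does not import the toolkit; `check_mock` is an illustrative name.

```python
import asyncio
from unittest.mock import AsyncMock


async def check_mock() -> str:
  mock_browser = AsyncMock()
  mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

  # Calling and awaiting a child mock returns its configured return_value.
  result = await mock_browser.navigate("https://example.com")

  # The mock records awaits, so a test can assert on how it was called.
  mock_browser.navigate.assert_awaited_once_with("https://example.com")
  return result


print(asyncio.run(check_mock()))
```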
