The BrowserToolkit gives agents full browser automation via Playwright's CDP mode. It drives Chrome directly through the Chrome DevTools Protocol (CDP) and provides native async support, role-based element refs (e1, e2, e3), AI-friendly errors, and self-healing connections.

Installation

pip install 'definable[browser]'
playwright install chromium

Quick Start

import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig

async def main():
  config = BrowserConfig(headless=False)
  # The context manager starts the browser and shuts it down on exit.
  async with BrowserToolkit(config=config) as toolkit:
    agent = Agent(model="openai/gpt-4o", toolkits=[toolkit])
    result = await agent.arun("Go to news.ycombinator.com and list the top 3 stories")
    print(result.content)

asyncio.run(main())

The toolkit exposes 55 tools that the agent can call to navigate, read, interact with, and extract data from web pages.

Connection Modes

BrowserToolkit supports three connection modes, selected by the BrowserConfig you pass in.

Fresh Chrome (Default)

Launches a new ephemeral Chrome instance. Cookies and storage are discarded when the toolkit shuts down.
config = BrowserConfig(headless=False)

Persistent Profile

Launches Chrome with a persistent user data directory. Cookies, localStorage, and logged-in sessions survive between runs.
config = BrowserConfig(user_data_dir="/tmp/my-profile")
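
A path under /tmp may be cleared on reboot on many systems, so a longer-lived location can be preferable. The following is a minimal sketch, assuming you want to create the directory up front; the helper name `ensure_profile_dir` and the directory layout are illustrative and not part of definable. Only the `user_data_dir` field comes from this page.

```python
from pathlib import Path


def ensure_profile_dir(base: Path, name: str = "my-profile") -> str:
    """Create (if needed) and return a profile directory path as a string.

    Hypothetical helper: `base` and `name` are illustrative, not library API.
    """
    profile = base / name
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
    profile.mkdir(parents=True, exist_ok=True)
    return str(profile)


# Assumed usage: pass the result as BrowserConfig(user_data_dir=...).
# user_data_dir = ensure_profile_dir(Path.home() / ".cache" / "browser-agent")
```

Because the helper is idempotent, calling it on every run returns the same directory, which is what lets cookies and sessions carry over.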

CDP Attach

Attaches to an already-running Chrome instance via its remote debugging port. No new browser window is opened; the agent controls your existing browser.
# First, launch Chrome with remote debugging enabled (macOS path shown):
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --no-first-run

# Then, in Python, point the config at that endpoint:
config = BrowserConfig(cdp_url="http://127.0.0.1:9222")
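
Before attaching, it can help to confirm that Chrome is actually listening on the debugging port. Chrome serves a small JSON document at `/json/version` on that port. The sketch below uses only the standard library; the function names `version_endpoint` and `probe_cdp` are illustrative and not part of definable.

```python
import json
import urllib.request
from urllib.parse import urljoin


def version_endpoint(cdp_url: str) -> str:
    """Build the URL of Chrome's /json/version endpoint for an HTTP CDP address."""
    return urljoin(cdp_url.rstrip("/") + "/", "json/version")


def probe_cdp(cdp_url: str, timeout: float = 2.0) -> dict:
    """Fetch browser metadata; raises an OSError subclass if nothing is listening."""
    with urllib.request.urlopen(version_endpoint(cdp_url), timeout=timeout) as resp:
        return json.load(resp)


# Assumed usage (requires Chrome started with --remote-debugging-port=9222):
# info = probe_cdp("http://127.0.0.1:9222")
# print(info.get("Browser"), info.get("webSocketDebuggerUrl"))
```

If the probe raises, the usual causes are that Chrome was started without the debugging flag or that another Chrome process was already running when the flag was passed.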

BrowserConfig Reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| cdp_url | str | | WebSocket or HTTP URL of an existing Chrome CDP endpoint. When set, no new browser is launched. |
| headless | bool | false | Run Chrome without a visible window. |
| user_data_dir | str | | Path to a Chrome user data directory for session persistence. |
| stealth | bool | true | Enable anti-detection flags (--disable-blink-features=AutomationControlled). |
| no_sandbox | bool | false | Disable Chrome sandbox. Required in Docker/CI environments. |
| proxy | str | | Proxy server in "host:port" or "user:pass@host:port" format. |
| user_agent | str | | Override the browser's User-Agent string. |
| locale | str | en-US | Browser locale code, e.g. "en-US", "fr", "zh-CN". |
| timezone | str | | Browser timezone override, e.g. "America/New_York". |
| viewport_width | int | 1280 | Browser viewport width in pixels. |
| viewport_height | int | 720 | Browser viewport height in pixels. |
| timeout | float | 30.0 | Default per-operation timeout in seconds. |
| executable_path | str | | Path to Chrome/Brave/Edge binary. Auto-detected if not set. |
| extra_args | tuple[str, ...] | | Additional Chrome CLI flags. |
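
As an illustration of how these fields combine, here is a minimal sketch that assembles keyword arguments for a container or CI run. It uses only field names from the reference above; the helper name `ci_config_kwargs`, the `BROWSER_PROXY` environment variable, and the 60-second timeout are assumptions made for the example, not library conventions.

```python
import os


def ci_config_kwargs(env=None) -> dict:
    """Return BrowserConfig keyword arguments suited to a container or CI run.

    Only documented field names are used; the BROWSER_PROXY variable name
    is a made-up convention for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {
        "headless": True,    # CI runners usually have no display
        "no_sandbox": True,  # documented as required in Docker/CI
        "timeout": 60.0,     # illustrative: extra headroom for slow runners
    }
    proxy = env.get("BROWSER_PROXY")
    if proxy:
        kwargs["proxy"] = proxy  # "host:port" or "user:pass@host:port"
    return kwargs


# Assumed usage: config = BrowserConfig(**ci_config_kwargs())
```

Keeping the values in a plain dict makes the environment-specific choices easy to unit test without importing the toolkit.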

Element Refs — The Key Innovation

Agents interact with semantic element refs instead of brittle CSS selectors. After calling browser_snapshot(), every interactive element gets a ref like e1, e2, e3:
- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]
Then the agent uses refs for all interactions:
browser_type("e1", "user@example.com")  → Typed into e1
browser_type("e2", "password123")       → Typed into e2
browser_click("e3")                     → Clicked: e3
CSS selectors still work everywhere — browser_click("button.submit") is also valid. The ref system auto-detects which you’re using.
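
To make the ref format concrete, the sketch below parses snapshot text like the example above and distinguishes refs from CSS selectors. This is an illustration only: the regular expressions and the function names `is_ref` and `parse_refs` are assumptions, and the toolkit's own detection logic may differ.

```python
import re

# Assumed shapes: refs look like e1, e2, ...; snapshot lines look like
#   - role "accessible name" [ref=eN]
REF_PATTERN = re.compile(r"^e\d+$")
SNAPSHOT_LINE = re.compile(r'^-\s+(\w+)\s+"([^"]*)".*\[ref=(e\d+)\]')


def is_ref(target: str) -> bool:
    """Guess whether a target is a snapshot ref (e1, e2, ...) or a CSS selector."""
    return bool(REF_PATTERN.match(target))


def parse_refs(snapshot: str) -> dict:
    """Map each ref in snapshot text to its (role, accessible name)."""
    refs = {}
    for line in snapshot.splitlines():
        m = SNAPSHOT_LINE.match(line.strip())
        if m:
            role, name, ref = m.groups()
            refs[ref] = (role, name)
    return refs


snapshot = '''- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]'''

print(parse_refs(snapshot))
print(is_ref("e3"), is_ref("button.submit"))
```

The heading line carries no ref, so it is skipped; only the three interactive elements end up in the mapping.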

Tools Reference

All 55 tools are grouped by category below. Every tool name is prefixed with browser_.

Navigation (4 tools)

| Tool | Description |
| --- | --- |
| browser_navigate | Navigate to a URL (must include scheme, e.g. https://) |
| browser_go_back | Navigate to the previous page in history |
| browser_go_forward | Navigate forward in history |
| browser_refresh | Reload the current page |

Page State (7 tools)

| Tool | Description |
| --- | --- |
| browser_get_url | Return the current page URL |
| browser_get_title | Return the current page title |
| browser_get_text | Return visible text of an element (default: body) |
| browser_get_source | Return page HTML source (capped at 20,000 chars) |
| browser_get_attribute | Return an HTML attribute value on an element |
| browser_is_visible | Check if an element is currently visible |
| browser_get_page_info | Situational snapshot: URL, title, scroll position, element counts |

Perception (2 tools)

| Tool | Description |
| --- | --- |
| browser_snapshot | Accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element |
| browser_screenshot | Take a screenshot and save to a file |

Interaction (15 tools)

| Tool | Description |
| --- | --- |
| browser_click | Click an element by ref or CSS selector |
| browser_click_if_visible | Click only if the element is visible (safe for popups) |
| browser_click_by_text | Click the first element whose visible text contains the given string |
| browser_type | Clear an input and type text into it |
| browser_type_slowly | Type with human-like 75ms delays (avoids bot detection) |
| browser_press_keys | Send keystrokes to a specific element (requires selector) |
| browser_press_key | Press a key on the focused element (no selector needed) |
| browser_clear_input | Clear an input field or textarea |
| browser_select_option | Select an option from a <select> dropdown by visible text |
| browser_hover | Hover over an element (reveals dropdowns, tooltips) |
| browser_drag | Drag one element to another via Playwright native drag |
| browser_fill_form | Batch fill a form: [{ref, type, value}, ...] |
| browser_set_value | Set an element's value directly (works for sliders, range inputs) |
| browser_set_input_files | Set files on a file input element |
| browser_execute_js | Execute JavaScript in the page context |

Scrolling (3 tools)

| Tool | Description |
| --- | --- |
| browser_scroll_down | Scroll down by N screen-heights (default 3) |
| browser_scroll_up | Scroll up by N screen-heights (default 3) |
| browser_scroll_to | Scroll until an element is in view |

Waiting (4 tools)

| Tool | Description |
| --- | --- |
| browser_wait | Pause for N seconds (use after page loads) |
| browser_wait_for_element | Wait up to N seconds for an element to appear |
| browser_wait_for_text | Wait up to N seconds for text to appear inside a selector |
| browser_wait_for | Unified wait: text, text_gone, selector, url, load_state, or JS function |
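
The "wait up to N seconds" tools all follow the same idea: check a condition repeatedly until it holds or a deadline passes. The following is a generic sketch of that bounded-polling pattern in asyncio. It is not the toolkit's implementation, and `wait_until` is a hypothetical name.

```python
import asyncio
import time


async def wait_until(predicate, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll an async predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo() -> bool:
    started = time.monotonic()

    async def ready() -> bool:
        # Stand-in for a condition such as "text is present on the page".
        return time.monotonic() - started > 0.2

    return await wait_until(ready, timeout=2.0, interval=0.05)
```

A condition-based wait like this returns as soon as the condition holds, which is why it is generally preferable to a fixed pause of N seconds.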

DOM Manipulation (2 tools)

| Tool | Description |
| --- | --- |
| browser_highlight | Flash a gold border around an element for 2 seconds |
| browser_remove_elements | Remove all elements matching a selector from the DOM |

Forms & Checkboxes (3 tools)

| Tool | Description |
| --- | --- |
| browser_is_checked | Check if a checkbox or radio is checked |
| browser_check | Check a checkbox or radio (idempotent) |
| browser_uncheck | Uncheck a checkbox (idempotent) |

Cookies (3 tools)

| Tool | Description |
| --- | --- |
| browser_get_cookies | Return all cookies as a JSON array |
| browser_set_cookie | Set a cookie on the current domain |
| browser_clear_cookies | Delete all cookies for the session |
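
Since browser_get_cookies returns a JSON array, the result can be post-processed with the standard library. The sketch below assumes each entry is an object with Playwright-style fields such as `name`, `value`, and `domain`; the exact shape returned by the tool is not specified on this page, and `find_cookie` is a hypothetical helper.

```python
import json


def find_cookie(cookies_json: str, name: str, domain_suffix: str = ""):
    """Return the first cookie dict matching name (and domain suffix), else None."""
    for cookie in json.loads(cookies_json):
        if cookie.get("name") != name:
            continue
        if domain_suffix and not cookie.get("domain", "").endswith(domain_suffix):
            continue
        return cookie
    return None


# Sample payload; the field names follow Playwright's cookie objects (assumed).
sample = json.dumps([
    {"name": "theme", "value": "dark", "domain": ".example.com", "path": "/"},
    {"name": "session", "value": "abc123", "domain": ".example.com", "path": "/"},
])
```

A lookup such as `find_cookie(sample, "session")` then gives a quick way to check whether a login survived between runs of a persistent profile.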

Storage (2 tools)

| Tool | Description |
| --- | --- |
| browser_get_storage | Get a value from localStorage or sessionStorage |
| browser_set_storage | Set a key/value pair in localStorage or sessionStorage |

Tabs (4 tools)

| Tool | Description |
| --- | --- |
| browser_open_tab | Open a new tab, optionally navigating to a URL |
| browser_close_tab | Close the currently active tab |
| browser_get_tabs | Return the number of open tabs |
| browser_switch_to_tab | Switch to a tab by zero-based index |

Output (1 tool)

| Tool | Description |
| --- | --- |
| browser_print_to_pdf | Save the current page as PDF |

Dialogs (1 tool)

| Tool | Description |
| --- | --- |
| browser_handle_dialog | Accept or dismiss a browser dialog (alert/confirm/prompt) |

Browser State (1 tool)

| Tool | Description |
| --- | --- |
| browser_set_geolocation | Override GPS coordinates via CDP |

Diagnostics (3 tools)

| Tool | Description |
| --- | --- |
| browser_get_console | Return captured browser console messages (with level filter) |
| browser_get_errors | Return captured browser page errors |
| browser_get_network | Return captured network requests (with URL filter) |

Usage with Agent

The toolkit follows the AsyncLifecycleToolkit protocol. Use async with for automatic startup and shutdown:
import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.model.openai import OpenAIChat

async def run():
  config = BrowserConfig(headless=False)
  async with BrowserToolkit(config=config) as toolkit:
    agent = Agent(
      model=OpenAIChat(id="gpt-4o"),
      toolkits=[toolkit],
      instructions="You are a web research assistant. Use browser_snapshot before interacting with any page.",
    )
    result = await agent.arun("Go to example.com and tell me the page title")
    print(result.content)

asyncio.run(run())

Use browser_snapshot before interacting with a page. It returns an accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element, which is more useful than browser_get_source for understanding page structure.
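
For readers unfamiliar with async context managers, here is a toy stand-in showing what `async with` provides for a lifecycle-managed toolkit: startup on entry and shutdown on exit, even when the body raises. The class below is not the library's code. `initialize()` appears in the Testing section of this page, while `shutdown()` is an assumed method name.

```python
import asyncio


class LifecycleSketch:
    """Toy stand-in showing what `async with` does for a lifecycle-managed toolkit."""

    def __init__(self):
        self.events = []

    async def initialize(self):
        self.events.append("initialize")  # e.g. launch or attach to the browser

    async def shutdown(self):
        self.events.append("shutdown")    # assumed name; e.g. close the browser

    async def __aenter__(self):
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.shutdown()              # runs even if the body raised
        return False                       # do not swallow exceptions


async def demo():
    toolkit = LifecycleSketch()
    try:
        async with toolkit:
            toolkit.events.append("agent run")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return toolkit.events
```

The recorded events show that cleanup still happens after the simulated failure, which is the main reason to prefer `async with` over calling startup and shutdown by hand.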

Combining with Other Toolkits

BrowserToolkit can be used alongside other toolkits:
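from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.mcp import MCPToolkit, MCPConfig

async def run():
  config = BrowserConfig(headless=False)
  # Enter both toolkits so each one is started and shut down cleanly.
  async with BrowserToolkit(config=config) as browser:
    async with MCPToolkit(config=MCPConfig(...)) as mcp:  # fill in your MCP settings
      agent = Agent(
        model="openai/gpt-4o",
        toolkits=[browser, mcp],
      )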

Testing

Inject a mock browser to test without launching Chrome:
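import asyncio
from unittest.mock import AsyncMock
from definable.browser import BrowserToolkit

async def test_toolkit_exposes_all_tools():
  # AsyncMock stands in for the real browser, so Chrome is never launched.
  mock_browser = AsyncMock()
  mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

  toolkit = BrowserToolkit(browser=mock_browser)
  await toolkit.initialize()

  assert len(toolkit.tools) == 55

asyncio.run(test_toolkit_exposes_all_tools())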
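
When writing further tests against a mocked browser, it helps to know how `AsyncMock` behaves on its own. The sketch below uses only the standard library and does not import the toolkit; the `navigate` method name mirrors the mock above, and the function name `check_mock_navigation` is illustrative.

```python
import asyncio
from unittest.mock import AsyncMock


async def check_mock_navigation() -> str:
    mock_browser = AsyncMock()
    mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

    # Awaiting an AsyncMock method yields its configured return_value.
    result = await mock_browser.navigate("https://example.com")

    # AsyncMock records awaits, so the call itself can be asserted.
    mock_browser.navigate.assert_awaited_once_with("https://example.com")
    return result
```

Asserting on recorded awaits lets a test verify which browser operation a tool triggered without inspecting any real page.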