The BrowserToolkit gives agents full browser automation through Playwright's CDP mode. It drives Chrome directly over the Chrome DevTools Protocol and provides native async operation, role-based element refs (e1, e2, e3), AI-friendly error messages, and self-healing connections.
Installation
```bash
pip install 'definable[browser]'
playwright install chromium
```
Quick Start
```python
import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig


async def main():
    config = BrowserConfig(headless=False)  # show the browser window
    # The async context manager starts the browser and shuts it down on exit
    async with BrowserToolkit(config=config) as toolkit:
        agent = Agent(model="openai/gpt-4o", toolkits=[toolkit])
        result = await agent.arun("Go to news.ycombinator.com and list the top 3 stories")
        print(result.content)


asyncio.run(main())
```
The toolkit exposes 55 tools that the agent can call to navigate, read, interact with, and extract data from web pages.
Connection Modes
BrowserToolkit supports three connection modes, selected by the BrowserConfig you pass in.
Fresh Chrome (Default)
Launches a new ephemeral Chrome instance. Cookies and storage are discarded when the toolkit shuts down.
```python
config = BrowserConfig(headless=False)
```
Persistent Profile
Launch Chrome with a persistent user data directory. Cookies, localStorage, and logged-in sessions survive between runs.
```python
config = BrowserConfig(user_data_dir="/tmp/my-profile")
```
CDP Attach
Attach to an already-running Chrome instance through its remote debugging port. No new browser window is opened; the agent controls your existing browser.
```bash
# First, launch Chrome with remote debugging enabled (macOS path shown):
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 --no-first-run
```
Then point the toolkit at the debugging endpoint:
```python
config = BrowserConfig(cdp_url="http://127.0.0.1:9222")
```
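Which of the three modes you get depends only on which fields you set. The sketch below illustrates that selection, assuming `cdp_url` wins whenever it is set (an attached browser means nothing new is launched) and `user_data_dir` otherwise selects a persistent profile. `ConnectionSettings`, `connection_mode`, and the mode labels are hypothetical names for illustration; the toolkit's internal logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical stand-in for the BrowserConfig fields that pick a mode."""
    cdp_url: Optional[str] = None
    user_data_dir: Optional[str] = None


def connection_mode(settings: ConnectionSettings) -> str:
    """Return which of the three connection modes a config would select."""
    if settings.cdp_url:
        # Attach to an already-running Chrome; no new browser is launched.
        return "cdp_attach"
    if settings.user_data_dir:
        # Launch Chrome with a profile directory that survives between runs.
        return "persistent_profile"
    # Launch an ephemeral Chrome whose cookies and storage are discarded.
    return "fresh"


print(connection_mode(ConnectionSettings()))  # fresh
print(connection_mode(ConnectionSettings(user_data_dir="/tmp/my-profile")))  # persistent_profile
print(connection_mode(ConnectionSettings(cdp_url="http://127.0.0.1:9222")))  # cdp_attach
```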
BrowserConfig Reference
- `cdp_url`: WebSocket or HTTP URL of an existing Chrome CDP endpoint. When set, no new browser is launched.
- `headless`: Run Chrome without a visible window.
- `user_data_dir`: Path to a Chrome user data directory for session persistence.
- Enable anti-detection flags (`--disable-blink-features=AutomationControlled`).
- Disable the Chrome sandbox. Typically required in Docker/CI environments, where Chrome often runs as root.
- Proxy server in "host:port" or "user:pass@host:port" format.
- Override the browser’s User-Agent string.
- Browser locale code, e.g. "en-US", "fr", "zh-CN".
- Browser timezone override, e.g. "America/New_York".
- Browser viewport width in pixels.
- Browser viewport height in pixels.
- Default per-operation timeout in seconds.
- Path to a Chrome, Brave, or Edge binary. Auto-detected if not set.
- Additional Chrome command-line flags.
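The proxy string comes in two shapes, while Playwright's own launch API takes proxy settings as a dictionary with `server`, `username`, and `password` keys. A sketch of converting one into the other might look like this. `proxy_settings` is a hypothetical helper and the `http` default scheme is an assumption; the toolkit may perform this conversion differently.

```python
from typing import Dict


def proxy_settings(proxy: str, scheme: str = "http") -> Dict[str, str]:
    """Split 'host:port' or 'user:pass@host:port' into a Playwright-style proxy dict."""
    # rpartition: everything before the LAST '@' is credentials (empty if no '@')
    credentials, sep, host_port = proxy.rpartition("@")
    settings = {"server": f"{scheme}://{host_port}"}
    if sep:
        # partition on the FIRST ':' so a password may itself contain ':'
        username, _, password = credentials.partition(":")
        settings["username"] = username
        settings["password"] = password
    return settings


print(proxy_settings("proxy.local:8080"))
# {'server': 'http://proxy.local:8080'}
print(proxy_settings("alice:s3cret@proxy.local:8080"))
# {'server': 'http://proxy.local:8080', 'username': 'alice', 'password': 's3cret'}
```

Splitting on the last `@` and then on the first `:` lets the password contain either character; only the username must be free of `:`.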
Element Refs: The Key Innovation
Agents interact with semantic element refs instead of brittle CSS selectors. After calling browser_snapshot(), every interactive element gets a ref like e1, e2, e3:
```text
- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]
```
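Each ref line pairs a role and an accessible name with its ref. If you post-process snapshot text yourself (in tests, for instance), a small parser can map refs back to elements. This sketch assumes the exact line shape shown in the example above; real snapshots may carry extra attributes or nesting, and `parse_refs` is a hypothetical helper, not part of the toolkit.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:  - textbox "Email" [ref=e1]
REF_LINE = re.compile(r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*\[ref=(?P<ref>e\d+)\]')


def parse_refs(snapshot: str) -> Dict[str, Tuple[str, str]]:
    """Map each ref (e1, e2, ...) to its (role, accessible name)."""
    refs = {}
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if match:  # non-interactive lines (no [ref=...]) are skipped
            refs[match["ref"]] = (match["role"], match["name"])
    return refs


snapshot = """- heading "Login" [level=1]
- textbox "Email" [ref=e1]
- textbox "Password" [ref=e2]
- button "Sign In" [ref=e3]
[3 refs, 3 interactive]"""

print(parse_refs(snapshot))
# {'e1': ('textbox', 'Email'), 'e2': ('textbox', 'Password'), 'e3': ('button', 'Sign In')}
```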
Then the agent uses refs for all interactions:
```text
browser_type("e1", "user@example.com")  → Typed into e1
browser_type("e2", "password123")       → Typed into e2
browser_click("e3")                     → Clicked: e3
```
CSS selectors still work everywhere: browser_click("button.submit") is also valid. The ref system auto-detects which kind of target you pass.
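The exact detection rule is not documented here. One plausible rule, sketched below as an assumption, treats a target as a ref only when it is exactly `e` followed by digits and passes everything else through as a CSS selector. `target_kind` is a hypothetical name.

```python
import re

# Assumed ref shape: the letter "e" followed by one or more digits, nothing else.
REF_PATTERN = re.compile(r"e\d+")


def target_kind(target: str) -> str:
    """Classify a tool argument as an element ref or a CSS selector."""
    return "ref" if REF_PATTERN.fullmatch(target.strip()) else "css"


for target in ["e3", "button.submit", "#e3", "e12", "input[name=email]"]:
    print(target, "->", target_kind(target))
# e3 -> ref
# button.submit -> css
# #e3 -> css
# e12 -> ref
# input[name=email] -> css
```

A rule like this is unlikely to misfire: `e3` read as a CSS type selector would match an `<e3>` element, which no standard HTML element is named, and custom element names must contain a hyphen.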
All 55 tools are grouped by category below. Every tool name is prefixed with browser_.

Navigation (4 tools)

| Tool | Description |
|---|---|
| browser_navigate | Navigate to a URL (must include a scheme, e.g. https://) |
| browser_go_back | Navigate to the previous page in history |
| browser_go_forward | Navigate forward in history |
| browser_refresh | Reload the current page |
Page State (7 tools)

| Tool | Description |
|---|---|
| browser_get_url | Return the current page URL |
| browser_get_title | Return the current page title |
| browser_get_text | Return the visible text of an element (default: body) |
| browser_get_source | Return the page HTML source (capped at 20,000 chars) |
| browser_get_attribute | Return the value of an HTML attribute on an element |
| browser_is_visible | Check whether an element is currently visible |
| browser_get_page_info | Situational snapshot: URL, title, scroll position, element counts |

Snapshot and Screenshot (2 tools)

| Tool | Description |
|---|---|
| browser_snapshot | Accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element |
| browser_screenshot | Take a screenshot and save it to a file |

Interaction (15 tools)

| Tool | Description |
|---|---|
| browser_click | Click an element by ref or CSS selector |
| browser_click_if_visible | Click only if the element is visible (safe for popups) |
| browser_click_by_text | Click the first element whose visible text contains the given string |
| browser_type | Clear an input and type text into it |
| browser_type_slowly | Type with human-like 75 ms delays between keystrokes (helps avoid bot detection) |
| browser_press_keys | Send keystrokes to a specific element (requires a selector) |
| browser_press_key | Press a key on the focused element (no selector needed) |
| browser_clear_input | Clear an input field or textarea |
| browser_select_option | Select an option from a `<select>` dropdown by visible text |
| browser_hover | Hover over an element (reveals dropdowns, tooltips) |
| browser_drag | Drag one element onto another via Playwright's native drag |
| browser_fill_form | Batch-fill a form: [{ref, type, value}, ...] |
| browser_set_value | Set an element’s value directly (works for sliders and range inputs) |
| browser_set_input_files | Set files on a file input element |
| browser_execute_js | Execute JavaScript in the page context |

Scrolling (3 tools)

| Tool | Description |
|---|---|
| browser_scroll_down | Scroll down by N screen-heights (default 3) |
| browser_scroll_up | Scroll up by N screen-heights (default 3) |
| browser_scroll_to | Scroll until an element is in view |

Waiting (4 tools)

| Tool | Description |
|---|---|
| browser_wait | Pause for N seconds (use after page loads) |
| browser_wait_for_element | Wait up to N seconds for an element to appear |
| browser_wait_for_text | Wait up to N seconds for text to appear inside a selector |
| browser_wait_for | Unified wait: text, text_gone, selector, url, load_state, or JS function |
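The "wait up to N seconds" tools share one contract: return as soon as a condition holds, or give up once the timeout expires. The sketch below illustrates that contract with plain asyncio polling; it is not the toolkit's implementation, `wait_until` is a hypothetical name, and a simulated page whose text changes after 0.3 seconds stands in for a real browser.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    condition: Callable[[], Awaitable[bool]],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> bool:
    """Poll an async condition until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if await condition():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo() -> None:
    page = {"text": "Loading..."}  # simulated page content

    async def finish_loading() -> None:
        await asyncio.sleep(0.3)
        page["text"] = "Welcome back"

    async def welcome_visible() -> bool:
        return "Welcome" in page["text"]

    async def never() -> bool:
        return False

    loader = asyncio.create_task(finish_loading())
    print(await wait_until(welcome_visible, timeout=2.0))  # True, after about 0.3 s
    await loader
    print(await wait_until(never, timeout=0.2))  # False, after about 0.2 s


asyncio.run(demo())
```

A `text_gone` wait is the same loop with the condition negated.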

Visual and DOM Helpers (2 tools)

| Tool | Description |
|---|---|
| browser_highlight | Flash a gold border around an element for 2 seconds |
| browser_remove_elements | Remove all elements matching a selector from the DOM |

Checkboxes and Radios (3 tools)

| Tool | Description |
|---|---|
| browser_is_checked | Check whether a checkbox or radio button is checked |
| browser_check | Check a checkbox or radio button (idempotent) |
| browser_uncheck | Uncheck a checkbox (idempotent) |

Cookies (3 tools)

| Tool | Description |
|---|---|
| browser_get_cookies | Return all cookies as a JSON array |
| browser_set_cookie | Set a cookie on the current domain |
| browser_clear_cookies | Delete all cookies for the session |
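Because browser_get_cookies returns a JSON array, checking for a particular cookie (for example, whether a login survived in a persistent profile) is ordinary JSON handling. This sketch assumes each cookie object uses Playwright's cookie fields (`name`, `value`, `domain`, `expires`, with `-1` marking a session cookie); the helper names and the sample data are hypothetical.

```python
import json
import time
from typing import Optional


def find_cookie(cookies_json: str, name: str, domain_suffix: str = "") -> Optional[dict]:
    """Return the first cookie with this name whose domain ends with domain_suffix."""
    for cookie in json.loads(cookies_json):
        if cookie.get("name") == name and cookie.get("domain", "").endswith(domain_suffix):
            return cookie
    return None


def is_expired(cookie: dict, now: Optional[float] = None) -> bool:
    """Treat expires == -1 as a session cookie that has not expired."""
    expires = cookie.get("expires", -1)
    if expires == -1:
        return False
    current = time.time() if now is None else now
    return expires <= current


# Hypothetical tool output, for illustration only.
sample = json.dumps([
    {"name": "session_id", "value": "abc123", "domain": ".example.com", "expires": -1},
    {"name": "promo", "value": "x", "domain": "shop.example.com", "expires": 1000},
])

session = find_cookie(sample, "session_id", "example.com")
print(session is not None and not is_expired(session))  # True
print(is_expired(find_cookie(sample, "promo")))  # True: expired in 1970
```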

Storage (2 tools)

| Tool | Description |
|---|---|
| browser_get_storage | Get a value from localStorage or sessionStorage |
| browser_set_storage | Set a key/value pair in localStorage or sessionStorage |

Tabs (4 tools)

| Tool | Description |
|---|---|
| browser_open_tab | Open a new tab, optionally navigating to a URL |
| browser_close_tab | Close the currently active tab |
| browser_get_tabs | Return the number of open tabs |
| browser_switch_to_tab | Switch to a tab by zero-based index |

PDF (1 tool)

| Tool | Description |
|---|---|
| browser_print_to_pdf | Save the current page as a PDF |

Dialogs (1 tool)

| Tool | Description |
|---|---|
| browser_handle_dialog | Accept or dismiss a browser dialog (alert/confirm/prompt) |

Geolocation (1 tool)

| Tool | Description |
|---|---|
| browser_set_geolocation | Override GPS coordinates via CDP |

Debugging (3 tools)

| Tool | Description |
|---|---|
| browser_get_console | Return captured browser console messages (with a level filter) |
| browser_get_errors | Return captured page errors |
| browser_get_network | Return captured network requests (with a URL filter) |
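browser_get_console accepts a level filter whose exact semantics are not documented here. One common interpretation is a severity threshold, sketched below under the assumption that each captured message is a dict with `level` and `text` keys and that level names follow Playwright's `ConsoleMessage.type` values (`debug`, `log`, `info`, `warning`, `error`). `filter_console`, the ranking, and the sample messages are hypothetical.

```python
from typing import Dict, List

# Assumed ordering; unknown types (e.g. "trace") fall back to the "log" rank.
SEVERITY = {"debug": 0, "log": 1, "info": 1, "warning": 2, "error": 3}


def filter_console(messages: List[Dict[str, str]], min_level: str = "warning") -> List[Dict[str, str]]:
    """Keep messages whose level is at least as severe as min_level."""
    threshold = SEVERITY[min_level]
    return [m for m in messages if SEVERITY.get(m.get("level", "log"), 1) >= threshold]


messages = [
    {"level": "log", "text": "app booted"},
    {"level": "warning", "text": "deprecated API"},
    {"level": "error", "text": "Uncaught TypeError"},
    {"level": "trace", "text": "stack dump"},
]

for m in filter_console(messages):
    print(m["level"], m["text"])
# warning deprecated API
# error Uncaught TypeError
```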
Usage with Agent
The toolkit follows the AsyncLifecycleToolkit protocol. Use it as an async context manager (async with) so the browser starts and shuts down automatically:
```python
import asyncio

from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.model.openai import OpenAIChat


async def run():
    config = BrowserConfig(headless=False)
    async with BrowserToolkit(config=config) as toolkit:
        agent = Agent(
            model=OpenAIChat(id="gpt-4o"),
            toolkits=[toolkit],
            instructions="You are a web research assistant. Use browser_snapshot before interacting with any page.",
        )
        result = await agent.arun("Go to example.com and tell me the page title")
        print(result.content)


asyncio.run(run())
```
Use browser_snapshot before interacting with a page. It returns an accessibility-tree view with role-based refs (e1, e2, e3) for every interactive element, which is more useful than browser_get_source for understanding page structure.
BrowserToolkit can be used alongside other toolkits:
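```python
from definable.agent import Agent
from definable.browser import BrowserToolkit, BrowserConfig
from definable.mcp import MCPToolkit, MCPConfig


async def main():
    config = BrowserConfig(headless=False)
    async with BrowserToolkit(config=config) as browser:
        async with MCPToolkit(config=MCPConfig(...)) as mcp:  # fill in your MCP server settings
            agent = Agent(
                model="openai/gpt-4o",
                toolkits=[browser, mcp],
            )
```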
Testing
Inject a mock browser to test without launching Chrome:
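```python
import asyncio
from unittest.mock import AsyncMock

from definable.browser import BrowserToolkit


async def test_toolkit_with_mock_browser():
    # The mock replaces the real browser, so no Chrome process is launched
    mock_browser = AsyncMock()
    mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

    toolkit = BrowserToolkit(browser=mock_browser)
    await toolkit.initialize()
    assert len(toolkit.tools) == 55  # every tool is registered


asyncio.run(test_toolkit_with_mock_browser())
```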
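The mock can also verify how the browser was called and simulate failures. The sketch below exercises those unittest.mock features against a hypothetical `navigate_tool` wrapper (not the toolkit's real code) that rejects URLs without a scheme, mirroring the browser_navigate requirement, and otherwise delegates to `browser.navigate`. The error string is invented for illustration.

```python
import asyncio
from unittest.mock import AsyncMock


async def navigate_tool(browser, url: str) -> str:
    """Hypothetical tool wrapper: validate the URL, then delegate to the browser."""
    if not url.startswith(("http://", "https://")):
        return "Error: URL must include a scheme, e.g. https://"
    return await browser.navigate(url)


async def main() -> None:
    mock_browser = AsyncMock()
    mock_browser.navigate.return_value = "Navigated to https://example.com | Title: Example"

    print(await navigate_tool(mock_browser, "https://example.com"))
    # The mock records every await, so call arguments can be asserted
    mock_browser.navigate.assert_awaited_once_with("https://example.com")

    # Scheme-less URLs are rejected before the browser is touched
    print(await navigate_tool(mock_browser, "example.com"))
    assert mock_browser.navigate.await_count == 1

    # side_effect makes the awaited call raise, simulating a browser failure
    mock_browser.navigate.side_effect = TimeoutError("page load timed out")
    try:
        await navigate_tool(mock_browser, "https://slow.example")
    except TimeoutError as exc:
        print("raised:", exc)


asyncio.run(main())
# Navigated to https://example.com | Title: Example
# Error: URL must include a scheme, e.g. https://
# raised: page load timed out
```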