The MacOS skill lets agents control a Mac like a human: taking screenshots, clicking and typing, opening apps, managing files, and reading system state. It communicates with the Definable Desktop Bridge — a lightweight Swift app that exposes macOS capabilities over a local HTTP API.
This skill executes real macOS actions. Always use allowed_apps or blocked_apps in production to limit exposure. Keep the bridge bound to 127.0.0.1 (default) — never expose it to external networks.

Setup

1. Build and run the Desktop Bridge

cd definable/desktop-bridge
swift build -c release
.build/release/DesktopBridge
On first launch the bridge:
  • Generates a random auth token and writes it to ~/.definable/bridge-token (chmod 600)
  • Checks Accessibility and Screen Recording permissions
  • Listens on http://127.0.0.1:7777
Definable Desktop Bridge v1.0.0
  URL:   http://127.0.0.1:7777
  Token: written to ~/.definable/bridge-token
  ⚠  Accessibility: DENIED  — input simulation will fail
  ✓  Screen Recording: GRANTED
Grant permissions in System Settings → Privacy & Security when prompted.

2. Install the Python package

pip install 'definable[desktop]'
The desktop extra adds websockets for the optional DesktopInterface. The bridge client uses httpx, which is already a core dependency.

Quick Start

import asyncio
import os
from definable.agent import Agent
from definable.model.openai import OpenAIChat
from definable.skill.builtin.macos import MacOS

async def main():
  agent = Agent(
    model=OpenAIChat(id="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    skills=[MacOS()],
    instructions="Take a screenshot before every action to understand the current state.",
  )
  output = await agent.arun("Open Safari and go to apple.com")
  print(output.content)

asyncio.run(main())
The skill reads ~/.definable/bridge-token automatically — no token configuration required.

Constructor Parameters

from definable.skill.builtin.macos import MacOS

skill = MacOS(
  bridge_host="127.0.0.1",      # Bridge hostname
  bridge_port=7777,              # Bridge port
  bridge_token=None,             # None → reads ~/.definable/bridge-token
  allowed_apps=None,             # Set[str] allowlist; None = no restriction
  blocked_apps=set(),            # Set[str] blocklist
  enable_applescript=True,       # Expose run_applescript tool
  enable_file_write=True,        # Expose write_file and move_file tools
  enable_input=True,             # Expose mouse/keyboard tools
)
bridge_host (str, default: "127.0.0.1")
Bridge hostname. Change only if the bridge runs on a different host.

bridge_port (int, default: 7777)
Bridge port.

bridge_token (Optional[str], default: None)
Bearer token for bridge authentication. If None, automatically reads from ~/.definable/bridge-token.

allowed_apps (Optional[Set[str]], default: None)
App allowlist. When set, only app names in this set can be targeted by tools. Apps in both allowed_apps and blocked_apps are blocked (blocked takes precedence).

blocked_apps (Set[str], default: set())
App blocklist. App names in this set are always rejected, regardless of allowed_apps.

enable_applescript (bool, default: True)
Expose the run_applescript tool. Disable when scripting access is not needed.

enable_file_write (bool, default: True)
Expose the write_file and move_file tools. read_file and list_files are always available.

enable_input (bool, default: True)
Expose input simulation tools: click, type_text, press_key, scroll, drag, set_clipboard, click_element, set_element_value.

Tools Reference

Screen (always available)

screenshot: Capture the screen. Returns a data:image/png;base64,... URI that vision models interpret directly.
read_screen: OCR the screen (or a region) and return the visible text.
find_text_on_screen: Locate text on screen and return its coordinates.
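Because screenshot returns a data:image/png;base64,... URI, converting it back to raw PNG bytes (to save to disk, for example) is a one-liner. The data_uri_to_png helper below is hypothetical, not part of the skill:

```python
import base64

def data_uri_to_png(uri: str) -> bytes:
    # Strip the data-URI prefix and decode the base64 payload back to PNG bytes.
    prefix = "data:image/png;base64,"
    if not uri.startswith(prefix):
        raise ValueError("expected a base64-encoded PNG data URI")
    return base64.b64decode(uri[len(prefix):])
```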

Input (requires enable_input=True)

click: Click at coordinates (x, y) with optional button and click count.
type_text: Type text using keyboard events.
press_key: Press a key with optional modifiers (cmd, shift, ctrl, alt).
scroll: Scroll at coordinates with configurable delta.
drag: Drag from one coordinate to another.
set_clipboard: Write text to the clipboard.
click_element: Click a UI element identified by app name, role, and title. Preferred over coordinate clicks.
set_element_value: Set a text field’s value via the Accessibility API.

Apps (always available)

list_running_apps: List all running applications.
open_app: Launch an app by name. Returns the PID.
quit_app: Quit an app (optionally force-quit).
activate_app: Bring an app to the foreground.
open_url: Open a URL in the default browser.

Windows (always available)

list_windows: List open windows (app, title, bounds).
focus_window: Focus a window by title.

Accessibility (always available)

find_element: Find a UI element by app, role, and title. Returns bounds and attributes.
get_ui_tree: Return the full Accessibility tree for an app.

Files (always readable; write requires enable_file_write=True)

list_files: List files at a path (optionally recursive).
read_file: Read a file’s text content.
write_file: Write text to a file. (requires enable_file_write=True)
move_file: Move or rename a file. (requires enable_file_write=True)

Clipboard (always available)

get_clipboard: Read the current clipboard text.
set_clipboard: Write text to the clipboard. (requires enable_input=True)

System (always available)

system_info: Hostname, OS version, CPU, and RAM.
get_battery: Battery level and charging status.
set_volume: Set system volume (0–100).
send_notification: Send a macOS notification banner.

AppleScript (requires enable_applescript=True)

run_applescript: Execute an AppleScript and return its output.

Tool Counts

All enabled (default): 30
enable_input=False only: 22
enable_file_write=False only: 28
enable_applescript=False only: 29
All disabled (read-only): 19
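The counts follow directly from the toggles: 8 input tools, 2 write tools, and 1 AppleScript tool subtract from the 30-tool default. A sketch of that arithmetic (the grouping constants are inferred from the tables above, not a real API):

```python
ALWAYS_ON = 19         # screen, apps, windows, accessibility, file reads, clipboard read, system
INPUT_TOOLS = 8        # click, type_text, press_key, scroll, drag, set_clipboard, click_element, set_element_value
FILE_WRITE_TOOLS = 2   # write_file, move_file
APPLESCRIPT_TOOLS = 1  # run_applescript

def tool_count(enable_input=True, enable_file_write=True, enable_applescript=True) -> int:
    return (ALWAYS_ON
            + (INPUT_TOOLS if enable_input else 0)
            + (FILE_WRITE_TOOLS if enable_file_write else 0)
            + (APPLESCRIPT_TOOLS if enable_applescript else 0))
```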

Safety Controls

App Allowlisting

# Only allow Safari and TextEdit — any other app name is rejected by tools
skill = MacOS(allowed_apps={"Safari", "TextEdit"})

App Blocklisting

# Block dangerous apps; allow everything else
skill = MacOS(blocked_apps={"Terminal", "System Preferences"})

Read-Only Mode

# No mouse/keyboard input, no file writes, no AppleScript
skill = MacOS(
  enable_input=False,
  enable_file_write=False,
  enable_applescript=False,
)
When allowed_apps and blocked_apps both contain the same app, blocked_apps wins (security-first).
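That precedence rule fits in a few lines. The is_app_allowed function below is a hypothetical mirror of the documented behavior, not the skill's actual implementation:

```python
def is_app_allowed(app: str, allowed_apps=None, blocked_apps=frozenset()) -> bool:
    if app in blocked_apps:
        return False             # blocklist always wins
    if allowed_apps is not None and app not in allowed_apps:
        return False             # allowlist is set and the app is not on it
    return True                  # no restriction applies
```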

Required macOS Permissions

Grant these in System Settings → Privacy & Security before using the bridge:
Accessibility: Mouse/keyboard input (click, type_text, press_key, scroll, drag) and UI element operations
Screen & System Audio Recording: screenshot, read_screen, find_text_on_screen
Automation (per-app): run_applescript targeting specific apps (granted on first use)
The /health endpoint reports current permission status:
# Check bridge permissions programmatically
from definable.agent.interface.desktop.bridge_client import BridgeClient

async with BridgeClient() as client:
  health = await client.health()
  print(health["permissions"])
  # {"accessibility": true, "screenRecording": true, "fullDiskAccess": false}
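Given that payload shape, a startup guard can refuse to launch an agent when a required permission is missing. The missing_permissions helper is hypothetical, built only on the /health response shown above:

```python
REQUIRED_PERMISSIONS = ("accessibility", "screenRecording")

def missing_permissions(health: dict) -> list[str]:
    # Collect any required permission the bridge reports as not granted.
    perms = health.get("permissions", {})
    return [name for name in REQUIRED_PERMISSIONS if not perms.get(name, False)]
```

Call it on the result of client.health() and raise (or log instructions) if the list is non-empty.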

Remote Control via Telegram

Control your Mac remotely using the MacOS skill + Telegram interface:
import asyncio, os
from definable.agent import Agent
from definable.model.openai import OpenAIChat
from definable.agent.interface.telegram import TelegramInterface
from definable.skill.builtin.macos import MacOS

async def main():
  agent = Agent(
    model=OpenAIChat(id="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    skills=[MacOS(allowed_apps={"Safari", "Finder"})],
    instructions="Take a screenshot before and after every action.",
  )
  interface = TelegramInterface(
    agent=agent,
    bot_token=os.environ["TELEGRAM_BOT_TOKEN"],
    allowed_user_ids={int(os.environ["MY_TELEGRAM_USER_ID"])},
  )
  async with interface:
    await interface.serve_forever()

asyncio.run(main())
The MacOS skill works with any Definable interface — Telegram, Discord, or a custom WebSocket frontend.

Using BridgeClient Directly

from definable.agent.interface.desktop.bridge_client import BridgeClient

async def example():
  async with BridgeClient() as client:
    # Take a screenshot
    png = await client.capture_screen()

    # Click at coordinates
    await client.click(x=500, y=300)

    # List running apps
    apps = await client.list_apps()
    print([a.name for a in apps])

    # Read a file
    content = await client.read_file("/etc/hosts")
See definable/definable/agent/interface/desktop/bridge_client.py for the full API.

DesktopInterface (Local Chat)

The DesktopInterface provides a local WebSocket server for direct chat without an external messaging platform:
from definable.agent import Agent
from definable.agent.interface.desktop import DesktopInterface
from definable.skill.builtin.macos import MacOS
from definable.model.openai import OpenAIChat
import asyncio, os

async def main():
  agent = Agent(
    model=OpenAIChat(id="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    skills=[MacOS()],
  )
  interface = DesktopInterface(
    agent=agent,
    websocket_port=8765,
  )
  async with interface:
    await interface.serve_forever()

asyncio.run(main())
Connect with any WebSocket client sending {"text": "your message"}.
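A minimal client sketch using the websockets package that the desktop extra installs (the reply handling is an assumption; adapt it to whatever frames your interface actually emits):

```python
import asyncio
import json

def make_message(text: str) -> str:
    # DesktopInterface expects a JSON object with a "text" field.
    return json.dumps({"text": text})

async def chat_once(text: str, url: str = "ws://127.0.0.1:8765") -> None:
    import websockets  # provided by: pip install 'definable[desktop]'
    async with websockets.connect(url) as ws:
        await ws.send(make_message(text))
        print(await ws.recv())  # first reply frame; format depends on the interface

# asyncio.run(chat_once("Take a screenshot"))  # requires a running DesktopInterface
```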

Bridge API Reference

The bridge exposes a JSON HTTP API on http://127.0.0.1:7777. All endpoints require Authorization: Bearer <token>. See the Desktop Bridge README for the full endpoint reference.
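For ad-hoc calls outside the Python client, any HTTP tool works as long as the bearer header is set. A standard-library sketch (only /health is exercised here, since it is the endpoint documented above; bridge_get is a hypothetical helper):

```python
import json
import urllib.request
from pathlib import Path

def auth_headers(token: str) -> dict:
    # Every bridge endpoint requires this header.
    return {"Authorization": f"Bearer {token}"}

def bridge_get(endpoint: str, token: str, base: str = "http://127.0.0.1:7777") -> dict:
    req = urllib.request.Request(base + endpoint, headers=auth_headers(token))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# token = (Path.home() / ".definable" / "bridge-token").read_text().strip()
# print(bridge_get("/health", token))
```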