Guide · macOS

Drive macOS apps from the terminal.

Build, run, and automate macOS apps with FlowDeck. Click, type, send hotkeys, and navigate menus from the command line. Works on any running macOS app, not just the one you built.

Before you start

This guide assumes:

  • FlowDeck is installed and the trial or license is active. If not, follow the getting-started guide first.
  • A macOS app project that builds with flowdeck build -D "My Mac". Native macOS and Mac Catalyst projects both work.
  • Willingness to grant Accessibility and Screen Recording permissions to FlowDeck. macOS UI automation can't function without them.

By the end you'll have built and run a macOS app from the terminal, streamed its logs, captured its UI state as JSON, and driven its interface with clicks, keystrokes, and hotkeys. The patterns apply to any macOS app, including ones whose source code you don't have.

Step 01

Grant macOS permissions to FlowDeck

macOS automation requires two TCC permissions: Accessibility (so FlowDeck can read the accessibility tree of other apps and post synthetic events) and Screen Recording (so it can capture screenshots). Both are one-time grants.

flowdeck ui mac request-permissions

This pops the system prompts. Approve both. If the Screen Recording prompt appears again after the first grant, quit and restart your terminal: macOS caches the permission state per process. After the restart, verify:

flowdeck ui mac list permissions --json

Both should report as granted. If anything stays denied, open System Settings → Privacy & Security → Accessibility (and the matching Screen Recording panel) and confirm your terminal application is checked.
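
If you script against FlowDeck, it can help to refuse to continue until both grants are in place. The sketch below is one way to do that. It assumes the JSON printed by flowdeck ui mac list permissions --json contains the word "denied" for any missing grant, which this guide does not specify, so check the real output and adjust the pattern. The name require_permissions is made up for illustration.

```shell
# Hypothetical helper: stop early when a macOS permission is missing.
# Assumption: the --json output contains the string "denied" for any
# permission that has not been granted. Verify against real output.
require_permissions() {
  local perms
  perms=$(flowdeck ui mac list permissions --json) || return 1
  if printf '%s\n' "$perms" | grep -qi 'denied'; then
    echo "Missing permission; run: flowdeck ui mac request-permissions" >&2
    return 1
  fi
}

# Usage: require_permissions && flowdeck ui mac activate --app "MyApp"
```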

Step 02

Build and run for macOS

Build, install, and launch from the terminal. The destination is "My Mac"; FlowDeck knows that means your local macOS host.

flowdeck build -D "My Mac"
flowdeck run   -D "My Mac"

If your project supports Mac Catalyst alongside iOS, the same -D "My Mac" flag works; FlowDeck resolves the destination based on the scheme's product type. For schemes that build both iOS and macOS, save the macOS destination to your project config so you don't have to repeat the flag:

flowdeck config set -D "My Mac"

The output of flowdeck run ends with the app's short ID and bundle identifier. Save the ID; you'll use it for log streaming.
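
The build and run commands in this step can be chained so the launch is skipped when the build fails. A minimal sketch, using only the two commands shown above; the function name build_and_run is hypothetical, and it assumes flowdeck exits non-zero on a failed build.

```shell
# Hypothetical wrapper: launch only when the build succeeds.
# Assumption: flowdeck build returns a non-zero exit code on failure.
build_and_run() {
  local dest="${1:-My Mac}"
  flowdeck build -D "$dest" || {
    echo "Build failed; not launching." >&2
    return 1
  }
  flowdeck run -D "$dest"
}

# Usage: build_and_run              # defaults to "My Mac"
#        build_and_run "My Mac"
```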

Step 03

Stream the app's logs

In a second terminal, stream the running app's logs:

flowdeck apps
flowdeck logs <app-id>

Because FlowDeck registered the launch, it routes the log read through the macOS log pathway automatically. The output contains your app's print() calls, OSLog messages, and any runtime errors, with no WindowServer or system daemon noise.

For a one-shot launch-and-stream pattern, use flowdeck run --log -D "My Mac".
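
Since macOS automation runs are best left unattended, you may want the log stream on disk as well as on screen. The sketch below pipes flowdeck logs through tee; the helper name save_logs and the default file name are hypothetical.

```shell
# Hypothetical helper: stream an app's logs and append a copy to a file.
# The first argument is the short app ID saved from the run output.
save_logs() {
  local app_id="$1"
  local out_file="${2:-flowdeck-$1.log}"
  flowdeck logs "$app_id" | tee -a "$out_file"
}

# Usage: save_logs <app-id> run.log
```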

Step 04

macOS automation has no sandbox

This is the single most important thing to understand before letting an agent drive a macOS app, and the biggest difference from iOS Simulator automation.

On iOS, the agent's input lands inside a sandboxed simulator. You can keep typing in your editor while it runs; the two streams don't touch. On macOS, there's no sandbox. The agent uses your real keyboard, your real mouse, your real focus. While it runs:

  • Your cursor moves on its own.
  • Your keystrokes land in whatever app the agent is driving, or in the wrong app entirely if focus jumps.
  • You can't use your computer for anything else.

So the operating mode is: kick the agent off, walk away, come back when it pings. This is closer to "headless run" than "live pairing." Plan tasks accordingly: automation that takes 30 minutes is a coffee break, not a thing you watch.

The upside of having no sandbox: the agent can drive any macOS app, not just yours. It can launch Safari for an OAuth handoff in your app, drive Slack to verify a notification was sent, or open Xcode to inspect a generated project. Things that simply aren't possible in the iOS Simulator.

Prompt with that scope in mind. Example:

Run MyApp, sign in via the Google OAuth flow (which
opens in Safari), come back to MyApp once Safari closes,
and confirm the user's name renders in the header.

The agent handles the cross-app handoff automatically: activate MyApp, click Sign in, switch focus to Safari, complete the OAuth dance, come back to MyApp once it regains focus. None of that is possible on iOS.

Step 05

Focus is global: the activation rule

The flip side of no sandbox is that input goes to whichever app is frontmost at the moment the event fires. If the agent clicks while another app has focus, the click lands in the wrong place. The fix is mechanical: before every interaction sequence, activate the target app first.

The FlowDeck skill pack documents this rule for the agent. Once it is installed (see the AI agents setup guide), the agent automatically calls flowdeck ui mac activate before any click / type / hotkey sequence. The pattern it runs under the hood:

flowdeck ui mac activate --app "MyApp"
flowdeck ui mac click "Save" --app "MyApp"
flowdeck ui mac type "Hello" --app "MyApp"

If you cmd-tab to a different app while the agent is working, the next activate call yanks focus back. Once you understand that's happening, the "walk away" mental model in Step 04 makes more sense: competing with an active automation for focus produces neither good work nor a usable computer.
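
For hand-written scripts, the same rule can be enforced with a small wrapper. This is a sketch, not part of FlowDeck: mac_ui is a hypothetical name, and it activates before every single command, which is stricter than the once-per-sequence rule above but harder to get wrong.

```shell
# Hypothetical wrapper: activate the target app, then run one UI command.
# Only subcommands shown in this guide are used (activate, click, type).
APP="MyApp"

mac_ui() {
  # Bring the app to the front first; skip the action if that fails.
  flowdeck ui mac activate --app "$APP" || return 1
  flowdeck ui mac "$@" --app "$APP"
}

# Usage:
#   mac_ui click "Save"
#   mac_ui type "Hello"
```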

Between interactions, the agent reads what's in the window the same way it does on iOS:

flowdeck ui mac screen --app "MyApp" --json

That returns a JPEG of the app's frontmost window and the accessibility tree as structured JSON, the same shape flowdeck ui simulator screen --json returns. The agent uses the tree to find elements by label or identifier, and the screenshot when it needs pixel-level context (custom-drawn UI, comparing against a design). For long sessions, flowdeck ui mac session start --app "MyApp" writes a fresh capture every ~500ms so the agent doesn't pay the capture cost on every action.

Step 06

What the agent can and can't do

The macOS automation surface is wide. The agent reads the same accessibility tree macOS itself uses, and drives input through the standard accessibility APIs. That means a lot is possible, and a few things genuinely aren't.

What works

  • Click, double-click, right-click. "Click Save," "right-click the first row." Right-click and context menus are first-class on macOS (no iOS equivalent).
  • Type, with masking. "Type my email." Use --mask for passwords so credentials don't appear in logs.
  • Hotkeys. "Cmd+S to save," "cmd+shift+P to open the command palette." A first-class macOS primitive that you reach for constantly.
  • Menu navigation. "Pick File > Save As." The agent walks the menu bar by label path, faster and more reliable than driving menus with clicks and arrow keys.
  • Window management. "Move the main window to the secondary display," "close every window except the frontmost." Multi-window choreography is possible because macOS apps actually have multiple windows.
  • Cross-app workflows. Drive any installed app, not just yours. OAuth handoffs to Safari, deep-link tests through Mail, and Xcode automation are all in scope.
  • Read accessibility trees for native apps. SwiftUI and AppKit apps expose clean, labelled trees. The agent matches elements by label, role, or accessibility identifier.

Where accessibility quality varies

  • Electron and Catalyst apps. Some expose decent AX trees, some are very flat with generic labels. The agent's first attempt may fail to find a labelled element; tell it to fall back to find for inspection, then to coordinates from a screenshot if labels really don't exist.
  • Web content inside native apps. A WKWebView shows up in the tree as a single opaque element. The agent can't reliably read the HTML inside. For these, prompt it to use screenshot+coordinates instead of element matching.
  • Custom-drawn UI. Apps that draw their own controls (audio software, design tools) often have minimal accessibility. The agent falls back to --point "x,y" coordinates pulled from screen --json.

What it can't do

  • Bypass system security prompts. TCC prompts (camera, microphone, location, file access), the FileVault unlock screen, the lock screen, and the password-required Keychain dialog all require a human. The agent can't click through these on your behalf.
  • Use Touch ID or Watch unlock. Biometric prompts pause the flow until you authenticate. The agent will wait, you'll authenticate, then it continues.
  • Drive sandboxed system-level UI. The dock, Spotlight, Notification Center, and Mission Control are special. Some clicks work, many don't. Prefer hotkeys (cmd+space for Spotlight) over chasing those surfaces with the AX tree.
  • Read your screen across desktop spaces simultaneously. The agent sees the space it's currently on. If the target app is on a different desktop space, instruct the agent to activate it (which switches to that space) before reading state.

Patterns that come up most often

Drive a third-party macOS app.
FlowDeck doesn't care whether you built the app. Launch any installed app with flowdeck ui mac launch --bundle-id com.example.app, activate it, drive it. Useful for end-to-end tests that span your app and another (e.g., your app handing off to Safari for OAuth).
Use find before clicking ambiguous labels.
"Settings" might appear in three places on a complex screen. flowdeck ui mac find "Settings" --app "MyApp" --json shows you exactly which elements match, with frames and identifiers. Pick the right one, then click it with --by-id instead of by label.
Use coordinates from FlowDeck screenshots, not raw screen coords.
If you need to click a custom-drawn region that has no accessibility identity, read the screenshot from screen --json, find the pixel coordinates, and use --point "x,y". The coordinates in FlowDeck's JSON are in points (not @2x pixels), so no scaling math.
Capture before each action in long sessions.
Or use session start. Either way, never chain more than 2-3 actions without re-reading state. macOS apps are full of asynchronous behavior (loading spinners, modals, accessibility-tree refreshes) that breaks blind action chains.
Keep an eye on focus.
Even with activate, macOS apps can spontaneously yield focus (notifications, system modals, login items launching). If a sequence flakes, the first thing to check is whether the app was actually frontmost when the action fired.
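
The "capture before each action" and "keep an eye on focus" patterns can be combined in a script. The sketch below counts actions and re-reads state once a limit is reached. The helper names (capture, step), the MAX_BLIND limit, and the state file path are all hypothetical; only the activate, screen, click, and type commands come from this guide.

```shell
# Hypothetical helpers: re-read UI state after every few actions.
APP="MyApp"
MAX_BLIND=2          # actions allowed between captures
STATE_FILE="${TMPDIR:-/tmp}/flowdeck-state.json"
actions_since_capture=0

capture() {
  # Save the current screen state and reset the counter.
  flowdeck ui mac screen --app "$APP" --json > "$STATE_FILE" || return 1
  actions_since_capture=0
}

step() {
  # Re-activate so the action lands in the intended app.
  flowdeck ui mac activate --app "$APP" || return 1
  flowdeck ui mac "$@" --app "$APP" || return 1
  actions_since_capture=$((actions_since_capture + 1))
  if [ "$actions_since_capture" -ge "$MAX_BLIND" ]; then
    capture
  fi
}

# Usage:
#   capture
#   step click "Save"
#   step type "Hello"     # second action triggers a fresh capture
```

A script or agent would inspect the state file after each capture before deciding on the next step.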

When things go wrong

ui_mac_permissions_error from any UI command
Accessibility or Screen Recording permission isn't granted (or isn't granted to the right process). Run flowdeck ui mac request-permissions and approve. Screen Recording grants may require a terminal restart before they take effect. Verify with flowdeck ui mac list permissions --json.
Click lands on the wrong element
Almost always a label collision. The macOS accessibility tree often has multiple elements with the same label (a menu item and a button with the same name, for instance). Use flowdeck ui mac find "..." first to see what matches, then click by accessibility identifier (--by-id) instead.
Input goes to the wrong app
You forgot to activate before the sequence, or something stole focus mid-script. Restructure the script to activate immediately before each block of interactions. For agent-driven sessions, instruct the agent to call activate as the first step of every loop turn.
Menus don't open
Menus require the app to be frontmost. If you're calling menu click without an explicit activate first, the menu open command fires while a different app holds the menu bar. Always activate before menu interactions.
Coordinates from a screenshot don't click where you expect
You may be passing pixel coords where FlowDeck expects points. FlowDeck JSON output is always in points (the resolution-independent macOS coordinate system). Divide by the display scale (usually 2x) to convert pixels to points, and only if you're working from a raw @2x screenshot you generated outside of FlowDeck.
App didn't launch via flowdeck run -D "My Mac"
For Mac Catalyst apps, the destination string can be ambiguous. Try flowdeck run --device "My Mac (Designed for iPad)" for the Catalyst variant, or set the platform explicitly in your config. Check flowdeck context to see what destinations the scheme actually supports.
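
For the coordinate mismatch described above, converting pixel coordinates measured on a raw Retina screenshot into points is a division by the display scale. A minimal sketch: px_to_point is a hypothetical helper, it uses integer division, and the click --point form in the usage comment is inferred from this guide's description rather than confirmed against the CLI reference.

```shell
# Hypothetical helper: convert pixel coordinates from a raw screenshot
# into the "x,y" point string that --point expects.
# The third argument is the display scale (defaults to 2 for @2x).
px_to_point() {
  local px_x="$1"
  local px_y="$2"
  local scale="${3:-2}"
  echo "$((px_x / scale)),$((px_y / scale))"
}

# Usage (assumed command form):
#   flowdeck ui mac click --point "$(px_to_point 840 312)" --app "MyApp"
px_to_point 840 312      # prints 420,156
```

Coordinates read from FlowDeck's own JSON need no conversion.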

Annex

How macOS automation differs from iOS Simulator automation

For reference, the differences between flowdeck ui mac and flowdeck ui simulator.

Aspect | iOS Simulator (ui simulator) | macOS (ui mac)
Sandbox | Drives apps inside the simulator only | Drives any running macOS app (yours, third-party, system)
Permissions | No extra permission required | Accessibility + Screen Recording grants required (one-time)
Focus model | Sandboxed; no focus theft | Global; activate before every interaction sequence
Coordinates | Relative to the simulator window, in points | Absolute screen coordinates, in points
Hotkeys | Limited (simulator captures hardware keys differently) | First-class via hotkey "cmd+s"
Menus | No menu-bar concept on iOS | menu click "File > Save As..." by label path
Window management | Single window per app | window list / move / resize / focus for multi-window apps
Right-click | No right-click on iOS | right-click command + context menu support

The shared primitives (click, type, scroll, wait, assert, screen capture) behave the same way. The macOS-only commands cover the things that don't have iOS equivalents.

Read next

Further reading

  1. Things to ask your AI agent: example prompts that also work for macOS apps.
  2. iOS UI automation deep dive: the accessibility model that underpins both iOS and macOS automation.
  3. Automate macOS builds for CI: the same FlowDeck commands in a CI pipeline.
  4. Full CLI reference: in the FlowDeck docs.