Comparison

FlowDeck vs XcodeBuildMCP

Both give an AI agent a way to drive Xcode. XcodeBuildMCP does it as a Node.js MCP server with ~76 tools and a Sentry SDK in-process. FlowDeck does it as a native Swift CLI with a TUI and an editor extension, zero telemetry, and one versioned NDJSON contract for agents and humans. On Sentry's own benchmark, FlowDeck finishes a task in 60.4 s against XcodeBuildMCP's 147 s and uses 82.5 K tokens against 453 K.

TL;DR

  • XcodeBuildMCP is an MCP server on Node.js. ~76 tools, open source (MIT), owned by Sentry. Ships with @sentry/node initialized on every run unless you set an environment variable to disable it.
  • FlowDeck is a development stack. Native Swift CLI, full interactive TUI (flowdeck i), VS Code / Cursor extension. One versioned NDJSON contract used by agents, the terminal, and the editor. No Node.js. No telemetry. No SDK in your process.
  • FlowDeck is faster (60.4 s vs 147 s per task) and lighter on tokens (82.5 K vs 453 K per task), measured against XcodeBuildMCP on Sentry's own evaluation harness. Numbers below.
  • Pick XcodeBuildMCP if an open-source license or MCP transport is a hard constraint. Pick FlowDeck for everything else.

FlowDeck vs XcodeBuildMCP, measured

What FlowDeck does to the numbers

−59%
Time per task with FlowDeck
FlowDeck 60.4 s · XBM 147 s
−82%
Tokens per task with FlowDeck
FlowDeck 82.5 K · XBM 453 K
−91%
Cost per task with FlowDeck
FlowDeck $0.12 · XBM $1.27

Measured on Sentry's own public evaluation harness, n=200, Codex agent. Lower is better; deltas are FlowDeck vs XcodeBuildMCP v2 (XBM).

The benchmark, on Sentry's own harness

Sentry published an evaluation harness to compare their MCP server against direct xcodebuild calls. We ran FlowDeck on the same harness, with the same task (shell_primed), the same agent (Codex), and a comparable number of trials. The methodology is theirs, the numbers are public, and the eval repo is on GitHub.

Metric | xcodebuild CLI [1] | xcodebuildmcp v2 [2] | FlowDeck + skill [3]
Task success rate (higher is better) | 99.56% | 100.00% (n=225) | 100.00% (n=200), tied
Median time per task (lower is better) | 123 s | 147 s | 60.4 s, FlowDeck wins (−59% vs XBM)
Tokens per task, avg (lower is better) | 341 K | 453 K | 82.5 K, FlowDeck wins (−82% vs XBM)
Cost per task (lower is better) | $0.98 | $1.27 | $0.12, FlowDeck wins (−91% vs XBM)
Real tool errors per task, avg (lower is better) | 0.32 | 0.49 | 0.07, FlowDeck wins (−86% vs XBM)
Direct agent calls to xcodebuild/xcrun | Not reported | Not reported | 0 across 200/200 runs

[1] Sentry's shell_primed numbers from the original benchmark post.
[2] Sentry's own xcodebuildmcp v2 update (Feb 18, 2026), n=225.
[3] FlowDeck run on the Sentry harness (March 4, 2026): shell_primed smoke task, Codex agent, n=200.
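
The relative deltas can be recomputed from the table's raw figures. A minimal sketch; the dictionary layout and function names are ours, for illustration, and the numbers are copied from the table above.

```python
# Figures copied from the benchmark table above.
RESULTS = {
    "time_s":      {"xcodebuild": 123.0, "xbm": 147.0, "flowdeck": 60.4},
    "tokens_k":    {"xcodebuild": 341.0, "xbm": 453.0, "flowdeck": 82.5},
    "cost_usd":    {"xcodebuild": 0.98,  "xbm": 1.27,  "flowdeck": 0.12},
    "tool_errors": {"xcodebuild": 0.32,  "xbm": 0.49,  "flowdeck": 0.07},
}


def relative_delta(new: float, baseline: float) -> float:
    """Percent change of `new` relative to `baseline` (negative means lower)."""
    return (new - baseline) / baseline * 100.0


def deltas(baseline: str) -> dict:
    """FlowDeck's rounded percent change against the chosen baseline column."""
    return {
        metric: round(relative_delta(row["flowdeck"], row[baseline]))
        for metric, row in RESULTS.items()
    }


if __name__ == "__main__":
    for baseline in ("xbm", "xcodebuild"):
        print(baseline, deltas(baseline))
```

Passing "xcodebuild" instead of "xbm" gives the comparison against direct xcodebuild calls, which is a useful sanity check on which baseline a quoted percentage refers to.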

The most telling line is the last. Across 200 trials, the FlowDeck agent never fell back to a raw xcodebuild or xcrun call. It didn't need to: the FlowDeck CLI surface was sufficient for every step of every task. With the MCP path, the agent occasionally bypassed the server to call Apple's CLIs directly because the MCP tool either failed or didn't cover the case. That's the abstraction leaking under load.

The rest of this article explains why the numbers come out this way.

What is XcodeBuildMCP?

XcodeBuildMCP is a Node.js implementation of an MCP (Model Context Protocol) server that wraps Apple's CLI tools (xcodebuild, xcrun simctl, xcrun devicectl, log stream, and friends) into structured tool calls an agent can invoke. It was open-sourced as a community project, then acquired by Sentry, and is the most popular MCP server for iOS development today.

The surface is broad. The current generation exposes roughly 76 tools across workflow groups: simulator management, device management, build, test, log streaming, project introspection, UI automation (via a bundled AXe binary), and more. An agent connected to the MCP server sees those tools as callable functions; the server translates each into the corresponding Apple-CLI invocation and returns a structured response. A standalone Node.js CLI ships alongside the server for use outside of MCP, which is the comparison point with FlowDeck.

Both tools can run from a shell. Both can run from an agent. The differences live in the runtime, the transport, the output contract, and what's bundled inside.

What XcodeBuildMCP does well

Credit where it's due, before we get to the gap.

It's the canonical open-source MCP server for iOS

MIT-licensed, on a popular repo, actively maintained. If your team has standardized on MCP as the agent-tool protocol (Claude Desktop, certain IDE integrations, a custom orchestrator that speaks only MCP), XcodeBuildMCP fits without modification. The contract is the open Anthropic MCP spec, and any compliant client can connect.

Broad tool surface

Seventy-six tools is a lot. Many are thin wrappers around obscure xcrun subcommands that expose the underlying Apple capability directly. If an operation has an Apple CLI, XcodeBuildMCP probably has a tool for it.

Sentry-backed maintenance

The acquisition gave the project a stable home, dedicated maintainers, and a public roadmap. New Xcode releases are picked up quickly.

None of that is the question. The question is whether the architectural choices (MCP transport, Node.js runtime, broad tool surface, bundled error-reporting SDK) produce the right trade-offs for the specific job of "give an AI agent reliable, low-friction control of Xcode." The benchmark says no.

Where the seams show

MCP is a transport tax

Every MCP tool call is an out-of-process message: the agent serializes a JSON-RPC request, the server deserializes it, executes the underlying Apple CLI, serializes the response, and the agent deserializes it on the way back. For one call, the overhead is small. For an agent loop making fifty calls to build, run, test, and iterate, the overhead compounds, and so does the context window each tool definition consumes. That's where the +33% token gap between XcodeBuildMCP and direct xcodebuild calls (453 K vs 341 K) comes from in the benchmark above. The article "MCP vs CLI for AI-powered iOS development" goes deeper.
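
To make the round trip concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope an MCP client builds for a single tools/call request, and of the matching response handling. The tool name test_sim is the one discussed later in this article; the argument names and the response text are hypothetical placeholders, not XcodeBuildMCP's actual schema.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def read_tool_result(raw: str, request_id: int) -> list:
    """Deserialize a response and return its content items."""
    msg = json.loads(raw)
    if msg.get("id") != request_id:
        raise ValueError("response does not match request id")
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    return msg["result"]["content"]


# Hypothetical arguments; the real tool schema may differ.
request = make_tool_call(1, "test_sim", {"scheme": "MyApp"})

# Hypothetical response a server might send back.
response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Tests passed"}]},
})

content = read_tool_result(response, 1)
print(len(request), "bytes sent;", content[0]["text"])
```

Each call pays this serialize, transmit, deserialize cycle twice (request and response), on top of the Apple CLI invocation itself.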

Tool sprawl eats context before the agent writes a line

Each MCP tool definition costs roughly 550 to 1,400 tokens of context, depending on the schema. With seventy-plus tools registered, an agent pays a tens-of-thousands-of-tokens entry fee before it starts working. Some clients register only a subset, but the default install is the default; in the benchmark, the budget is gone before "Hello, World" gets compiled.
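
The entry fee is simple arithmetic on the figures above. A small sketch, assuming all 76 tools are registered and using the per-definition range quoted in this section:

```python
TOOLS = 76                      # tool count cited in this article
TOKENS_PER_TOOL = (550, 1_400)  # per-definition range cited above


def entry_fee(tools: int, per_tool: tuple) -> tuple:
    """Lower and upper bound on context tokens spent on tool definitions."""
    low, high = per_tool
    return tools * low, tools * high


low, high = entry_fee(TOOLS, TOKENS_PER_TOOL)
print(f"{low:,} to {high:,} tokens")  # 41,800 to 106,400 tokens
```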

Node.js runtime

XcodeBuildMCP runs on Node.js 18+. That means a Node install, an npm or Homebrew-bundled runtime, and the dependency tree that comes with both. On a clean CI image, that's another setup step. On a developer's machine, it's another version-manager interaction (nvm, fnm, asdf, pick your poison). FlowDeck is a single native Swift binary; the install is a curl | sh and there's nothing else to keep aligned.

No published, versioned output contract

MCP responses are tool-defined; there is no documented, versioned schema for "build output" or "test results" the way FlowDeck publishes for --json. An agent or CI parser that reads XcodeBuildMCP responses is coupled to whatever shape this version of this tool returns. FlowDeck's NDJSON is documented, versioned, and stable across minor releases.
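
What a versioned contract buys a consumer is the ability to check the version before trusting the shape. A minimal sketch of that pattern for NDJSON in general; the field names (v, type, scheme, success) and the sample events are hypothetical placeholders, not FlowDeck's documented schema.

```python
import json
from typing import Iterator

SUPPORTED_MAJOR = 1  # assumption: the major version this consumer was written against


def parse_ndjson(stream: str) -> Iterator[dict]:
    """Yield one event per non-empty line, rejecting unknown major versions."""
    for line_no, line in enumerate(stream.splitlines(), start=1):
        if not line.strip():
            continue
        event = json.loads(line)
        major = int(str(event.get("v", "0")).split(".")[0])
        if major != SUPPORTED_MAJOR:
            raise ValueError(
                f"line {line_no}: unsupported schema version {event.get('v')}"
            )
        yield event


# Hypothetical sample; field names are placeholders.
SAMPLE = '''{"v": "1.2", "type": "build_started", "scheme": "MyApp"}
{"v": "1.2", "type": "build_finished", "success": true}
'''

events = list(parse_ndjson(SAMPLE))
print([e["type"] for e in events])
```

A parser written this way fails loudly on a breaking change instead of silently misreading fields, which is the coupling problem an unversioned response shape cannot avoid.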

Sentry telemetry, on by default

XcodeBuildMCP bundles @sentry/node and initializes it on every run. Internal crashes and runtime faults are sent to Sentry's servers. The opt-out is XCODEBUILDMCP_SENTRY_DISABLED=true, documented in docs/PRIVACY.md, not the README. File paths are scrubbed before send (per their docs), but the SDK is still in your process. FlowDeck ships no error-reporting SDK at all.
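
If you launch XcodeBuildMCP from your own tooling, the opt-out can be applied per child process rather than machine-wide. A minimal sketch; the helper names are ours, and only the variable name and value come from the documentation cited above.

```python
import os
import subprocess


def env_with_sentry_disabled(base=None) -> dict:
    """Copy an environment and add the documented opt-out variable."""
    env = dict(os.environ if base is None else base)
    env["XCODEBUILDMCP_SENTRY_DISABLED"] = "true"
    return env


def run_without_sentry(cmd: list) -> subprocess.CompletedProcess:
    """Run a command with the opt-out applied to the child process only."""
    return subprocess.run(
        cmd, env=env_with_sentry_disabled(), capture_output=True, text=True
    )
```

Pass run_without_sentry whatever xcodebuildmcp invocation your setup uses; the parent shell's environment is left untouched.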

Their CLI is MCP wearing a hat

XcodeBuildMCP ships a CLI too, and Sentry markets it prominently, but the architecture is MCP-first with a CLI layered on top. The repo, the npm package, and the Homebrew tap are all named xcodebuildmcp. The binary is xcodebuildmcp with mcp as a subcommand: xcodebuildmcp mcp starts the server. The CLI contains the MCP server, not the other way around. Concrete consequences:

  • Tool names diverge between surfaces. The tool manifests carry separate names.mcp and names.cli fields. Running tests on a simulator is the MCP tool test_sim and the CLI command test. Same operation, two vocabularies. An agent that learns one doesn't read the other; documentation has to maintain both.
  • The CLI requires a per-workspace daemon for stateful operations like log capture and debug attach. src/cli/commands/daemon.ts, daemon-client.ts, and the daemon race-condition tests are all there. FlowDeck's CLI is stateless beyond the per-project config file: no daemon, no background process, no race conditions to test for.
  • The CLI is still Node.js. Even with Homebrew, the bottle bundles Node 18+. There is no path to a pure native binary on XcodeBuildMCP. FlowDeck installs a single Swift binary with curl | sh.
  • Two parallel skill files. The README distinguishes the "MCP Skill" (described as "optional when using the MCP server") from the "CLI Skill" ("recommended when using the CLI"). Two priming surfaces means two places to update when tool behavior changes, and two stories for an agent to keep straight.
  • CLI parity is still landing. Recent CHANGELOG entries include "CLI build and test commands now show live progress while they are running instead of waiting until the command finishes" (until that fix, the CLI buffered output until commands completed) and "CLI now auto-fills tool arguments from session defaults" (before that, every CLI call repeated --scheme, --project-path, etc.). Other entries fix missing flags on list-schemes, false compiler errors in test summaries, and Xcode IDE tools failing under CLI invocation. All of this is evidence of a CLI ramping up to feature parity, not one designed for parity from day one.

FlowDeck was a CLI before it was anything else. Its TUI, its editor extension, and any future agent integrations consume the same Swift core through the same NDJSON contract. There is no second-class path.

No editor extension, no TUI

XcodeBuildMCP serves agents and developers willing to drive a Node CLI. There is no full-screen interactive TUI for daily keyboard-driven use, no SourceKit-LSP integration, no Test Explorer wiring for VS Code. The agent gets a great experience; the human ends up back in Xcode for the rest. FlowDeck ships all three in one stack.

Project parsing shells out

To enumerate schemes, XcodeBuildMCP calls xcodebuild -list and parses its text output. That's the 2-5 second shell-out we covered in the xcodebuild comparison. FlowDeck has a native Xcode project parser that reads the project files directly, returning results in milliseconds.
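
For a sense of what parsing that text output involves, here is a minimal sketch of a scheme extractor. It is not XcodeBuildMCP's actual parser; the sample text mirrors the typical shape of xcodebuild -list output for a project, and the project and scheme names are made up.

```python
def parse_schemes(list_output: str) -> list:
    """Extract scheme names from `xcodebuild -list` text output."""
    schemes, in_section = [], False
    for line in list_output.splitlines():
        stripped = line.strip()
        if stripped == "Schemes:":
            in_section = True
        elif in_section:
            if not stripped:  # a blank line ends the section
                break
            schemes.append(stripped)
    return schemes


# Illustrative sample; names are hypothetical.
SAMPLE = """Information about project "MyApp":
    Targets:
        MyApp
        MyAppTests

    Build Configurations:
        Debug
        Release

    If no build configuration is specified and -scheme is not passed then "Release" is used.

    Schemes:
        MyApp
        MyApp-Staging
"""

print(parse_schemes(SAMPLE))  # ['MyApp', 'MyApp-Staging']
```

The parsing itself is cheap. In a real run the text comes from spawning xcodebuild, and that process launch is where the seconds go.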

Test discovery is internal, not first-class

XcodeBuildMCP has a Swift source scanner (swift-test-discovery.ts) that detects XCTestCase methods and Swift Testing @Test attributes via regex and brace tracking. It's used internally to resolve -only-testing: selectors before a run. That is useful, but it is not exposed as a standalone tool an agent can call to enumerate the test surface without invoking the test runner. The parser is also a text scanner, not a real Swift AST: it can drift on macros, complex generics, and unusual file layouts. FlowDeck exposes flowdeck test discover as a first-class command with AST-based parsing, returns the full test tree as structured JSON, and requires no build to do it.
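
To illustrate how text scanning can drift, here is a deliberately naive scanner. It is a hypothetical simplification, not XcodeBuildMCP's implementation (which also tracks braces), and the Swift sample is made up.

```python
import re

# Simplified, hypothetical patterns: plain text matching with no syntax awareness.
TEST_FUNC = re.compile(r"func\s+(test\w*)\s*\(")
SWIFT_TESTING = re.compile(r"@Test[^\n]*\n\s*func\s+(\w+)\s*\(")


def scan_tests(source: str) -> list:
    """Return test names found by plain text matching."""
    return TEST_FUNC.findall(source) + SWIFT_TESTING.findall(source)


SOURCE = '''
final class LoginTests: XCTestCase {
    func testLogin() {}
    /*
    func testDisabled() {}
    */
}

@Test("adds numbers")
func addition() {}
'''

print(scan_tests(SOURCE))  # ['testLogin', 'testDisabled', 'addition']
```

The commented-out testDisabled is reported as a live test because the scanner has no notion of comments. A parser that builds a syntax tree would skip it; a text scanner has to special-case each such construct.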

Each seam is small in isolation. Together, they produce the gap the benchmark measures: with XcodeBuildMCP, the agent takes more time, burns more tokens, costs more, hits more tool errors, and occasionally bypasses the MCP server to call xcodebuild directly because the tool failed or didn't cover the case. With FlowDeck, every one of those numbers drops, and the bypass never happened in 200 runs.

How FlowDeck addresses each seam

Transport
Direct CLI. No MCP server, no JSON-RPC, no out-of-process message bus. An agent calls flowdeck build --json the same way it calls any other shell command, and reads the NDJSON events directly from stdout.
Context budget
One tool definition (the CLI), plus a compact skill that teaches the agent the surface. Tens of thousands of tokens saved before any work begins.
Runtime
Native Swift, single binary. Install with curl | sh. No Node.js, no npm, no version manager to keep aligned with whichever Xcode you have installed.
Output contract
Versioned NDJSON, one event per line, schema documented and stable. Build events, test events, log events, and UI tree events share the same shape across the CLI. Agents and CI parsers can rely on it.
Telemetry
None. No error-reporting SDK. No diagnostic uploads. The only network call FlowDeck makes is the license check, which sends a license key and nothing else.
Stack coverage
Same CLI for agents and humans. Full TUI (flowdeck i) for keyboard-driven daily use. VS Code / Cursor extension with 53+ commands, SourceKit-LSP integration, Test Explorer wiring, and direct access to OSLog streams. One tool, three surfaces.
Project parsing
Native Xcode project parser. flowdeck context returns instantly with workspaces, schemes, configurations, available simulators, and connected devices in one structured payload.
Test discovery
flowdeck test discover --json uses AST parsing, no build required.
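
From the agent's or a CI script's side, the direct-CLI transport described above amounts to spawning a process and reading lines. A minimal sketch, assuming only that the command prints one JSON object per line on stdout; the function name is ours and no event fields are assumed.

```python
import json
import subprocess
from typing import Iterator


def stream_events(cmd: list) -> Iterator[dict]:
    """Run a command and yield each NDJSON line from its stdout as a dict."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                yield json.loads(line)
    # Leaving the `with` block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")


# Usage, with the command named in this article:
#   for event in stream_events(["flowdeck", "build", "--json"]):
#       print(event)
```

Events are yielded as they arrive, so a consumer can react to a failure mid-build instead of waiting for the process to exit.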

A note on privacy

XcodeBuildMCP ships with Sentry's error-reporting SDK baked in. Their documentation says build errors stay local, and we have no reason to dispute that. But the SDK runs in your process on every invocation unless you set the opt-out environment variable. For most developers that's fine. For teams in regulated industries, for solo developers paranoid about data leaving the machine, and for builds on managed CI runners where you don't fully control the environment, it's a footnote that matters.

Privacy | XcodeBuildMCP | FlowDeck
Error-tracking SDK | @sentry/node ships in every install. Internal crashes and runtime faults sent to Sentry servers. | None
Telemetry default | On by default. initSentry() runs on every CLI/MCP startup unless disabled. | None
Opt out | Set XCODEBUILDMCP_SENTRY_DISABLED=true. Documented in docs/PRIVACY.md. | Nothing to opt out of
Network calls | Sentry SDK connects to Sentry servers on error events. File paths scrubbed before send. | License check only: no diagnostic data sent
Your build data | Docs say build errors stay local. The SDK is still in-process. | No SDK, no reporter, no pipeline

Sentry's business is collecting application data. That's not a value judgment; it's their stated product. FlowDeck's business is a developer tool with a license. The data posture follows from the business model. If telemetry-in-your-build-tool is a no-go for your team, the comparison ends here.

When XcodeBuildMCP is the right answer

Two scenarios, both narrow.

Open source is a hard constraint.
If license posture forbids commercial tooling, XcodeBuildMCP is MIT-licensed and FlowDeck is not. The comparison ends there.
Your agent platform requires MCP transport.
Some clients speak MCP natively and treat MCP servers as first-class. If your stack is locked to that transport, XcodeBuildMCP is the canonical option. (Most modern coding agents, such as Claude Code, Codex, and Cursor, call shell commands directly, so this constraint is less common than it sounds.)

If neither of those applies, FlowDeck wins or ties on every dimension measured above.

Annex

Quick reference

Capability | XcodeBuildMCP | FlowDeck
Runtime | Node.js 18+ | Native Swift
Install | npm install or Homebrew (bundles Node.js) | Single binary, installed with curl piped to sh
Built for | MCP agents (CLI added later, still catching up) | Agents and humans, equal-class from day one
Project parsing | Shells out to xcodebuild -list | Native parser: 10× faster
Test discovery | Internal regex scanner; not exposed as a tool | First-class command, AST-based, no build
Output format | MCP responses; no published contract | Versioned NDJSON
UI automation | Bundled AXe binary (separate tool) | Built in
CI automation | CLI; requires Node.js runtime | Bash scripts: no dependencies
Editor extension | None | VS Code / Cursor: 53+ commands
Interactive TUI | None | flowdeck i
Telemetry | Sentry SDK, opt-out | None
License | Open source (MIT) | Commercial ($59/year)
Ownership | Sentry | Independent
Support | GitHub Issues | < 24h SLA for teams

FAQ

Can I use FlowDeck with an MCP-only agent?

Yes, via a thin MCP shim that exposes FlowDeck's CLI as tools. But the point of FlowDeck's design is that you don't need MCP: most agent platforms (Claude Code, Codex, Cursor) call shell commands natively, and FlowDeck's NDJSON is the contract. If your platform requires MCP transport, XcodeBuildMCP fits more naturally.

Is FlowDeck's benchmark independently verifiable?

Yes. The harness is Sentry's public eval repo. The numbers above come from running their shell_primed task on FlowDeck with Codex, n=200. The methodology is the same one Sentry used to publish their own xcodebuildmcp numbers.

Why is FlowDeck cheaper per task?

Two reasons. First, the context budget: FlowDeck doesn't register seventy-plus tool definitions, so the agent's working context starts smaller. Second, the agent loop has fewer round trips because FlowDeck's commands cover more in one call (build + install + launch = flowdeck run, not three separate tool invocations).

Does FlowDeck speak MCP?

FlowDeck does not ship a built-in MCP server today. The architecture is CLI-first because that's where the cost-per-task numbers land. For agent platforms that require MCP, you can wrap FlowDeck's CLI in a small MCP shim, but the design bias is toward direct shell calls.

Can I disable XcodeBuildMCP's Sentry SDK?

Yes, by setting XCODEBUILDMCP_SENTRY_DISABLED=true in the environment. Document this in your team's environment setup if you adopt the tool. FlowDeck has no equivalent setting because there's no SDK to disable.

Which one is better for CI?

For agent-driven CI, FlowDeck: direct shell calls, no Node.js install step, and NDJSON the runner can parse without piping. For human-written CI that already uses xcodebuild directly, neither is strictly necessary; the upside is mostly on the agent-loop side, not the deterministic-script side.

Is FlowDeck a fork of XcodeBuildMCP?

No. FlowDeck is an independent native Swift project. The two tools share a goal (give agents reliable access to the Apple toolchain) and a few inevitable surface choices (both wrap xcodebuild, both target the simulator and device lifecycle), but the codebase, runtime, and architecture are unrelated.

Where to next