MCP vs CLI for AI-powered iOS development

iOS development requires Apple’s CLI tools. xcodebuild compiles. simctl manages simulators. devicectl handles devices. Every build system for Apple platforms calls them eventually.

AI agents can call them too. The results are bad.

xcodebuild dumps thousands of lines of unstructured text. simctl needs UDIDs that vary across machines. devicectl has completely different syntax for the same operations. The agent guesses flags, fails, retries, burns context. Then there are the project specifics: custom workspace structures, non-standard derived data paths, provisioning setups that only work with certain signing identities. The agent has no way to discover any of this.

That’s why wrappers exist. Something between the agent and Apple’s tools that parses the output and gives the agent structured data.
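As a rough illustration of what such a wrapper does, the sketch below pulls compiler diagnostics out of a raw build log and turns them into structured records. It assumes the usual file:line:column: severity: message shape that compiler diagnostics take in xcodebuild output. The sample log, the Diagnostic type, and parse_diagnostics are made up for this example and are not any particular tool’s implementation.

```python
import re
from dataclasses import dataclass, asdict

# Matches compiler diagnostics of the form
#   /path/File.swift:12:5: error: message
DIAGNOSTIC = re.compile(
    r"^(?P<file>/[^:\n]+):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>error|warning): (?P<message>.+)$"
)


@dataclass
class Diagnostic:
    file: str
    line: int
    column: int
    severity: str
    message: str


def parse_diagnostics(log: str) -> list[Diagnostic]:
    """Keep only the diagnostic lines from a raw build log."""
    found = []
    for raw in log.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            found.append(Diagnostic(
                file=match["file"],
                line=int(match["line"]),
                column=int(match["column"]),
                severity=match["severity"],
                message=match["message"],
            ))
    return found


# Made-up sample log, shortened to a few lines for illustration.
SAMPLE_LOG = """\
CompileSwift normal arm64 /tmp/App/ContentView.swift
/tmp/App/ContentView.swift:12:5: error: cannot find 'foo' in scope
/tmp/App/Model.swift:3:9: warning: variable 'x' was never used
** BUILD FAILED **
"""

diagnostics = parse_diagnostics(SAMPLE_LOG)
errors = [asdict(d) for d in diagnostics if d.severity == "error"]
```

The agent then reads a short list of errors instead of the whole log.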

Two approaches have emerged.

MCP: register tools, let the agent pick

An MCP server registers tools with the agent. Each tool gets a name, description, and JSON parameter schema. The agent sees all definitions in its context window, picks one, calls it through the MCP stdio protocol, reads the response.
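To make the shape concrete, here is a hypothetical tool definition written as a Python dict. The three top-level fields (name, description, inputSchema) follow the MCP tool shape. The parameters and descriptions are invented for illustration and do not come from any real server. The token estimate uses a crude four-characters-per-token heuristic, which is an assumption, not a tokenizer.

```python
import json

# Hypothetical tool definition. The top-level keys follow the MCP tool shape;
# the parameter names and descriptions are invented for this example.
build_sim_tool = {
    "name": "build_sim",
    "description": "Build an app scheme for the iOS Simulator.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "scheme": {"type": "string", "description": "Scheme to build."},
            "simulatorName": {"type": "string", "description": "Simulator to target."},
        },
        "required": ["scheme"],
    },
}


def rough_token_estimate(definition: dict) -> int:
    """Very rough heuristic: about four characters per token."""
    return len(json.dumps(definition)) // 4


print(rough_token_estimate(build_sim_tool))
```

This toy definition is far smaller than a production one. Real definitions carry longer descriptions and more parameters, and every one of them is serialized into the context.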

For iOS, the most popular MCP server registers 59 tools by default. Build for simulator. Build for device. Screenshot. Log capture. Test. Launch. Stop. Each a separate tool definition.

Each definition costs 550 to 1,400 tokens. With 59 tools, that’s roughly 32,000 to 83,000 tokens consumed before the agent writes a single line of code. That context is gone for the entire session whether the agent uses those tools or not.
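The arithmetic behind that range, using only the per-tool figures and the tool count quoted above:

```python
TOOL_COUNT = 59
TOKENS_PER_TOOL_LOW = 550
TOKENS_PER_TOOL_HIGH = 1_400

low = TOOL_COUNT * TOKENS_PER_TOOL_LOW
high = TOOL_COUNT * TOKENS_PER_TOOL_HIGH

print(f"{low:,} to {high:,} tokens")  # 32,450 to 82,600 tokens
```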

Tool discovery is automatic. That’s the real strength. The agent can’t miss a tool because it’s in context. The protocol is standardized. Every major AI coding tool supports it.

But the costs compound.

The agent chooses between build_sim, build_and_run, build_device, build_macOS, and spm_build on every step. More tools, more wrong choices. Multi-step reasoning degrades after 3 to 4 sequential MCP calls because each response adds to the context burden. Clients also cap MCP tool responses; Claude Code’s default limit is 25K tokens. A large build log or full test suite exceeds that, forcing truncation or pagination across multiple calls.
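A minimal sketch of what pagination means in practice. It assumes a 25K-token budget per response and the same crude four-characters-per-token estimate. The paginate helper is hypothetical and not part of any MCP server.

```python
def paginate(text: str, max_tokens: int = 25_000, chars_per_token: int = 4) -> list[str]:
    """Split text into pages that each fit an estimated token budget."""
    page_chars = max_tokens * chars_per_token
    return [text[i:i + page_chars] for i in range(0, len(text), page_chars)]


log = "x" * 250_000  # stand-in for a large build log
pages = paginate(log)

print(len(pages))  # 3 pages, so three separate tool calls
```

Each extra page is another round trip, and another response added to the context.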

CLI: teach the agent, let it execute

A CLI exposes a small number of commands with flags. build, run, test, clean. Simulator vs device vs macOS is a flag, not a separate command. The CLI calls xcodebuild, simctl, and devicectl under the hood. It handles flag resolution, destination matching, and output parsing.
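A sketch of that shape using Python’s argparse. The buildtool name, the --scheme and --target flags, and the default simulator name are assumptions made for illustration. This is not FlowDeck’s actual interface. The destination strings use xcodebuild’s -destination syntax.

```python
import argparse

# Hypothetical destination table. The simulator name is an assumed default.
DESTINATIONS = {
    "simulator": "platform=iOS Simulator,name=iPhone 15",
    "device": "generic/platform=iOS",
    "macos": "platform=macOS",
}


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="buildtool")
    parser.add_argument("command", choices=["build", "run", "test", "clean"])
    parser.add_argument("--scheme", required=True)
    parser.add_argument("--target", choices=sorted(DESTINATIONS), default="simulator")
    return parser


def xcodebuild_args(argv: list[str]) -> list[str]:
    """Translate one wrapper invocation into an xcodebuild argument list."""
    args = make_parser().parse_args(argv)
    # "run" is not an xcodebuild action; this sketch only covers its build step.
    action = "build" if args.command == "run" else args.command
    return [
        "xcodebuild",
        "-scheme", args.scheme,
        "-destination", DESTINATIONS[args.target],
        action,
    ]


print(xcodebuild_args(["build", "--scheme", "App", "--target", "device"]))
```

One command and one flag replace three separate tool definitions. The agent picks a verb, then a target.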

The agent learns the CLI from a skill file. A markdown document with the command reference and workflow patterns. The agent calls commands via bash. No protocol. No runtime. No tool registration.

LLMs already understand terminal commands. They’ve been trained on billions of shell interactions from Stack Overflow, GitHub, and documentation. There’s no schema to inject. Each command costs roughly 200 tokens versus 550 to 1,400 per MCP tool definition.

The numbers are consistent across benchmarks. CLI completes identical tasks with 10x to 32x fewer tokens than MCP. One benchmark found checking a repo’s language cost 1,365 tokens via CLI and 44,026 via MCP. The overhead is almost entirely schema definitions the agent never touches.
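The top of that range matches the single benchmark quoted above. A quick check of the ratio:

```python
CLI_TOKENS = 1_365
MCP_TOKENS = 44_026

ratio = MCP_TOKENS / CLI_TOKENS

print(round(ratio, 1))  # 32.3
```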

The disadvantage: discovery. The agent needs the skill file to know the CLI exists. Without it, the agent falls back to raw xcodebuild. MCP doesn’t have this problem.

The comparison

Context cost. 59 tool schemas sit in context all session. A skill file is one document referenced as needed. Not marginal. Orders of magnitude different.

Tool selection. Five build variants to choose from vs one command with a flag. Fewer commands, fewer wrong choices.

Response limits. MCP clients cap tool responses, at 25K tokens by default in Claude Code. Build logs and test suites exceed that. CLI stdout has no limit.

Composability. CLI commands chain. Pipe through jq, redirect to files, combine with grep. MCP tools don’t compose. Each call is isolated.

Human usability. Nobody runs MCP tools from the terminal during development. A CLI works for both humans and agents. Same command, same output, same failure mode.

Reasoning quality. Agents degrade after 3 to 4 sequential MCP tool calls as context fills with tool responses. CLI interactions are lighter. The agent keeps more of its context window for actual problem-solving.

Where MCP fits

Stateless integrations. Databases, APIs, CRMs. Services that don’t have a CLI (Slack, Notion, most SaaS platforms). Independent operations that don’t depend on each other. The tool model maps cleanly and automatic discovery is a genuine advantage.

Where CLI fits

Sequential workflows. Build pipelines. Anything where step two depends on step one. Operations that produce large output. Workflows where a human and an agent use the same tool. Local development where you don’t want a runtime, a daemon, or telemetry between you and your build.

iOS development is all of those things.

Why FlowDeck is a CLI

FlowDeck is a native Swift CLI for iOS, macOS, watchOS, and tvOS development. Eight commands. Structured NDJSON output on every command. Built-in UI automation for simulator interaction. Skill files for Claude Code, Codex, OpenCode, and Gemini CLI.
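NDJSON means one JSON object per line, so a consumer can parse output as it streams. The sketch below shows the idea. The event types and field names are assumptions made for illustration, not FlowDeck’s documented schema.

```python
import json
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Made-up events; the field names are assumptions, not a real schema.
SAMPLE_OUTPUT = """\
{"type": "status", "message": "Compiling"}
{"type": "error", "file": "ContentView.swift", "line": 12, "message": "cannot find 'foo' in scope"}
{"type": "result", "success": false}
"""

events = list(read_ndjson(SAMPLE_OUTPUT.splitlines()))
errors = [event for event in events if event["type"] == "error"]
```

Because each line stands alone, the same output also works with line-oriented tools like grep and jq.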

No Node.js. No MCP protocol. No telemetry. No background process. The agent calls flowdeck build the same way you do.

We chose CLI because a build system is a pipeline, not a bag of independent operations. The agent doesn’t need 59 tools. It needs 8 commands, structured output, and the ability to see the screen.

Q&A

Should I use an MCP server for iOS development?

Probably not. iOS is a sequential build pipeline, not a bag of independent operations.

MCP shines for stateless integrations (Slack, Notion, databases) where automatic tool discovery beats teaching the agent each command.

For iOS, a CLI plus a skill file delivers the same capability at roughly one-tenth of the token cost, or less.

How much context does an MCP server actually cost?

Each MCP tool definition costs roughly 550 to 1,400 tokens of context.

XcodeBuildMCP registers 59 tools by default, which can consume tens of thousands of tokens before the agent writes its first line of code.

Some MCP servers (XcodeBuildMCP included) offer dynamic tool registration to reduce that upfront cost, trading it for discovery latency on each turn.

The exact math depends on your workflow and how many tools the agent actually touches.

Is FlowDeck an MCP server?

No. FlowDeck is a native Swift CLI.

Agents integrate via skill files (Claude Code’s SKILL.md, Codex’s AGENTS.md) that describe the command surface.

No protocol layer, no tool registration, no MCP runtime.

When does MCP work better than a CLI?

When the agent needs to discover tools it doesn’t know exist (stateless integrations with no shell equivalent), when operations are independent and don’t compose, or when the host environment has no terminal access.

Slack, Notion, and most SaaS APIs are good MCP candidates.

Why do CLI commands cost fewer tokens than MCP tools?

Because LLMs already understand terminal commands. They’re trained on billions of shell interactions.

A command costs roughly 200 tokens to teach versus 550 to 1,400 per MCP tool schema.

And CLI output isn’t capped at 25K tokens per response, which matters for build logs and test suites.

How do I migrate an existing Claude Code or Codex setup from MCP to CLI?

First, uninstall the MCP server from your client.

Claude Code: Run /mcp and remove the xcodebuildmcp (or other iOS MCP) entry.

Codex CLI: Open ~/.codex/config.toml, find the [mcp_servers.NAME] block for the iOS MCP server (typically [mcp_servers.xcodebuildmcp]), delete the entire block, save the file, then restart Codex.
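If you would rather script the Codex edit than do it by hand, here is a minimal sketch. It works on the text of the config and drops the named [mcp_servers.NAME] table along with any of its sub-tables. The remove_mcp_server helper and the sample config contents are hypothetical. It assumes table headers sit on their own lines with no trailing comments, so back up the file first.

```python
def remove_mcp_server(config_text: str, name: str) -> str:
    """Drop [mcp_servers.<name>] and its sub-tables from TOML text."""
    header = f"[mcp_servers.{name}]"
    sub_prefix = f"[mcp_servers.{name}."
    kept, skipping = [], False
    for line in config_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # A new table header decides whether we are inside the block.
            skipping = stripped == header or stripped.startswith(sub_prefix)
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Made-up config contents for illustration only.
SAMPLE_CONFIG = """\
model = "example-model"

[mcp_servers.xcodebuildmcp]
command = "example-command"
args = ["example-arg"]

[mcp_servers.other]
command = "other-server"
"""

cleaned = remove_mcp_server(SAMPLE_CONFIG, "xcodebuildmcp")
print(cleaned)
```

The sketch treats any line that starts with a bracket as a table header, so a multi-line array inside the block could confuse it. For a one-off change, deleting the block in an editor is simpler.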

Once the MCP server is gone, run flowdeck -i and press A to install the FlowDeck skill across every supported agent in one step.

The skill loads on the agent’s next session start. No code changes. No workflow rewrites. You just stop loading dozens of tool schemas.

Does this work with Cursor, Cline, OpenCode, or Gemini CLI?

Yes for OpenCode and Gemini CLI. Both read the same skill-pack pattern as Claude Code and Codex.

Cursor and Cline currently rely on MCP-style integrations for project tools. You can still call FlowDeck commands manually from their terminal panes, but the skill-pack auto-load only fires in agents that support file-based skills.

Is the token-cost difference still real on agents with prompt caching?

Yes.

Prompt caching reduces the repeated cost of the same context, but the schemas still sit in the active window every turn. They occupy attention budget and shape tool selection even when they’re cached.

A 30K-token MCP schema set is still 30K tokens of context the model is reasoning over, cached or not.