If you’re building AI agents that interact with multiple APIs and tools, you’ve likely run into the heavy token overhead from Model Context Protocol (MCP) tool definitions. This post explains how mcp-cli enables dynamic tool discovery and invocation to drastically reduce token usage, shows real benchmarks from primary sources, and gives you actionable usage patterns and trade-offs—using only documented syntax and commands.
Key Takeaways:
- MCP’s static tool schema injection can burn tens of thousands of tokens per session, sharply limiting efficiency as you scale tools and servers.
- mcp-cli enables dynamic discovery and invocation of MCP tools, cutting token usage by up to 98% in documented benchmarks (Anthropic).
- CLI-based invocation means agents only load and use tool definitions as needed, not up front.
- There are important trade-offs: not all AI stacks support CLI workflows, and process overhead or security boundaries may matter for your use case.
- This guide is based exclusively on primary documentation and measured benchmarks—no invented flags, tools, or features.
Why MCP Token Overhead Matters
Standard MCP agent implementations inject every tool’s JSON schema into the LLM’s context window on session start. As you add more servers and tools, this static approach quickly becomes unsustainable:
- Each tool definition may consume hundreds of tokens.
- Every additional server multiplies overhead—even if you never use most tools in a session.
- Context window bloat crowds out actual reasoning and conversation.
The primary documentation and real-world engineers have measured the impact:
GitHub MCP server alone: 93 tools, 55,000 tokens of context (before the agent does any work).
Aggregate context window breakdown (from jannikreinhard.com):
├── System prompt: ~2,000 tokens
├── Graph MCP schema: ~28,000 tokens
├── Compliance MCP schema: ~8,500 tokens
├── Reporting MCP schema: ~5,200 tokens
├── Conversation history: ~4,000 tokens
└── Available for reasoning: ~82,300 tokens (of 128K total)
The scaling formula:
servers × tools per server × tokens per tool = total context tokens burned
| Server | # of Tools | Tokens Consumed |
|---|---|---|
| GitHub MCP | 93 | 55,000 |
| Notion MCP | 15+ | ~8,000 |
| Filesystem MCP | 10 | ~4,000 |
Enterprise teams running multiple MCP servers face millions of tokens wasted monthly just to describe available tools—before any business logic is run (source).
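As a quick sanity check on the scaling formula, here is the benchmark scenario used later in this post (10 servers × 15 tools × ~500 tokens per tool) computed in TypeScript. The helper function is this post's own illustration:

```typescript
// Static-injection cost per the scaling formula:
// servers x tools per server x tokens per tool
function contextTokens(servers: number, toolsPerServer: number, tokensPerTool: number): number {
  return servers * toolsPerServer * tokensPerTool;
}

// Benchmark scenario: 10 servers, 15 tools each, ~500 tokens per definition
console.log(contextTokens(10, 15, 500)); // 75000 tokens burned before any work is done
```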
How mcp-cli Reduces Token Usage
Dynamic Tool Discovery
mcp-cli is a documented, lightweight command-line interface that allows AI agents (or humans) to interact with MCP servers dynamically, without up-front schema injection. Instead of pre-loading all tool definitions, mcp-cli lets you:
- Discover available tools at runtime
- Inspect tool parameters or help text only when needed
- Invoke individual tools as CLI subcommands, not via static prompt context
This “progressive disclosure” approach means you only pay the context cost for tools you actually use in a session.
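As a rough sketch of how an agent host might drive this, assuming a Node-based runtime: the wrapper function and injectable runner below are this post's own illustration, while the `--mcp <server-url> <subcommand>` form is the documented mcp-cli syntax.

```typescript
import { execFileSync } from 'node:child_process';

type Runner = (cmd: string, args: string[]) => string;

// Default runner shells out to the real CLI; tests can inject a stub.
const defaultRunner: Runner = (cmd, args) =>
  execFileSync(cmd, args, { encoding: 'utf8' });

// Thin wrapper over the documented `mcp-cli --mcp <server-url> <subcommand>` form.
function mcpCli(serverUrl: string, args: string[], run: Runner = defaultRunner): string {
  return run('mcp-cli', ['--mcp', serverUrl, ...args]);
}

// Discovery happens only when the agent decides it needs a tool:
//   mcpCli('https://your-mcp-server.com', ['list'])
// and each invocation pays only for that one tool's output, not every schema.
```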
Measured Token Savings
Benchmarks from Anthropic and independent researchers show dramatic savings:
| Scenario | Native MCP (tokens) | CLI Approach (tokens) | Savings |
|---|---|---|---|
| 10 servers × 15 tools × 500 tokens each | 75,000 | ~1,400 | 98% |
| 30 tools, 10 turns (static MCP) | 36,310 | 1,734 | 95.2% |
According to Anthropic: “Code execution with MCP enables agents to handle more tools while using fewer tokens, reducing context overhead by up to 98.7%.”
// Without code execution - all rows flow through context
TOOL CALL: gdrive.getSheet(sheetId: 'abc123')
→ returns 10,000 rows in context to filter manually
// With code execution - filter in the execution environment
const allRows = await gdrive.getSheet({ sheetId: 'abc123' });
// filter allRows here; only the filtered result returns to the model
By filtering or processing data in the agent’s execution environment, only relevant results are loaded into context, not entire datasets.
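To make that concrete, here is a small self-contained mock: the 10,000-row sheet is simulated in memory rather than fetched from a real gdrive call, but the shape of the win is the same.

```typescript
// Simulated sheet: 10,000 rows, of which only a handful are relevant.
type Row = { id: number; status: string };

const allRows: Row[] = Array.from({ length: 10_000 }, (_, i) => ({
  id: i,
  status: i % 1_000 === 0 ? 'pending' : 'closed',
}));

// Filter in the execution environment; only this result needs to be
// serialized back into the model's context.
const pending = allRows.filter((row) => row.status === 'pending');

console.log(`${pending.length} of ${allRows.length} rows enter context`); // 10 of 10000
```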
Practical Usage Examples and Benchmarks
Installing and Configuring mcp-cli
The official mcp-cli documentation covers installation; the documented Bun-based method is:
# Install globally with Bun (requires Bun to be installed)
bun install -g https://github.com/philschmid/mcp-cli
Configuration is via a simple JSON file. The example below follows the common MCP server-configuration shape (registering the filesystem server via npx); confirm the exact schema against the official mcp-cli docs:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    }
  }
}
Discovering and Invoking Tools
Dynamic discovery and invocation are performed with subcommands, not flags. Here are the canonical usage patterns (source):
# List available tools
mcp-cli --mcp https://your-mcp-server.com list
# Get help text for a tool (documented syntax)
mcp-cli --mcp --help
# Execute a tool (with arguments as required)
mcp-cli --mcp https://your-mcp-server.com [args]
The following example, reproduced from the original article, shows the code-execution pattern end to end:
// Read transcript from Google Docs and add to Salesforce prospect
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';
const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
objectType: 'SalesMeeting',
recordId: '00Q5f000001abcXYZ',
data: { Notes: transcript }
});
This pattern means only the actual result (not the entire schema or intermediate data) enters the LLM context.
Token Usage over Multiple Turns
| Turn | Native MCP (tokens) | mcp-cli (tokens) | Tokens Saved |
|---|---|---|---|
| 1 | 3,619 | 531 | 3,088 |
| 2 | 7,238 | 598 | 6,640 |
| 3 | 10,887 | 815 | 10,072 |
| 10 (total) | 36,310 | 1,734 | 34,576 |
Savings compound with longer sessions or larger toolsets (see detailed benchmarks).
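Working from the turn-10 row of the benchmark table, the percentage savings can be checked directly:

```typescript
// Cumulative totals after 10 turns, from the benchmark table above.
const nativeTotal = 36_310; // static MCP schema injection
const cliTotal = 1_734;     // mcp-cli dynamic discovery

const tokensSaved = nativeTotal - cliTotal;
const savingsPct = (tokensSaved / nativeTotal) * 100;

console.log(tokensSaved);           // 34576
console.log(savingsPct.toFixed(1)); // 95.2 (matches the headline figure)
```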
Trade-offs and Alternatives
Strengths of mcp-cli-Based Approaches
- Token Efficiency: 95–98% documented savings in multi-server, multi-tool environments.
- Dynamic Discovery: Tools are available on-demand, with no codegen or agent restart required.
- Composability: CLI invocations can be orchestrated, scripted, or chained as needed for complex workflows.
Considerations and Limitations
- Agent/LLM Support: Not all agent frameworks natively support CLI skill invocation. Test compatibility with your stack.
- Security: Passing secrets via CLI or environment variables can introduce risks. Audit all surfaces for leaks and enforce least privilege.
- Version Drift: If the MCP server changes schema, you may need to re-discover tools or update your invocation logic.
- Process Overhead: Each CLI invocation creates a new process. The overhead is usually small but relevant for latency-sensitive tasks.
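To gauge that overhead on your own machine, you can time a no-op child process. The numbers vary by platform, and this snippet spawns the current Node binary rather than mcp-cli so it runs anywhere:

```typescript
import { execFileSync } from 'node:child_process';

// Spawn a no-op child process and measure wall-clock time. Each CLI
// invocation pays roughly this fixed cost on top of the tool's own work.
const start = process.hrtime.bigint();
execFileSync(process.execPath, ['-e', '']); // no-op Node process
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`process spawn overhead: ~${elapsedMs.toFixed(0)} ms`);
```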
Alternatives
| Feature | mcp-cli | Direct MCP (Static) |
|---|---|---|
| Token Efficiency | 95–98% savings (documented) | Linear growth with # of tools |
| Dynamic Discovery | Yes | No |
| Requires Codegen | No | No |
| OpenAPI Support | Not documented | Not documented |
Some teams build custom MCP servers that serve only pre-analyzed, relevant context (e.g., dependency graphs, summaries) to further minimize token usage and maximize agent accuracy (Anthropic). Others use a hybrid approach: CLI for discovery, direct API for performance-critical calls.
Common Pitfalls and Pro Tips
- Incorrect CLI syntax: The correct way to list tools is `mcp-cli --mcp <server-url> list`. There is no `--list` flag.
- Authentication: If your MCP server requires credentials, set them in configuration or environment variables. There is no `--auth-header` flag.
- Version drift: When schemas change on the server, manually re-discover or refresh your tool list; no `--refresh` flag exists in mcp-cli.
- Agent compatibility: Not all LLM agents treat CLI skills equally. Test with your actual stack before production rollout.
- Token accounting: Dynamic discovery and help still cost tokens, but orders of magnitude less than static schema injection.
- Error handling: Build robust error handling for CLI invocation failures or schema mismatches.
Conclusion and Next Steps
If you’re scaling AI agents across dozens of tools and multiple servers, the token cost of static MCP prompt injection becomes a bottleneck. mcp-cli provides a documented, benchmarked way to slash token usage, enable dynamic tool orchestration, and future-proof your agent architecture. Always validate compatibility, review your security posture, and benchmark token usage in your real environment before committing.
For deeper dives into code execution patterns, building efficient custom MCP servers, and benchmarking agent token usage, see these authoritative sources:
- Code execution with MCP: building more efficient AI agents
- MCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
- Introducing MCP CLI: A way to call MCP Servers Efficiently
Sources and References
This article was researched using a combination of primary and supplementary sources:
Supplementary References
These sources provide additional context, definitions, and background information to help clarify concepts mentioned in the primary source.
- Code execution with MCP: building more efficient AI agents (Anthropic)
- Reducing MCP token usage by 100x — you don't need code mode | Speakeasy
- MCP Context Mode: Cut Claude Code Token Burn 98% for AI API Devs (2026) | AI Blog API for Developers
- Token‑Efficient Agents: Building MCP‑Heavy Agents Without Burning Tokens