If you’re building AI agents that interact with multiple APIs and tools, you’ve likely run into the heavy token overhead from Model Context Protocol (MCP) tool definitions. This post explains how mcp-cli enables dynamic tool discovery and invocation to drastically reduce token usage, shows real benchmarks from primary sources, and gives you actionable usage patterns and trade-offs—using only documented syntax and commands.
Key Takeaways:
- MCP’s static tool schema injection can burn tens of thousands of tokens per session, sharply limiting efficiency as you scale tools and servers.
- mcp-cli enables dynamic discovery and invocation of MCP tools, cutting token usage by up to 98% in documented benchmarks (Anthropic).
- CLI-based invocation means agents only load and use tool definitions as needed, not up front.
- There are important trade-offs: not all AI stacks support CLI workflows, and process overhead or security boundaries may matter for your use case.
- This guide is based exclusively on primary documentation and measured benchmarks—no invented flags, tools, or features.
Why MCP Token Overhead Matters
Standard MCP agent implementations inject every tool’s JSON schema into the LLM’s context window on session start. As you add more servers and tools, this static approach quickly becomes unsustainable:
- Each tool definition may consume hundreds of tokens.
- Every additional server multiplies overhead—even if you never use most tools in a session.
- Context window bloat crowds out actual reasoning and conversation.
The primary documentation and real-world engineers have measured the impact:
GitHub MCP server alone: 93 tools, 55,000 tokens of context (before the agent does any work).
Aggregate context window breakdown (from jannikreinhard.com):
├── System prompt: ~2,000 tokens
├── Graph MCP schema: ~28,000 tokens
├── Compliance MCP schema: ~8,500 tokens
├── Reporting MCP schema: ~5,200 tokens
├── Conversation history: ~4,000 tokens
└── Available for reasoning: ~82,300 tokens (of 128K total)
The scaling formula:
servers × tools per server × tokens per tool = total context tokens burned
| Server | # of Tools | Tokens Consumed |
|---|---|---|
| GitHub MCP | 93 | 55,000 |
| Notion MCP | 15+ | ~8,000 |
| Filesystem MCP | 10 | ~4,000 |
Enterprise teams running multiple MCP servers face millions of tokens wasted monthly just to describe available tools—before any business logic is run (source).
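As a quick sanity check on the scaling formula, here is the benchmark scenario used later in this post (10 servers × 15 tools × ~500 tokens per tool) computed in TypeScript. The helper function is this post's own illustration:

```typescript
// Static-injection cost per the scaling formula:
// servers x tools per server x tokens per tool
function contextTokens(servers: number, toolsPerServer: number, tokensPerTool: number): number {
  return servers * toolsPerServer * tokensPerTool;
}

// Benchmark scenario: 10 servers, 15 tools each, ~500 tokens per definition
console.log(contextTokens(10, 15, 500)); // 75000 tokens burned before any work is done
```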
How mcp-cli Reduces Token Usage
Dynamic Tool Discovery
mcp-cli is a documented, lightweight command-line interface that allows AI agents (or humans) to interact with MCP servers dynamically, without up-front schema injection. Instead of pre-loading all tool definitions, mcp-cli lets you:
- Discover available tools at runtime
- Inspect tool parameters or help text only when needed
- Invoke individual tools as CLI subcommands, not via static prompt context
This “progressive disclosure” approach means you only pay the context cost for tools you actually use in a session.
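As a rough sketch of how an agent host might drive this, assuming a Node-based runtime: the wrapper function and injectable runner below are this post's own illustration, while the `--mcp <server-url> <subcommand>` form is the documented mcp-cli syntax.

```typescript
import { execFileSync } from 'node:child_process';

type Runner = (cmd: string, args: string[]) => string;

// Default runner shells out to the real CLI; tests can inject a stub.
const defaultRunner: Runner = (cmd, args) =>
  execFileSync(cmd, args, { encoding: 'utf8' });

// Thin wrapper over the documented `mcp-cli --mcp <server-url> <subcommand>` form.
function mcpCli(serverUrl: string, args: string[], run: Runner = defaultRunner): string {
  return run('mcp-cli', ['--mcp', serverUrl, ...args]);
}

// Discovery happens only when the agent decides it needs a tool:
//   mcpCli('https://your-mcp-server.com', ['list'])
// and each invocation pays only for that one tool's output, not every schema.
```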
Measured Token Savings
Benchmarks from Anthropic and independent researchers show dramatic savings:
| Scenario | Native MCP (tokens) | CLI Approach (tokens) | Savings |
|---|---|---|---|
| 10 servers × 15 tools × 500 tokens each | 75,000 | ~1,400 | 98% |
| 30 tools, 10 turns (static MCP) | 36,310 | 1,734 | 95.2% |
According to Anthropic: “Code execution with MCP enables agents to handle more tools while using fewer tokens, reducing context overhead by up to 98.7%.”
// Without code execution - all rows flow through context
TOOL CALL: gdrive.getSheet(sheetId: 'abc123')
→ returns 10,000 rows in context to filter manually
// With code execution - filter in the execution environment
const allRows = await gdrive.getSheet({ sheetId: 'abc123' });
// filter allRows here; only the filtered result returns to the model
By filtering or processing data in the agent’s execution environment, only relevant results are loaded into context, not entire datasets.
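To make that concrete, here is a small self-contained mock: the 10,000-row sheet is simulated in memory rather than fetched from a real gdrive call, but the shape of the win is the same.

```typescript
// Simulated sheet: 10,000 rows, of which only a handful are relevant.
type Row = { id: number; status: string };

const allRows: Row[] = Array.from({ length: 10_000 }, (_, i) => ({
  id: i,
  status: i % 1_000 === 0 ? 'pending' : 'closed',
}));

// Filter in the execution environment; only this result needs to be
// serialized back into the model's context.
const pending = allRows.filter((row) => row.status === 'pending');

console.log(`${pending.length} of ${allRows.length} rows enter context`); // 10 of 10000
```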
Practical Usage Examples and Benchmarks
Installing and Configuring mcp-cli
The official mcp-cli documentation covers installation; the documented Bun-based method is:
# Install globally with Bun (requires Bun to be installed)
bun install -g https://github.com/philschmid/mcp-cli
Configuration is via a simple JSON file. The example below follows the common MCP server-configuration shape (registering the filesystem server via npx); confirm the exact schema against the official mcp-cli docs:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    }
  }
}
Discovering and Invoking Tools
Dynamic discovery and invocation are performed with subcommands, not flags. Here are the canonical usage patterns (source):
# List available tools
mcp-cli --mcp https://your-mcp-server.com list
# Get help text for a tool (documented syntax)
mcp-cli --mcp --help
# Execute a tool (with arguments as required)
mcp-cli --mcp https://your-mcp-server.com [args]
The following example, reproduced from the original article, shows the code-execution pattern end to end:
// Read transcript from Google Docs and add to Salesforce prospect
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';
const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
objectType: 'SalesMeeting',
recordId: '00Q5f000001abcXYZ',
data: { Notes: transcript }
});
This pattern means only the actual result (not the entire schema or intermediate data) enters the LLM context.
Token Usage over Multiple Turns
| Turn | Native MCP (tokens) | mcp-cli (tokens) | Tokens Saved |
|---|---|---|---|
| 1 | 3,619 | 531 | 3,088 |
| 2 | 7,238 | 598 | 6,640 |
| 3 | 10,887 | 815 | 10,072 |
| 10 (total) | 36,310 | 1,734 | 34,576 |
Savings compound with longer sessions or larger toolsets (see detailed benchmarks).
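Working from the turn-10 row of the benchmark table, the percentage savings can be checked directly:

```typescript
// Cumulative totals after 10 turns, from the benchmark table above.
const nativeTotal = 36_310; // static MCP schema injection
const cliTotal = 1_734;     // mcp-cli dynamic discovery

const tokensSaved = nativeTotal - cliTotal;
const savingsPct = (tokensSaved / nativeTotal) * 100;

console.log(tokensSaved);           // 34576
console.log(savingsPct.toFixed(1)); // 95.2 (matches the headline figure)
```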
Trade-offs and Alternatives
Strengths of mcp-cli-Based Approaches
- Token Efficiency: 95–98% documented savings in multi-server, multi-tool environments.
- Dynamic Discovery: Tools are available on-demand, with no codegen or agent restart required.
- Composability: CLI invocations can be orchestrated, scripted, or chained as needed for complex workflows.
Considerations and Limitations
- Agent/LLM Support: Not all agent frameworks natively support CLI skill invocation. Test compatibility with your stack.
- Security: Passing secrets via CLI or environment variables can introduce risks. Audit all surfaces for leaks and enforce least privilege.
- Version Drift: If the MCP server changes schema, you may need to re-discover tools or update your invocation logic.
- Process Overhead: Each CLI invocation creates a new process. The overhead is usually small but relevant for latency-sensitive tasks.
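To gauge that overhead on your own machine, you can time a no-op child process. The numbers vary by platform, and this snippet spawns the current Node binary rather than mcp-cli so it runs anywhere:

```typescript
import { execFileSync } from 'node:child_process';

// Spawn a no-op child process and measure wall-clock time. Each CLI
// invocation pays roughly this fixed cost on top of the tool's own work.
const start = process.hrtime.bigint();
execFileSync(process.execPath, ['-e', '']); // no-op Node process
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`process spawn overhead: ~${elapsedMs.toFixed(0)} ms`);
```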
Alternatives
| Feature | mcp-cli | Direct MCP (Static) |
|---|---|---|
| Token Efficiency | 95–98% savings (documented) | Linear growth with # of tools |
| Dynamic Discovery | Yes | No |
| Requires Codegen | No | No |
| OpenAPI Support | Not documented | Not documented |
Some teams build custom MCP servers that serve only pre-analyzed, relevant context (e.g., dependency graphs, summaries) to further minimize token usage and maximize agent accuracy (Anthropic). Others use a hybrid approach: CLI for discovery, direct API for performance-critical calls.
Common Pitfalls and Pro Tips
- Incorrect CLI syntax: The correct way to list tools is `mcp-cli --mcp <server-url> list`. There is no `--list` flag.
- Authentication: If your MCP server requires credentials, set them in configuration or environment variables. There is no `--auth-header` flag.
- Version drift: When schemas change on the server, manually re-discover or refresh your tool list; no `--refresh` flag exists in mcp-cli.
- Agent compatibility: Not all LLM agents treat CLI skills equally. Test with your actual stack before production rollout.
- Token accounting: Dynamic discovery and help still cost tokens, but orders of magnitude less than static schema injection.
- Error handling: Build robust error handling for CLI invocation failures or schema mismatches.
Conclusion and Next Steps
If you’re scaling AI agents across dozens of tools and multiple servers, the token cost of static MCP prompt injection becomes a bottleneck. mcp-cli provides a documented, benchmarked way to slash token usage, enable dynamic tool orchestration, and future-proof your agent architecture. Always validate compatibility, review your security posture, and benchmark token usage in your real environment before committing.
For deeper dives into code execution patterns, building efficient custom MCP servers, and benchmarking agent token usage, see these authoritative sources:
- Code execution with MCP: building more efficient AI agents
- MCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
- Introducing MCP CLI: A way to call MCP Servers Efficiently
Sources and References
This article was researched using a combination of primary and supplementary sources:
Supplementary References
These sources provide additional context, definitions, and background information to help clarify concepts mentioned in the primary source.
- Code execution with MCP: building more efficient AI agents (Anthropic)
- Reducing MCP token usage by 100x — you don't need code mode | Speakeasy
- MCP Context Mode: Cut Claude Code Token Burn 98% for AI API Devs (2026) | AI Blog API for Developers
- Token‑Efficient Agents: Building MCP‑Heavy Agents Without Burning Tokens