Building Custom Coding Agents with MCP Servers
Introduction to Custom Coding Agents
A single design decision separates teams building useful AI coding agents from those stuck with demo-grade tools: whether the agent can actually use developer tools or just generate text.
In a previous analysis of AI code review agents, the biggest gap was clear. Accuracy ranged from 6 percent to over 80 percent depending on how much real context and tooling the system had. Pure prompt-based systems plateau quickly, while tool-integrated agents keep improving.
That is where Model Context Protocol (MCP) comes in. MCP gives you a standard way to connect a large language model (LLM) to real developer tools such as:
- Code search
- Test runners
- Linters
- Databases
- CI pipelines
Instead of guessing, the agent can ask, execute, observe, and iterate. This guide walks through building a real coding agent that:
- Reads code from a repository
- Runs tests
- Interprets failures
- Proposes fixes
All of it using MCP-style tool integration.
Practical example: Imagine an agent that receives a failing test suite. Without MCP, the agent could only speculate about the cause. With MCP, it can actually rerun the tests, retrieve the real output, and suggest fixes grounded in the actual codebase.
Modern coding agents rely on real tool execution, not just text generation.
How MCP Servers Connect Tools
MCP is a protocol that standardizes how agents talk to developer tools. Instead of hardcoding integrations, you expose tools through a server, and the agent discovers and calls them dynamically.
Microsoft and others are already bringing MCP-based infrastructure into developer workflows. For example, the VS Code AI Toolkit can scaffold MCP servers with built-in transport and tool registration logic, making it easier to plug agents into real workflows (source).
At a high level, the architecture includes these components:
- Agent: decides what to do next
- MCP server: routes tool calls
- Tools: execute real actions (tests, lint, queries)
- Repository: source of truth for code
Example: An agent tasked with fixing a bug could use the MCP server to call a registered test runner tool, get structured test output, and decide how to adjust the code.
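Under the hood, MCP frames tool calls as JSON-RPC 2.0 messages. The sketch below shows roughly what a `tools/call` exchange looks like, written here as Python dicts; the field names follow the MCP specification, and `run_tests` is the hypothetical tool used throughout this guide.

```python
# Roughly what an MCP tools/call exchange looks like on the wire
# (JSON-RPC 2.0, per the MCP specification), shown as Python dicts.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_tests",  # hypothetical tool registered on the server
        "arguments": {},      # structured inputs, validated against the tool's schema
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {"type": "text", "text": "1 failed, 3 passed"}
        ],
    },
}
```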
Why this matters in practice
Without MCP, the agent:
- Guesses test outcomes
- Cannot verify fixes
- Produces inconsistent results
With MCP, the agent:
- Runs actual tests
- Gets structured output
- Iterates based on real feedback
This directly addresses the “false positive” problem highlighted in automated code review systems. When the agent is grounded in real execution, it produces less noise.
Building a Working Coding Agent
Let’s build a minimal but realistic agent loop.
This version assumes:
- A repository on disk
- A test runner (pytest)
- A simple MCP-style tool interface
Start with the core loop.
```python
import subprocess

from anthropic import Anthropic

client = Anthropic(api_key="your-api-key")


def read_file(path):
    with open(path, "r") as f:
        return f.read()


def run_tests():
    # Capture both streams so failures reported on stderr are not lost
    result = subprocess.run(
        ["pytest", "-q"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


def ask_llm(prompt):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # substitute a current model name
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def agent_loop():
    code = read_file("app/math_utils.py")
    test_output = run_tests()
    prompt = f"""
You are a coding agent.
Code:
{code}
Test results:
{test_output}
1. Identify the failing issue
2. Suggest a fix
3. Output the corrected code
"""
    suggestion = ask_llm(prompt)
    print(suggestion)


if __name__ == "__main__":
    agent_loop()

# Expected output (example):
# "The failure is due to incorrect subtraction logic. Replace fn with ..."
# Note: production use should add sandboxing and timeout handling
```
This is the simplest usable agent:
- Reads real code
- Executes tests
- Feeds results into the model
- Gets a fix
Example in action: If a developer introduces a bug in math_utils.py, this agent will run the tests, see which ones fail, and ask the LLM to suggest corrections based on the actual test output.
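The single-pass loop above can also be extended into the ask, execute, observe, iterate cycle described earlier. Below is a minimal sketch reusing the functions above; it assumes the model returns a complete corrected file body that can be written back before rerunning the tests, and uses a crude string check on pytest's summary line.

```python
def iterative_agent_loop(path="app/math_utils.py", max_rounds=3):
    # Bounded retry loop: apply the model's fix, rerun tests, stop when green
    for round_num in range(max_rounds):
        test_output = run_tests()
        if "failed" not in test_output:  # crude check on pytest's -q summary
            print(f"Tests passing after {round_num} round(s)")
            return
        prompt = (
            f"Code:\n{read_file(path)}\n"
            f"Test results:\n{test_output}\n"
            "Output only the full corrected file contents."
        )
        fixed_code = ask_llm(prompt)
        with open(path, "w") as f:  # assumes the reply is the raw file body
            f.write(fixed_code)
    print("Giving up: tests still failing")
```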
But it is still tightly coupled. The next step is separating tools via MCP.
Python and TypeScript Implementations
To make the agent more flexible, you can abstract tool integrations using MCP concepts.
Python: MCP-style tool abstraction
Instead of calling functions directly, define tools with structured inputs and outputs.
```python
import subprocess


class MCPTool:
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def call(self, args):
        return self.func(**args)


def run_tests_tool():
    result = subprocess.run(
        ["pytest", "-q"],
        capture_output=True,
        text=True,
    )
    return {"output": result.stdout}


tools = {
    "run_tests": MCPTool("run_tests", run_tests_tool)
}


def call_tool(tool_name):
    return tools[tool_name].call({})


# Agent using the tool
test_result = call_tool("run_tests")
print(test_result["output"])

# Output:
# "1 failed, 3 passed"
# Note: production systems should validate tool inputs and restrict execution scope
```
This is the core MCP idea:
- Tools are registered in a registry
- The agent calls them by name
- Responses are structured, making it easier to interpret results programmatically
In production MCP servers, tool invocation happens over HTTP or RPC rather than direct function calls, allowing for language-agnostic and distributed setups.
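As a sketch of what that looks like, the official MCP Python SDK (the `mcp` package) can expose the same test runner as a real server. The snippet below assumes the package is installed (`pip install mcp`) and serves over stdio, the SDK's default transport.

```python
# A minimal MCP server sketch using the official Python SDK.
# The tool mirrors the run_tests example above.
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("test-tools")


@mcp.tool()
def run_tests() -> str:
    """Run the project's pytest suite and return its output."""
    result = subprocess.run(
        ["pytest", "-q"], capture_output=True, text=True
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default; agents connect as MCP clients
```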
TypeScript: the same pattern with the Anthropic SDK
```typescript
import Anthropic from "@anthropic-ai/sdk";
import { execSync } from "child_process";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

function runTests(): string {
  // execSync throws on a non-zero exit code, which pytest uses to signal
  // failing tests, so capture the output from the error as well
  try {
    return execSync("pytest -q").toString();
  } catch (err: any) {
    return err.stdout?.toString() ?? String(err);
  }
}

async function agent() {
  const testOutput = runTests();
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 500,
    messages: [
      {
        role: "user",
        content: `Test results:\n${testOutput}\nSuggest a fix.`,
      },
    ],
  });
  const block = response.content[0];
  if (block.type === "text") {
    console.log(block.text);
  }
}

agent();
```
The Python and TypeScript versions follow the same pattern:
- Execute a tool
- Feed the results to the agent
- Generate a fix based on output
The difference in production systems is orchestration.
Example: By abstracting tool calls, you can swap out the test runner or add new capabilities without rewriting core agent logic.
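For instance, adding a linter alongside the test runner is just another registry entry. The sketch below reuses the `MCPTool` class and `tools` registry from above and assumes `ruff` is installed.

```python
import subprocess


def run_lint_tool():
    # Hypothetical second tool: a linter with the same structured output shape
    result = subprocess.run(
        ["ruff", "check", "."], capture_output=True, text=True
    )
    return {"output": result.stdout, "ok": result.returncode == 0}


tools["run_lint"] = MCPTool("run_lint", run_lint_tool)

lint_result = call_tool("run_lint")  # the agent loop itself is unchanged
```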
Where MCP changes design
Instead of embedding tools locally:
- Tools live behind an MCP server
- The agent discovers available tools dynamically
- Multiple agents can share the same tools
This becomes important once you scale beyond a single repository or need to support multiple agents collaborating on larger codebases.
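Here is a sketch of what dynamic discovery looks like from the agent side, using the MCP Python SDK's client and assuming the server sketch from earlier is saved as `server.py`.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # dynamic discovery
            print([t.name for t in tools.tools])
            result = await session.call_tool("run_tests", arguments={})
            print(result.content)


asyncio.run(main())
```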
Production Considerations and Tradeoffs
This is where most teams fail.
Building a working agent is easy. Making it reliable is more challenging.
As agents move from prototypes to production, new concerns such as security, tool quality, and scalability become critical. Below are the main considerations teams encounter.
1. Security risks are real
MCP servers expose powerful capabilities:
- Running code
- Accessing repositories
- Querying internal systems
Poor configuration can lead to remote code execution risks, which have already been observed in MCP-style systems (source).
Practical safeguards include:
- Running tools in sandboxes
- Restricting file system access
- Enforcing allowlists for commands
- Logging every tool invocation
Example: Running the test runner inside a Docker container with read-only filesystem access limits the risk if the tool is compromised.
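A minimal version of that safeguard is sketched below. It assumes Docker is available and that a project image with pytest preinstalled exists; the image name here is hypothetical.

```python
import subprocess


def run_tests_sandboxed(repo_path: str) -> str:
    # Read-only bind mount, no network, bounded memory and wall-clock time
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",
            "--memory", "512m",
            "-v", f"{repo_path}:/repo:ro",
            "-w", "/repo",
            "my-test-image",  # hypothetical image with pytest preinstalled
            "pytest", "-q", "-p", "no:cacheprovider",  # avoid writing to the ro mount
        ],
        capture_output=True,
        text=True,
        timeout=120,  # kill runaway tool invocations
    )
    return result.stdout + result.stderr
```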
2. Tool quality determines agent quality
From earlier code review analysis:
- High false positives kill adoption
- Context matters more than model size
The same applies here:
Bad tool integration:
- Noisy suggestions
- Broken fixes
- Developer distrust
Good tool integration:
- Deterministic signals (tests, lint)
- Context-aware fixes
- Lower hallucination rate
This matches practices discussed in prompt engineering for code generation, where constraints and validation dramatically improve output quality.
Example: Integrating a static type checker as a tool can help catch errors before they reach production, increasing trust in the agent’s suggestions.
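A sketch of that integration, reusing the `MCPTool` registry from earlier and assuming `mypy` is installed:

```python
import subprocess


def run_type_check_tool():
    # Deterministic signal: mypy exits non-zero when type errors are found
    result = subprocess.run(
        ["mypy", "app/"], capture_output=True, text=True
    )
    return {"output": result.stdout, "ok": result.returncode == 0}


tools["type_check"] = MCPTool("type_check", run_type_check_tool)
```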
3. MCP vs direct integration
| Approach | How it works | Pros | Cons |
|---|---|---|---|
| Direct tool calls | Agent calls local functions | Simple, fast to build | Tightly coupled, hard to scale |
| MCP server | Tools exposed via protocol | Reusable, multi-agent support | More infrastructure, security overhead |
| Cloud-hosted MCP | Managed server (e.g. Foundry MCP) | Scales easily across teams | Less control, requires governance |
Cloud-hosted implementations are emerging. Microsoft’s Foundry MCP server provides a hosted environment for connecting agents to tools and data sources (source).
Example: A team can deploy an MCP server once, register all their tools, and allow multiple agents or services to use them for different automation tasks.
4. Multi-agent workflows are the next step
Single agents break down on complex tasks.
Production systems are moving toward:
- Code reader agent
- Test executor agent
- Fix generator agent
Each specializes and communicates through shared context.
This is part of the shift from simple retrieval-augmented generation (RAG) systems toward tool-driven, multi-agent architectures seen in enterprise MCP deployments.
Example: In a large refactoring, one agent can analyze code dependencies, another runs tests, and a third proposes fixes, coordinating through the MCP layer.
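A toy sketch of that division of labor, reusing the helpers defined earlier; the agent names are illustrative, not a real framework.

```python
# Each specialist reads from and writes to a shared context dict;
# in a real deployment this coordination would go through the MCP layer.
def reader_agent(ctx):
    ctx["code"] = read_file("app/math_utils.py")


def test_executor_agent(ctx):
    ctx["test_output"] = call_tool("run_tests")["output"]


def fix_generator_agent(ctx):
    prompt = (
        f"Code:\n{ctx['code']}\n"
        f"Test results:\n{ctx['test_output']}\n"
        "Suggest a fix."
    )
    ctx["fix"] = ask_llm(prompt)


context = {}
for specialist in (reader_agent, test_executor_agent, fix_generator_agent):
    specialist(context)
print(context["fix"])
```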
5. Observability and debugging
Agents fail in non-obvious ways.
You need:
- Tool call logs
- Input/output tracing
- Replay capability
Otherwise debugging becomes guesswork.
Example: If an agent’s code suggestion introduces new errors, being able to trace which tool calls and inputs led to the suggestion is essential for effective troubleshooting.
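A minimal sketch of tool-call logging that supports later replay, wrapping the registry from earlier; the JSONL file name is arbitrary.

```python
import json
import time
import uuid


def logged_call(tool_name, args):
    # Record every invocation with its inputs and outputs so runs can be replayed
    record = {
        "id": str(uuid.uuid4()),
        "tool": tool_name,
        "args": args,
        "ts": time.time(),
    }
    record["result"] = tools[tool_name].call(args)
    with open("tool_calls.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["result"]
```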
Conclusion and Next Steps
Custom coding agents are moving from novelty to infrastructure. The difference between toy and production systems comes down to one factor: tool use.
MCP provides the missing layer:
- Standardized tool access
- Decoupled architecture
- Scalable agent workflows
The practical path forward:
- Start with a local agent loop (like the one above)
- Add structured tools
- Move tools behind an MCP server
- Introduce sandboxing and logging
- Expand into multi-agent workflows
Teams that treat these systems like production infrastructure, not demos, will see gains observed in code review automation: higher signal, fewer false positives, and real productivity improvements.
Key Takeaways:
- Tool use is the main factor that separates useful coding agents from prompt-only systems.
- MCP standardizes how agents connect to tools like test runners and linters.
- A working agent must execute real code, not just generate suggestions.
- Security and sandboxing are critical when exposing tools via MCP servers.
- Multi-agent architectures are becoming the default for complex coding workflows.
