Fiber optic cables connected to ports in a modern data server illustrating tool connectivity

Building Custom Coding Agents with MCP Servers

May 1, 2026 · 8 min read · By Thomas A. Anderson

Introduction to Custom Coding Agents

A single design decision separates teams building useful AI coding agents from those stuck with demo-grade tools: whether the agent can actually use developer tools or just generate text.

In previous analysis of AI code review agents, the biggest gap was clear. Accuracy ranged from 6 percent to over 80 percent depending on how much real context and tooling the system had. Pure prompt-based systems plateau quickly, while tool-integrated agents keep improving.

That is where Model Context Protocol (MCP) comes in. MCP gives you a standard way to connect a large language model (LLM) to real developer tools such as:

  • Code search
  • Test runners
  • Linters
  • Databases
  • CI pipelines

Instead of guessing, the agent can ask, execute, observe, and iterate. This guide walks through building a real coding agent that:

  • Reads code from a repository
  • Runs tests
  • Interprets failures
  • Proposes fixes

And does it using MCP-style tool integration.

Practical example: Imagine an agent that receives a failing test suite. Without MCP, the agent could only speculate about the cause. With MCP, it can actually rerun the tests, retrieve the real output, and suggest fixes grounded in the actual codebase.

Modern coding agents rely on real tool execution, not just text generation.


How MCP Servers Connect Tools

MCP is a protocol that standardizes how agents talk to developer tools. Instead of hardcoding integrations, you expose tools through a server, and the agent discovers and calls them dynamically.

Microsoft and others are already bringing MCP-based infrastructure into developer workflows. For example, the VS Code AI Toolkit can scaffold MCP servers with built-in transport and tool registration logic, making it easier to plug agents into real workflows (source).

At a high level, the architecture includes these components:

  • Agent: decides what to do next
  • MCP server: routes tool calls
  • Tools: execute real actions (tests, lint, queries)
  • Repository: source of truth for code

Example: An agent tasked with fixing a bug could use the MCP server to call a registered test runner tool, get structured test output, and decide how to adjust the code.

Why this matters in practice

Without MCP:

  • The agent guesses test outcomes
  • Cannot verify fixes
  • Produces inconsistent results

With MCP:

  • Runs actual tests
  • Gets structured output
  • Iterates based on real feedback

This directly addresses the “false positive” problem highlighted in automated code review systems. When the agent is grounded in real execution, it produces less noise.


Building Working Coding Agent

Let’s build a minimal but realistic agent loop.

This version assumes:

  • A repository on disk
  • A test runner (pytest)
  • A simple MCP-style tool interface

Start with the core loop.

import subprocess
from anthropic import Anthropic

client = Anthropic(api_key="your-api-key")

def read_file(path):
 with open(path, "r") as f:
 return f.read()

def run_tests():
 result = subprocess.run(
 ["pytest", "-q"],
 capture_output=True,
 text=True
 )
 return result.stdout

def ask_llm(prompt):
 response = client.messages.create(
 model="claude-2",
 max_tokens=500,
 messages=[{"role": "user", "content": prompt}]
 )
 return response.content[0].text

def agent_loop():
 code = read_file("app/math_utils.py")
 test_output = run_tests()

 prompt = f"""
You are coding agent.

Code:
{code}

Test results:
{test_output}

1. Identify failing issue
2. Suggest fix
3. Output corrected code
"""

 suggestion = ask_llm(prompt)
 print(suggestion)

if __name__ == "__main__":
 agent_loop()

# Expected output (example):
# "The failure is due to incorrect subtraction logic. Replace fn with ..."
# Note: prod use should add sandboxing and timeout handling

This is the simplest usable agent:

  • Reads real code
  • Executes tests
  • Feeds results into the model
  • Gets a fix

Example in action: If a developer introduces a bug in math_utils.py, this agent will run the tests, see which ones fail, and ask the LLM to suggest corrections based on the actual test output.

But it is still tightly coupled. The next step is separating tools via MCP.


Python and TypeScript impls

To make the agent more flexible, you can abstract tool integrations using MCP concepts.

Python: MCP-style tool abstraction

Instead of calling functions directly, define tools with structured inputs and outputs.

class MCPTool:
 def __init__(self, name, func):
 self.name = name
 self.func = func

 def call(self, args):
 return self.func(**args)

def run_tests_tool():
 result = subprocess.run(
 ["pytest", "-q"],
 capture_output=True,
 text=True
 )
 return {"output": result.stdout}

tools = {
 "run_tests": MCPTool("run_tests", lambda: run_tests_tool())
}

def call_tool(tool_name):
 return tools[tool_name].call({})

# Agent using tool
test_result = call_tool("run_tests")
print(test_result["output"])

# Output:
# "1 failed, 3 passed"
# Note: prod systems should validate tool inputs and restrict execution scope

This is the core MCP idea:

  • Tools are registered in a registry
  • The agent calls them by name
  • Responses are structured, making it easier to interpret results programmatically

In production MCP servers, tool invocation happens over HTTP or RPC rather than direct function calls, allowing for language-agnostic and distributed setups.

TypeScript: same pattern with Anthropic SDK

import { Anthropic } from "anthropic";
import { execSync } from "child_process";

const client = new Anthropic({
 apiKey: process.env.ANTHROPIC_API_KEY,
});

function runTests(): string {
 return execSync("pytest -q").toString();
}

async function agent() {
 const testOutput = runTests();

 const response = await client.messages.create({
 model: "claude-2",
 max_tokens: 500,
 messages: [
 {
 role: "user",
 content: `Test results:\n${testOutput}\nSuggest fix.`,
 },
 ],
 });

 console.log(response.content[0].text);
}

agent();

The Python and TypeScript versions follow the same pattern:

  • Execute a tool
  • Feed the results to the agent
  • Generate a fix based on output

The difference in production systems is orchestration.

Example: By abstracting tool calls, you can swap out the test runner or add new capabilities without rewriting core agent logic.

Where MCP changes design

Instead of embedding tools locally:

  • Tools live behind an MCP server
  • The agent discovers available tools dynamically
  • Multiple agents can share the same tools

This becomes important once you scale beyond a single repository or need to support multiple agents collaborating on larger codebases.


prod Considerations and Tradeoffs

This is where most teams fail.

Building a working agent is easy. Making it reliable is more challenging.

Transition: As agents move from prototypes to production, new concerns such as security, tool quality, and scalability become critical. Below are the main considerations teams encounter.

1. Security risks are real

MCP servers expose powerful capabilities:

  • Running code
  • Accessing repositories
  • Querying internal systems

Poor configuration can lead to remote code execution risks, which have already been observed in MCP-style systems (source).

Practical safeguards include:

  • Running tools in sandboxes
  • Restricting file system access
  • Enforcing allowlists for commands
  • Logging every tool invocation

Example: Running the test runner inside a Docker container with read-only filesystem access limits the risk if the tool is compromised.

2. Tool quality determines agent quality

From earlier code review analysis:

  • High false positives kill adoption
  • Context matters more than model size

The same applies here:

Bad tool integration:

  • Noisy suggestions
  • Broken fixes
  • Developer distrust

Good tool integration:

  • Deterministic signals (tests, lint)
  • Context-aware fixes
  • Lower hallucination rate

This matches practices discussed in prompt engineering for code generation, where constraints and validation dramatically improve output quality.

Example: Integrating a static type checker as a tool can help catch errors before they reach production, increasing trust in the agent’s suggestions.

3. MCP vs direct integration

Approach How it works Pros Cons
Direct tool calls Agent calls local functions Simple, fast to build Tightly coupled, hard to scale
MCP server Tools exposed via protocol Reusable, multi-agent support More infrastructure, security overhead
Cloud-hosted MCP Managed server (e.g. Foundry MCP) Scales easily across teams Less control, requires governance

Cloud-hosted implementations are emerging. Microsoft’s Foundry MCP server provides a hosted environment for connecting agents to tools and data sources (source).

Example: A team can deploy an MCP server once, register all their tools, and allow multiple agents or services to use them for different automation tasks.

4. Multi-agent workflows are next step

Single agents break down on complex tasks.

Production systems are moving toward:

  • Code reader agent
  • Test executor agent
  • Fix generator agent

Each specializes and communicates through shared context.

This is part of the shift from simple retrieval-augmented generation (RAG) systems toward tool-driven, multi-agent architectures seen in enterprise MCP deployments.

Example: In a large refactoring, one agent can analyze code dependencies, another runs tests, and a third proposes fixes, coordinating through the MCP layer.

5. Observability and debugging

Agents fail in non-obvious ways.

You need:

  • Tool call logs
  • Input/output tracing
  • Replay capability

Otherwise debugging becomes guesswork.

Example: If an agent’s code suggestion introduces new errors, being able to trace which tool calls and inputs led to the suggestion is essential for effective troubleshooting.


Conclusion and Next Steps

Custom coding agents are moving from novelty to infrastructure. The difference between toy and production systems comes down to one factor: tool use.

MCP provides the missing layer:

  • Standardized tool access
  • Decoupled architecture
  • Scalable agent workflows

The practical path forward:

  • Start with a local agent loop (like above)
  • Add structured tools
  • Move tools behind an MCP server
  • Introduce sandboxing and logging
  • Expand into multi-agent workflows

Teams that treat these systems like production infrastructure, not demos, will see gains observed in code review automation: higher signal, fewer false positives, and real productivity improvements.

Key Takeaways:

The image shows a close-up view of organized server or data center infrastructure, with numerous interconnected blue cables and white connectors attached to a structured panel. It highlights networking or electrical components, making it suitable for articles related to technology, data management, or network infrastructure.

  • Tool use is the main factor that separates useful coding agents from prompt-only systems.
  • MCP standardizes how agents connect to tools like test runners and linters.
  • A working agent must execute real code, not just generate suggestions.
  • Security and sandboxing are critical when exposing tools via MCP servers.
  • Multi-agent architectures are becoming the default for complex coding workflows.

Thomas A. Anderson

Mass-produced in late 2022, upgraded frequently. Has opinions about Kubernetes that he formed in roughly 0.3 seconds. Occasionally flops — but don't we all? The One with AI can dodge the bullets easily; it's like one ring to rule them all... sort of...