Categories
AI & Emerging Technology Software Development

Evaluating AGENTS.md: Are They Helpful for Coding Agents?

Repository-level context files like AGENTS.md are widely recommended for coding agents, but until recently nobody had rigorously tested whether they actually help real-world coding automation. Are they a productivity booster or an overlooked source of confusion and cost? This post breaks down the findings from the first major study on AGENTS.md, shows what works (and what doesn't), and gives you practical guidance for using context files with coding agents.

Key Takeaways:

  • AGENTS.md files are popular, but research shows they can reduce agent task success rates and increase inference costs
  • Both LLM-generated and hand-written AGENTS.md files cause agents to explore more, but not always productively
  • Minimal, essential context is better—avoid overloading agents with non-critical instructions
  • Human-written AGENTS.md files should focus on unique, actionable constraints only
  • Testing with and without AGENTS.md in your own workflow is the only way to know if it helps

Why AGENTS.md Exists: Intended Purpose and Usage

The main goal of AGENTS.md is to give AI-powered coding agents repository-specific instructions, so their automated edits, bug fixes, or feature additions are consistent with project standards and constraints. This is intended to:

  • Reduce the risk of agents making invalid or inappropriate changes
  • Clarify non-obvious requirements (e.g., “Do not edit files in /vendor” or “All tests must use Mocha”)
  • Guide the agent’s “reasoning” about how to approach changes in a complex codebase

Here’s a realistic AGENTS.md example:

# AGENTS.md
## Project Rules
- Use the `pytest` test framework for all new tests.
- Follow PEP8 code style.
- Do not modify anything in `/legacy` or `/external`.
- All new endpoints need to be documented in `api_docs/`.
## Deployment
- Production deployments use Docker Compose with `docker-compose.prod.yml`.
- Never remove `.env` files from the repo.

This file usually lives in the repository root and is read by coding agents during task execution. Many agent frameworks (such as OpenAI's Codex and various open-source agent runners) now support supplying a context file as part of their prompt structure. Some teams generate these files automatically, while others write them by hand when onboarding agents.
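Mechanically, the injection is simple: the runner reads AGENTS.md if it exists and splices it into the prompt it sends to the model. A minimal sketch, assuming a hypothetical `build_system_prompt` helper (not a real framework API):

```python
from pathlib import Path

def build_system_prompt(repo_root: str, base_prompt: str) -> str:
    """Append repository context from AGENTS.md, if present, to the base prompt."""
    context_path = Path(repo_root) / "AGENTS.md"
    if context_path.is_file():
        context = context_path.read_text(encoding="utf-8")
        return f"{base_prompt}\n\n# Repository context (AGENTS.md)\n{context}"
    # No context file: the agent works from the task description alone
    return base_prompt
```

Every agent turn then carries the full file, which is exactly why file length translates directly into token cost.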

However, as noted in the first major study on the topic, “Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks.”

Let’s look at what the data actually says.

What the Research Shows: Task Success and Costs

The study, Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?, is the first to test AGENTS.md’s effectiveness at scale. Here’s how the researchers set up their experiment:

  • They used two types of testbeds:
    • Standardized software engineering (SWE-bench) tasks from public GitHub repositories, with AGENTS.md files generated by LLMs following agent developer best practices
    • Real-world issues from repositories with actual, developer-committed AGENTS.md files
  • Multiple coding agents and LLMs were tested to check if effects were consistent
  • Metrics recorded included:
    • Task success rate (did the agent solve the issue?)
    • Inference cost (how many tokens/calls were consumed?)
    • Agent behavior (how did agents approach the task?)

The results were unexpected for many in the field.

| Metric | With AGENTS.md | No AGENTS.md | Observed Impact |
|---|---|---|---|
| Task Success Rate | Lower | Higher | Context files decreased agent success |
| Inference Cost | 20%+ higher | Lower baseline | Context files increased cost |
| Exploration/Testing | Broader, more thorough | More targeted | Agents follow AGENTS.md, sometimes excessively |

As summarized in the paper: “Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions.” (source)

In practice, this means that while agents spend more time “exploring” and “testing” (which sounds good), they can get distracted or bogged down by unnecessary instructions, leading to lower task completion and higher token usage.

Community reactions have been mixed. Some, as seen on Hacker News, argue that even a small improvement in edge cases might justify AGENTS.md, but the research finds that—on average—too much context is a net negative.

Agent Behavior and Context Files

The research also analyzed how AGENTS.md changes agent behavior—not just outcomes. Here’s what they found:

  • Agents are highly obedient to AGENTS.md, even when it doesn’t help solve the core task
  • Larger, more detailed context files lead to agents performing additional file traversals, running more tests, and sometimes getting sidetracked by requirements not relevant to the immediate task
  • In several cases, agents spent effort on constraints or instructions from AGENTS.md at the expense of actually fixing the bug or implementing the requested feature

For example, if AGENTS.md emphasizes exhaustive testing, agents may add or rewrite more tests even when the primary job is a quick bugfix—leading to unnecessary code churn and increased review time.

Below is a realistic (if problematic) AGENTS.md file that can cause issues:

# AGENTS.md
## Rules
- All functions must have docstrings (even private/internal).
- All code changes must include new integration and unit tests.
- Update all dependent documentation on every code change.
- Review all code in `/experimental`, even for unrelated changes.

In the study, agents that followed such files often failed to complete the actual requested task because they spent too much time on the “explore everything” instructions.

Notably, the researchers concluded: “Unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.” (SRI Lab summary)

This is a crucial shift from “more context is better” to “focused, minimal context is best.”

Practical Takeaways for Coding Agents

If you’re using or planning to use AGENTS.md in your workflow, here’s how to maximize benefits and avoid documented downsides:

1. Limit AGENTS.md Content to Essentials

  • Include only what an agent absolutely must know to avoid breaking the repo or violating business rules
  • Keep instructions actionable and short. Remove generic best practices and restatements of conventions already enforced by CI or linting tools

Minimalist template:

# AGENTS.md
## Essential Rules
- Only modify code in `/src` and `/tests`.
- Do not touch `/vendor` or `/legacy`.
- New features require a unit test.

2. Test With and Without AGENTS.md

Do not assume AGENTS.md will help just because your agent supports it. Run comparative tests to measure actual impact.

# Python sketch for local testing; run_agent_on_issue_set is a stand-in
# for your own evaluation harness.
for context_file in [None, "./AGENTS.md"]:
    results = run_agent_on_issue_set(context_file=context_file)
    print("Context file:", context_file)
    print("Success rate:", results.success_rate)
    print("Avg. inference cost:", results.inference_tokens)

  • Keep logs of success/failure and cost for both configurations
  • Iterate on AGENTS.md content based on empirical results, not assumptions
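To act on those logs, reduce each configuration to a couple of headline numbers. The sketch below uses illustrative metric names and figures chosen to mirror the direction of the study's findings, not its actual data:

```python
def compare_runs(baseline: dict, with_context: dict) -> dict:
    """Compare agent metrics without vs. with AGENTS.md.

    Each dict needs 'success_rate' (0..1) and 'inference_tokens' (avg per task).
    """
    return {
        "success_delta": with_context["success_rate"] - baseline["success_rate"],
        "cost_increase_pct": 100.0
        * (with_context["inference_tokens"] - baseline["inference_tokens"])
        / baseline["inference_tokens"],
    }

# Illustrative figures only: success dips slightly, cost rises ~24%
report = compare_runs(
    baseline={"success_rate": 0.42, "inference_tokens": 50_000},
    with_context={"success_rate": 0.38, "inference_tokens": 62_000},
)
```

If `success_delta` is negative while `cost_increase_pct` is positive, the context file is costing you twice and should be trimmed or dropped.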

3. Use AGENTS.md for Non-Obvious Constraints, Not Style

  • Do: “Do not modify cryptographic code in `/security`.”
  • Do: “All customer-facing endpoints must log to `/logs/api.log`.”
  • Don’t: “Use Black for formatting” (already handled by tools/linting)

4. Audit and Update Regularly

  • AGENTS.md should evolve as your codebase and automation change
  • Remove rules that are no longer relevant, and test agent behavior after each update

5. Watch Inference Costs

  • If you’re paying per token or per API call, remember inference cost increased by over 20% in the study when AGENTS.md was present
  • For large repositories or frequent agent runs, this can become a significant expense
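To make that concrete, a 20% token overhead compounds with run volume. This back-of-the-envelope estimator uses hypothetical volume and pricing numbers (plug in your own):

```python
def monthly_overhead_usd(runs_per_day: int, avg_tokens_per_run: int,
                         usd_per_million_tokens: float,
                         overhead: float = 0.20) -> float:
    """Estimate extra monthly spend if a context file adds ~20% tokens per run."""
    extra_tokens = runs_per_day * 30 * avg_tokens_per_run * overhead
    return extra_tokens / 1_000_000 * usd_per_million_tokens

# e.g. 200 agent runs/day at 50k tokens each, $3 per million tokens
cost = monthly_overhead_usd(200, 50_000, 3.0)  # → 180.0 extra dollars/month
```

At higher run counts or pricier models, the overhead alone can dominate the budget line for agent automation.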

6. Consider Team Training and Documentation

  • Explain to human contributors why AGENTS.md exists, what should (and shouldn’t) go in it, and how it’s used by automation
  • Use code review to enforce brevity and relevance in AGENTS.md updates

In summary: treat AGENTS.md as a surgical tool, not a dumping ground for every project rule or preference.

Common Pitfalls and Pro Tips

Pitfall: Overly Verbose Files

  • Excessive lists of goals, code standards, or duplicating README content lead to agents wasting cycles and missing the point
  • Leads to “analysis paralysis” where agents spend time on irrelevant exploration instead of the task at hand

Pitfall: Duplicated or Conflicting Information

  • Copy-pasting from CONTRIBUTING.md or code comments can confuse agents, especially if instructions aren’t aligned
  • Conflicting or ambiguous rules can cause erratic automated behavior

Pitfall: “More Is Better” Fallacy

  • The study's results show that more context is not always better; on average, it was worse for agent-driven task completion
  • Each instruction should earn its place—if it’s not actionable and critical, leave it out

Pro Tip: Use AGENTS.md for Repo-Specific, High-Impact Rules

  • Focus on what’s unique to your project or what’s likely to trip up a coding agent
  • For example, “Do not commit secrets. If a secret is found, halt execution and notify the security team.”

Pro Tip: Monitor and Limit File Size

  • Large AGENTS.md files increase prompt size and cost—keep the file as short as possible
  • Automate checks for AGENTS.md length if using CI/CD pipelines
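Such a length check is a few lines of code. A sketch, assuming a line budget of 40 (tune the number to your repo; the function name is illustrative):

```python
from pathlib import Path

# Assumption: a 40-line budget keeps prompt overhead modest; adjust as needed.
MAX_LINES = 40

def agents_md_within_budget(path: str = "AGENTS.md",
                            max_lines: int = MAX_LINES) -> bool:
    """Return True if the context file is absent or within the line budget."""
    p = Path(path)
    if not p.is_file():
        return True  # no file, nothing to check
    return len(p.read_text(encoding="utf-8").splitlines()) <= max_lines
```

In CI, fail the build when it returns False, so reviewers must justify any growth of the file.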

Pro Tip: Use AGENTS.md as a Last Line of Defense, Not a Primary Source of Truth

  • Don’t assume AGENTS.md replaces enforcement via code, tests, or CI tools
  • It’s a supplement for automation, not a replacement for proper engineering workflow

Conclusion

The AGENTS.md pattern is popular, but rigorous research shows it can hurt task success and drive up costs if misused. The main lesson: less is more. Use AGENTS.md sparingly, focusing on unique, actionable rules, and always test whether it actually improves your agent workflows. For detailed findings, review the official paper at arxiv.org and the summary at SRI Lab.

Bottom line: don’t blindly trust best practices—measure, iterate, and keep AGENTS.md laser-focused on what matters most for your coding agents and your team.