TL;DR: Two complementary, pivotal papers from early 2026 challenge how we think about AI context. The arXiv study shows that prose context files tend to hurt performance and inflate cost. The FAF paper provides the structured fix. Together they signal: ditch the junk drawer, embrace standards.
In the fast-evolving world of AI-assisted coding, context is king—or so we thought. The arXiv paper “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (arXiv:2602.11988) delivers a wake-up call, revealing how popular context files like AGENTS.md often hinder rather than help. My Zenodo-archived work, “Format-Driven AI Context Architecture: The .faf Standard for Persistent Project Understanding” (DOI: 10.5281/zenodo.18251362), builds on this by proposing FAF—a user-owned, IANA-registered standard that eliminates bloat while ensuring persistent, portable understanding.
These papers aren't rivals — they're complementary. The arXiv study spotlights the problems, FAF provides the fix. Facts first, logic follows.
The Wake-Up Call: The arXiv Paper
Published February 12, 2026, this paper by researchers at ETH Zurich's Secure, Reliable, and Intelligent Systems Lab rigorously tests a widespread practice: using repository-level context files (e.g., AGENTS.md, CLAUDE.md) to guide coding agents.
Key Methodology
The authors introduce AGENTbench, a novel benchmark curating real-world issues from repositories with developer-committed context files, and complement it with SWE-bench Lite (established tasks from popular repositories). Evaluations span multiple coding agents, with several LLMs used to generate context files.
- Experiments: Compare agent performance with/without context files. LLM-generated files use varied prompts/models; developer files are real-world samples.
- Metrics: Task success rates, inference costs (tokens), behavioral traces (exploration, testing depth).
- Scale: Tested on 100+ repos, multiple LLMs—robust and reproducible.
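The core comparison can be sketched in a few lines: for each condition (no context file vs. context file), measure the task success rate and average token cost, then report the deltas. This is an illustrative harness with hypothetical numbers, not the paper's actual evaluation code; the `Run` record and the sample values are assumptions chosen to echo the reported direction (lower success, roughly 20% more tokens).

```python
from dataclasses import dataclass

@dataclass
class Run:
    solved: bool   # did the agent resolve the issue?
    tokens: int    # total inference tokens consumed

def compare(baseline: list[Run], with_ctx: list[Run]) -> tuple[float, float]:
    """Return (success-rate delta, relative token-cost change)."""
    def rate(runs: list[Run]) -> float:
        return sum(r.solved for r in runs) / len(runs)
    def cost(runs: list[Run]) -> float:
        return sum(r.tokens for r in runs) / len(runs)
    return rate(with_ctx) - rate(baseline), cost(with_ctx) / cost(baseline) - 1.0

# Hypothetical runs echoing the paper's direction, not its data.
base = [Run(True, 1000), Run(True, 1100), Run(False, 900), Run(True, 1000)]
ctx  = [Run(True, 1250), Run(False, 1300), Run(False, 1150), Run(True, 1150)]
d_rate, d_cost = compare(base, ctx)  # success drops, cost rises ~21%
```

With these toy numbers, success drops by 25 points while average token cost rises about 21%, mirroring the paper's headline pattern.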
“Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.”
Core Findings
- Limitations: focuses on coding agents; doesn't test structured formats beyond Markdown.
- Strengths: real-world data and agent trace analysis; a foundational critique that flips assumptions.
It validates what developers already feel: Theo Browne (t3.gg) publicly dissected this paper, spotlighting how AI excels with structured data like package.json but struggles with prose bloat. This sets the stage for better solutions.
Why Prose Is the Problem
Beyond bloat, Markdown context files are subjective, unvalidated, and potentially misleading. Agents follow them literally, even when they're wrong. The arXiv data bears this out: prose context is a disguised liability. FAF replaces prose with structure. That's not a style choice; it's a common-sense safety decision.
The Solution Blueprint: The FAF Paper
My Zenodo paper (published January 15, 2026, on CERN's open platform) addresses the sovereignty gap in AI context. It proposes FAF as a portable, user-controlled format—think “package.json for your project's context.”
Key Methodology
Drawing from 27,000+ ecosystem downloads and 1,051+ tests, the paper validates FAF across platforms (Claude, OpenAI Codex, Google Gemini). It includes schema specs, performance benchmarks (e.g., 220x faster binary loading), and GDPR alignment analysis.
- Structure: YAML-based with sections for mission, tech_stack, key_files, context (architecture, conventions).
- Tools: faf-cli (41 commands), MCP server integration (Anthropic-approved merge #2759), Rust SDK.
- Validation: Cross-platform scores (9.0-9.5/10), IANA registration (application/vnd.faf+yaml, Oct 2025).
- Binary Companion: .fafb for efficient loading, with a 32-byte header and priority sections for token budgets.
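A minimal sketch of what a .faf file could look like, assuming the sections listed above (mission, tech_stack, key_files, context); the values and the `faf_version` key are hypothetical illustrations, not taken from the spec:

```yaml
# Illustrative .faf sketch — field values are hypothetical.
faf_version: "1.0"            # hypothetical version key
mission: "Payments API: typed, tested, boring on purpose"
tech_stack:
  language: TypeScript
  framework: Fastify
  database: PostgreSQL
key_files:
  - src/server.ts
  - src/routes/payments.ts
context:
  architecture: "Thin HTTP layer over a service/repository core"
  conventions: "Conventional commits; no default exports"
```

Structured fields like these are what the scoring and compilation steps operate on, in contrast to free-form prose.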
“We present FAF... an IANA-registered format that enables user-controlled, portable AI context. Like Solid pods for personal data, FAF files give users sovereignty over their AI context.”
Core Contributions
- User Ownership: Local files invert vendor control; transparent, portable, no cloud dependency.
- Performance: 91% token reclaim; files stay under 2KB. A scoring system ensures AI-readiness.
- Privacy: Aligns with GDPR (access, portability, erasure); enables offline workflows.
- Ecosystem: 8 SDKs, Anthropic MCP merge (PR #2759), enterprise features.
FAF isn't just theory; it's live infrastructure enhancing AI tools today.
The Structure
Human-readable YAML as the single source of truth, branching to AI-specific outputs:
One source. Four native formats. Zero drift.
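The fan-out from one source to several tool-specific files can be sketched in a few lines. This is a hypothetical illustration of the idea, not the faf-cli implementation; the field names and output shapes here are assumptions.

```python
# Hypothetical compiler sketch: one context dict fans out to tool-specific files.
faf = {
    "mission": "Payments API",
    "tech_stack": {"language": "TypeScript", "framework": "Fastify"},
    "conventions": "Conventional commits; no default exports",
}

def render_markdown(title: str) -> str:
    """Render the shared context as a small Markdown context file."""
    stack = ", ".join(f"{k}: {v}" for k, v in faf["tech_stack"].items())
    return (
        f"# {title}\n\n"
        f"Mission: {faf['mission']}\n"
        f"Stack: {stack}\n"
        f"Conventions: {faf['conventions']}\n"
    )

# Every output derives from the same dict, so the formats cannot drift apart.
outputs = {
    "CLAUDE.md": render_markdown("Claude Context"),
    "AGENTS.md": render_markdown("Agent Context"),
    ".cursorrules": faf["conventions"],
}
```

Because each file is regenerated from the same source, editing the source once updates every expression; that is the "zero drift" property.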
Head-to-Head: How FAF Elevates the arXiv Findings
The arXiv paper exposes the pitfalls of unstructured MD files. FAF directly addresses them. Here's how they complete each other:
[Diagrams: Traditional MD flow vs. FAF standard flow]
| Aspect | arXiv Critique | FAF Solution |
|---|---|---|
| Bloat | MD files add noise, confusing agents with generic docs | Structured YAML fields enforce minimal, essential context—compiler trims fluff, scoring flags low-quality |
| Success Rates | -3% to +4% marginal; over-exploration hurts | Empirical +6.7x response speed; priority loading fits token windows, focusing agents on essentials |
| Costs | +20% inference from broader traces | Binary .fafb parses 220x faster; minimal size (<1K tokens) cuts overhead |
| Human vs LLM | Humans slight edge, but still risky | Auto-gen lean outputs from .faf project DNA; human-verified timestamps ensure accountability |
| Portability | Vendor-siloed (e.g., .cursorrules only for Cursor) | IANA-standard, cross-AI (Claude/Grok/Gemini); user-owned like Solid pods—no lock-in |
| Exploration | Encourages over-testing, lowering success | Contextual primitives guide without mandating—minimal requirements by design |
The arXiv paper urges that "human-written context files should describe only minimal requirements." FAF delivers exactly that: one .faf (Project DNA for any AI) with multiple expressions (CLAUDE.md, AGENTS.md, .cursorrules, .windsurf). A playbook, not a rulebook.
The Future of AI Context
The arXiv paper is a foundational critique — credit to the authors for AGENTbench and their data-driven debunking. My FAF paper is the next chapter, enhancing those insights into infrastructure. Together, they signal: ditch the junk drawer, embrace standards.
Try FAF
```shell
npm i faf-cli
faf init
faf bi-sync --all
```

Cite these papers. Build on them. Make every repo AI-ready.
Questions? DM @wolfe_jam.
References
- Gloaguen, T., Mündler, N., Müller, M., Raychev, V., & Vechev, M. (2026). Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? arXiv:2602.11988.
- Wolfe Harrison, J. (2026). Format-Driven AI Context Architecture: The .faf Standard for Persistent Project Understanding. DOI: 10.5281/zenodo.18251362.
