TL;DR: Two complementary, pivotal papers from early 2026 challenge how we think about AI context. The arXiv study shows that prose context files often hurt performance and inflate cost. The FAF paper provides the structured fix. Together they signal: ditch the junk drawer, embrace standards.

In the fast-evolving world of AI-assisted coding, context is king—or so we thought. The arXiv paper “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” (arXiv:2602.11988) delivers a wake-up call, revealing how popular context files like AGENTS.md often hinder rather than help. My Zenodo-archived work, “Format-Driven AI Context Architecture: The .faf Standard for Persistent Project Understanding” (DOI: 10.5281/zenodo.18251362), builds on this by proposing FAF—a user-owned, IANA-registered standard that eliminates bloat while ensuring persistent, portable understanding.

These papers aren't rivals — they're complementary. The arXiv study spotlights the problems, FAF provides the fix. Facts first, logic follows.

The Wake-Up Call: The arXiv Paper

Published February 12, 2026, this paper by researchers at ETH Zurich's Secure, Reliable, and Intelligent Systems Lab rigorously tests a widespread practice: using repository-level context files (e.g., AGENTS.md, CLAUDE.md) to guide coding agents.

Key Methodology

The authors introduce AGENTbench, a novel benchmark curating real-world issues from repositories that ship developer-committed context files. They complement it with SWE-bench Lite (established tasks from popular repos). Evaluations span multiple coding agents, with several LLMs used to generate the context files under test.

  • Experiments: Compare agent performance with/without context files. LLM-generated files use varied prompts/models; developer files are real-world samples.
  • Metrics: Task success rates, inference costs (tokens), behavioral traces (exploration, testing depth).
  • Scale: Tested on 100+ repos, multiple LLMs—robust and reproducible.
“Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%.”

Core Findings

  • −3% (LLM-generated): performance drop from machine-written context files
  • +4% (human-written): marginal gain from developer-authored files
  • +20% (cost spike): inference cost increase from broader exploration
  • Bloat (root cause): unnecessary requirements make tasks harder

Limitations: Focuses on coding agents; doesn't test structured formats beyond MD. Strengths: Real-world data, agent trace analysis—a foundational critique that flips assumptions.

It validates what developers already feel — Theo Browne (t3.gg) dissected this paper publicly, spotlighting how AI crushes structured data like package.json but struggles with prose bloat. This sets the stage for better solutions.

Why Prose Is the Problem

Beyond bloat, Markdown context files are subjective, unvalidated, and potentially misleading. Agents follow them literally, even when they're wrong. The arXiv data bears this out: prose context is a liability in disguise. FAF replaces prose with structure. That's not a style choice; it's a common-sense safety decision.
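To make that concrete, here is the same convention expressed both ways. This is an illustrative sketch: the field names under `conventions` are hypothetical, invented for the example; only the `context` section itself appears in the published .faf schema outline.

```yaml
# Prose (AGENTS.md style), open to misreading:
#   "We use pnpm here. Please don't run npm install. Unit tests live
#    under tests/unit and should pass before you commit."
#
# The same guidance as structured .faf context. The keys under
# conventions below are illustrative, not taken from the spec:
context:
  conventions:
    package_manager: pnpm        # one unambiguous value, no prose to parse
    test_dir: tests/unit         # agents read a path, not a sentence
    pre_commit: unit tests must pass
```

A validator can score discrete fields like these; it has no reliable way to score a paragraph.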

The Solution Blueprint: The FAF Paper

My Zenodo paper (published January 15, 2026, on CERN's open platform) addresses the sovereignty gap in AI context. It proposes FAF as a portable, user-controlled format—think “package.json for your project's context.”

Key Methodology

Drawing from 27,000+ ecosystem downloads and 1,051+ tests, the paper validates FAF across platforms (Claude, OpenAI Codex, Google Gemini). It includes schema specs, performance benchmarks (e.g., 220x faster binary loading), and GDPR alignment analysis.

  • Structure: YAML-based with sections for mission, tech_stack, key_files, context (architecture, conventions).
  • Tools: faf-cli (41 commands), MCP server integration (Anthropic-approved merge #2759), Rust SDK.
  • Validation: Cross-platform scores (9.0-9.5/10), IANA registration (application/vnd.faf+yaml, Oct 2025).
  • Binary Companion: .fafb for efficient loading—32-byte header, priority sections for token budgets.
“We present FAF... an IANA-registered format that enables user-controlled, portable AI context. Like Solid pods for personal data, FAF files give users sovereignty over their AI context.”

Core Contributions

  • 🔒 User Ownership: local files invert vendor control; transparent, portable, no cloud dependency.
  • Performance: 91% token reclaim; files stay under 2KB. A scoring system ensures AI-readiness.
  • 🛡 Privacy: aligns with GDPR (access, portability, erasure); enables offline workflows.
  • 🌍 Ecosystem: 8 SDKs, Anthropic MCP merge (PR #2759), enterprise features.

FAF isn't just theory; it's live infrastructure enhancing AI tools today.

The Structure

Human-readable YAML as the single source of truth, branching to AI-specific outputs:

.faf (Project DNA):

  faf_version "2.5.0"
  project {name, mission}
  tech_stack {languages, frameworks}
  key_files [{path, purpose}]
  context {architecture, conventions}
  outputs {claude_md, agents_md, cursorrules, gemini_md}

Compiled outputs: CLAUDE.md, AGENTS.md, .cursorrules, GEMINI.md

One source. Four native formats. Zero drift.
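Filled in, a minimal .faf following that skeleton might look like the sketch below. The top-level sections mirror the outline above; the project values and the generated file paths under `outputs` are illustrative assumptions, not taken from the spec.

```yaml
faf_version: "2.5.0"

project:
  name: acme-api                 # illustrative project, not a real repo
  mission: REST API for the Acme storefront

tech_stack:
  languages: [typescript]
  frameworks: [fastify]

key_files:
  - path: src/server.ts
    purpose: HTTP entry point and route registration

context:
  architecture: modular monolith, one module per domain
  conventions: strict ESLint, conventional commits

outputs:                         # compiled, AI-specific expressions
  claude_md: CLAUDE.md
  agents_md: AGENTS.md
  cursorrules: .cursorrules
  gemini_md: GEMINI.md
```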

Head-to-Head: How FAF Elevates the arXiv Findings

The arXiv paper exposes the pitfalls of unstructured MD files. FAF directly addresses them. Here's how they complete each other:

Traditional MD Flow

  Repo → bloated MD (prose, subjective, unvalidated) → agent explores broadly (+20% cost, over-testing) → −3% performance (task harder, cost higher)

vs

FAF Standard Flow

  Repo → .faf Project DNA (structured, scored, validated) → lean outputs (agent focuses on essentials) → persistent context (minimal, portable, fast)
Aspect by aspect, arXiv critique vs FAF solution:

  • Bloat. arXiv: MD files add noise, confusing agents with generic docs. FAF: structured YAML fields enforce minimal, essential context; the compiler trims fluff and scoring flags low quality.
  • Success rates. arXiv: −3% to +4%, marginal at best; over-exploration hurts. FAF: priority loading fits token windows and focuses agents on essentials (empirically 6.7x faster responses).
  • Costs. arXiv: +20% inference from broader traces. FAF: binary .fafb parses 220x faster; minimal size (<1K tokens) cuts overhead.
  • Human vs LLM. arXiv: humans have a slight edge, but files are still risky. FAF: lean outputs auto-generated from .faf project DNA; human-verified timestamps ensure accountability.
  • Portability. arXiv: vendor-siloed (e.g., .cursorrules only works in Cursor). FAF: IANA-standard, cross-AI (Claude/Grok/Gemini); user-owned like Solid pods, no lock-in.
  • Exploration. arXiv: encourages over-testing, lowering success. FAF: contextual primitives guide without mandating; minimal requirements by design.

The arXiv authors urge that "human-written context files should describe only minimal requirements." FAF delivers exactly that: one .faf (Project DNA for any AI), multiple expressions (CLAUDE.md, AGENTS.md, .cursorrules, .windsurf); a playbook, not a rulebook.

Note: arXiv metrics measure coding agent task resolution rates. FAF metrics measure AI-readiness: persistent project context scoring as defined in Anthropic's MCP ecosystem (merge #2759). Complementary benchmarks, not direct comparisons.

The Future of AI Context

The arXiv paper is a foundational critique — credit to the authors for AGENTbench and their data-driven debunking. My FAF paper is the next chapter, enhancing those insights into infrastructure. Together, they signal: ditch the junk drawer, embrace standards.

Try FAF

npm i faf-cli
faf init
faf bi-sync --all

Cite these papers. Build on them. Make every repo AI-ready.

Questions? DM @wolfe_jam.

References

  1. Gloaguen, T., Mündler, N., Müller, M., Raychev, V., & Vechev, M. (2026). Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents? arXiv:2602.11988.
  2. Wolfe Harrison, J. (2026). Format-Driven AI Context Architecture: The .faf Standard for Persistent Project Understanding. DOI: 10.5281/zenodo.18251362.

Further Reading