@mylifcc: After analyzing 1281 agent runs (covering 40+ large open-source repositories), Sourcegraph concluded: Coding agents fail in large codebases not because the models aren't smart enough, but because the infrastructure can't keep up. The most common failure mode is "Lost in …"
Summary
Based on 1281 agent runs, Sourcegraph found that the main reason coding agents fail in large codebases is insufficient infrastructure, not model capability. The typical failure mode is "lost in the codebase," requiring improvements in code retrieval, navigation, and context engineering.
Cached at: 05/24/26, 02:18 AM
Sourcegraph analyzed 1,281 agent runs (covering 40+ large open-source repos) and concluded:
Coding agents fail in large codebases not because models aren’t smart enough, but because infrastructure can’t keep up.
The most common failure pattern is “Lost in the codebase”:
The agent gets stuck in an endless cycle of jumps and file reads and never converges on an effective plan. The problem worsens significantly once the codebase exceeds ~400k lines and the agent relies only on basic tools such as local grep/read.
Other recurring failure patterns include selecting the wrong files or symbols and making only some of the relevant edits.
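As a rough illustration of the "Lost in the codebase" pattern described above, the sketch below flags an agent run whose tool-call trace is dominated by navigation. It is a minimal heuristic, not Sourcegraph's methodology: the `ToolCall` record, the tool names (`read`, `grep`, `edit`), and both thresholds are assumptions made up for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str    # hypothetical tool names: "read", "grep", "edit"
    target: str  # file path (read/edit) or search query (grep)


def navigation_stats(trace):
    """Summarize how much navigation a run does and whether it ever edits."""
    nav_before_edit = 0
    made_edit = False
    reads = Counter()
    for call in trace:
        if call.tool == "edit":
            made_edit = True
        elif call.tool in ("read", "grep"):
            if not made_edit:
                nav_before_edit += 1
            if call.tool == "read":
                reads[call.target] += 1
    # Every read of a file beyond the first counts as a repeat.
    repeated_reads = sum(n - 1 for n in reads.values() if n > 1)
    return {
        "nav_before_edit": nav_before_edit,
        "repeated_reads": repeated_reads,
        "made_edit": made_edit,
    }


def looks_lost(trace, max_nav=30, max_repeats=5):
    """Flag a run that navigates a lot without editing, or keeps re-reading files.

    The thresholds are arbitrary placeholders, not values from the study.
    """
    stats = navigation_stats(trace)
    never_converged = not stats["made_edit"] and stats["nav_before_edit"] > max_nav
    return never_converged or stats["repeated_reads"] > max_repeats
```

A run that reads the same file seven times without editing would be flagged, while a short grep, read, edit sequence would not. Real traces would need tuned thresholds and a richer notion of progress than "an edit happened".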
The article’s core point is clear: relying solely on model capability doesn’t scale well in large codebases. The more critical bottlenecks are code retrieval, navigation, and context engineering, and those are where improvement is needed.
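To make "code retrieval" concrete, here is one possible sketch of a definition index, which lets an agent jump straight to where a symbol is defined instead of grepping for its name. It is an illustrative assumption only: the source does not describe any specific retrieval design, the helper names `index_source` and `build_index` are hypothetical, and the example handles Python files only, using the standard-library `ast` module.

```python
import ast
from collections import defaultdict


def index_source(path, source, index):
    """Record every function and class definition found in one file."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index[node.name].append((path, node.lineno))


def build_index(files):
    """Build a symbol -> [(path, line)] map from a {path: source} dict."""
    index = defaultdict(list)
    for path, source in files.items():
        index_source(path, source, index)
    return index


# Hypothetical two-file repository used only for illustration.
files = {
    "pkg/a.py": "def load():\n    return 1\n",
    "pkg/b.py": "class Loader:\n    def load(self):\n        return 2\n",
}
index = build_index(files)
print(index["load"])    # definitions only, not every textual match
print(index["Loader"])
```

Unlike a plain text search, a lookup here returns only definition sites, so the agent spends fewer reads filtering out call sites, comments, and string matches. A production system would also need references, cross-language support, and incremental updates.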
For anyone building production-level coding agents, this is a signal worth taking seriously.
Similar Articles
@Xudong07452910: This paper is a must-read for heavy users of Claude Code, Codex, or other AI agents. It studies not how agents fail on benchmarks but a more practical question: in real development, what exactly are AI coding agents doing...
This paper analyzes 20,574 real-world coding-agent sessions to identify how AI agents misalign with developer intent, finding that constraint violations and inaccurate self-reporting are the most common failure modes, imposing trust and effort costs rather than irreversible damage.
Feels like coding agents are good at finding code but bad at understanding projects
Discusses the observation that while coding agents are effective at locating code, they struggle with deeper project understanding, such as component relationships and project style. The author introduces RepoWise, a tool that provides repository-level signals like dependency graphs and git history to address these issues.
@teach_fireworks: AI Coding is now entering a very interesting phase. In the past, discussions focused heavily on model capabilities, context length, Agent Loops, Tool Use, and automated programming. However, once Agents are placed in real-world development environments for extended periods, many teams realize the issue isn't just about 'whether code can be generated...',
Introducing re_gent, an open-source tool that provides runtime-level version control and observability infrastructure for AI coding Agents, addressing code traceability and audit issues arising from long-running Agent sessions.
@FakeMaidenMaker: The scariest thing about using an AI agent to write code is losing control: the agent runs wild, quality is inconsistent, you don’t know what stage it’s in, and it messes things up halfway through. AWS just open-sourced a set of development lifecycle workflow rules specifically designed for AI coding agents — AI-DLC — that make the agent…
AWS has open-sourced AI-DLC (AI-Driven Development Life Cycle), a set of development lifecycle workflow rules designed for AI coding agents to help developers control agent behavior and ensure quality. It supports multiple platforms including Claude Code, Cursor, and GitHub Copilot.
@AYi_AInotes: A counter-intuitive judgment: 80% of Agent production crashes have nothing to do with model IQ — they're all from context overflow, tool misconfiguration, sub-agent runaway. The real watershed in 2026 is Harness and Loop, not the model. Bro, @wizardly_ai's engineering note...
This article points out that 80% of AI Agent production crashes are not due to model intelligence, but are caused by context overflow, tool misconfiguration, and sub-agent runaway. The author emphasizes that the watershed in 2026 lies in Harness (office systems, security) and Loop (automatic cycling mechanism), not the model itself.