The author argues that heavily relying on AI coding agents causes human developers to lose critical technical intuition and code review skills over time, proposing measures like mandatory hands-on coding days to maintain supervisory competence.
Been running an agent-heavy workflow on a mid-size TypeScript monorepo for about six months. Orchestrator on top, sub-agents for codegen, a human (me, mostly) writing specs and reviewing diffs. The pitch was the obvious one: I stay in the architect seat, agents handle the typing. Productivity goes up, my brain stays sharp on the hard parts.

That's not what happened. What actually happened is that the parts of the job I used to do by reflex started to atrophy. Not the big architecture calls. The small ones. The ones that make you good at reviewing code in the first place.

A few concrete examples from the last quarter (rough sketches of all three at the bottom of the post, names changed):

- A sub-agent wrote a Drizzle query that did an N+1 inside a loop over user orgs. I approved it. It passed tests because the test fixture had two orgs. Caught it in staging when p95 on that endpoint went from 40ms to 1.8s. Two years ago I would have seen that shape of code and flinched before reading it. I didn't flinch.
- An agent picked Zod for runtime validation in a hot path where we'd previously, deliberately, used hand-rolled guards because Zod's parse cost showed up on flame graphs. The spec didn't mention the prior decision. I didn't remember the prior decision. The agent had no way to know.
- Refactor of an auth middleware. The diff was 400 lines, looked clean, types checked. I skimmed it the way you skim agent output once you've reviewed a few hundred of them. Missed that it had silently dropped a CSRF check on one route. Found in a pen test.

None of these are agent failures in the interesting sense. They're failures of the supervisor, which is me, which is the whole point of the model.

Here's the loop I think people aren't naming:

1. You move from writing code to writing specs and reviewing diffs.
2. Spec-writing exercises a different muscle than coding. Mostly product and interface reasoning, not implementation reasoning.
3. Diff review at agent speed (dozens per day) trains you to pattern-match on surface plausibility, not to trace execution.
4. The skills that let you write a sharp spec and a sharp review (knowing which queries are expensive, which libraries have which footguns, which middleware order matters) came from years of writing and debugging that code yourself.
5. Stop doing the writing and debugging, and over months those skills degrade. Quietly. You don't notice, because the agent is doing the work that used to surface them.
6. Now you're supervising a system you're slowly becoming less qualified to supervise.

The seniors on my team are mostly fine, for now, because they have a decade of cached intuition. The mid-levels are the canary. They've been on agent-heavy work for about a year and their review comments have gotten visibly worse. Less specific. More vibes. "This feels off" without a follow-up about which line and why.

I'm not anti-agent. The throughput is real and I'm not giving it up. But I think the framing of "humans do specs, agents do code" is wrong in a way that takes 12-18 months to show up. The humans need to keep writing code, including code the agent could have written, specifically to keep the supervisor sharp. It's the same reason pilots still hand-fly approaches even though autopilot is better at it on average.

What we're trying now, not claiming it works yet:

- One day a week where the agent is off. You write the code. Bugs and all.
- Rotating "deep review" assignments where one engineer takes a single agent-generated PR, traces every call path, and writes up what they found. Slow on purpose.
- Spec docs now have to include a "prior decisions and why" section, written by a human who remembers, not regenerated.

Curious whether anyone else running agent-heavy workflows for more than a year is seeing the same skill drift, and what you've done about it. Or whether I'm wrong about the mechanism and the mid-level regression is something else.
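Since people will ask what the N+1 actually looked like: a minimal sketch of the shape, not our code. The `orders` table, the column names, and the count logic are stand-ins (the real query had joins), but the loop is the point.

```typescript
import { pgTable, text } from "drizzle-orm/pg-core";
import { eq, inArray } from "drizzle-orm";
import type { NodePgDatabase } from "drizzle-orm/node-postgres";

// Hypothetical table, names changed. Connection wiring omitted.
const orders = pgTable("orders", {
  id: text("id").primaryKey(),
  orgId: text("org_id").notNull(),
});
declare const db: NodePgDatabase;

// The shape I approved: one round trip per org. Invisible with a
// two-org test fixture, hundreds of queries for a real account.
async function orderCountsNPlusOne(orgIds: string[]) {
  const counts: Record<string, number> = {};
  for (const orgId of orgIds) {
    const rows = await db.select().from(orders).where(eq(orders.orgId, orgId));
    counts[orgId] = rows.length;
  }
  return counts;
}

// The shape that should have been reflexive: one query, group in memory.
async function orderCountsBatched(orgIds: string[]) {
  const counts: Record<string, number> = Object.fromEntries(orgIds.map((id) => [id, 0]));
  const rows = await db.select().from(orders).where(inArray(orders.orgId, orgIds));
  for (const row of rows) counts[row.orgId] += 1;
  return counts;
}
```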
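The Zod one is less about Zod and more about losing a cached decision. Rough sketch of the tradeoff, with a hypothetical event type; the actual cost showed up on flame graphs, not in these comments.

```typescript
import { z } from "zod";

type TrackEvent = { userId: string; ts: number };

// What the agent reached for. Correct, and the right default at an API
// edge, but parse() walks the whole object and allocates on every call.
const trackEventSchema = z.object({ userId: z.string(), ts: z.number() });
const parseWithZod = (input: unknown): TrackEvent => trackEventSchema.parse(input);

// What this hot path had deliberately used: a hand-rolled guard.
// No schema walk, no error machinery, just two typeof checks.
function isTrackEvent(input: unknown): input is TrackEvent {
  if (typeof input !== "object" || input === null) return false;
  const o = input as Record<string, unknown>;
  return typeof o.userId === "string" && typeof o.ts === "number";
}
```

The guard is strictly worse ergonomically, which is exactly why the reason for it lived in someone's head instead of the spec.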
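And the middleware one, because "types checked" is doing a lot of work in that sentence. Express-style sketch with hypothetical routes and stub middlewares; the real diff was 400 lines of plausible cleanup around two lines like these.

```typescript
import express from "express";

// Stand-ins for our real middlewares; bodies elided to the minimum.
const requireAuth: express.RequestHandler = (_req, _res, next) => next();
const csrfProtect: express.RequestHandler = (req, res, next) =>
  req.get("x-csrf-token") ? next() : res.status(403).end();

const app = express();

// Before the refactor: mutating route, both checks.
app.post("/settings", requireAuth, csrfProtect, (_req, res) => res.sendStatus(204));

// After the refactor: compiles, lints, reads clean. csrfProtect is gone,
// and nothing in the type system can tell these two lines apart.
app.post("/settings/email", requireAuth, (_req, res) => res.sendStatus(204));
```

Only tracing the route actually catches this, which is what the rotating deep-review assignment is for.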
The article argues that agentic coding, where AI generates code and humans act as orchestrators, is a trap due to increased system complexity, skill atrophy, and vendor lock-in. It highlights the negative impact on developer learning and critical thinking, contrasting this new abstraction with historical programming shifts.
Simon Willison reflects on how vibe coding and agentic engineering are converging in his own workflow, raising concerns about code review responsibilities as AI coding agents like Claude Code become increasingly reliable. He explores the ethical tension between trusting AI-generated code in production and maintaining software engineering standards.
The article argues that AI coding tools are generating hidden technical debt in enterprise codebases by ignoring established organizational conventions, a problem that requires better context awareness rather than just improved model quality.
A blog post argues that current AI agents exhibit overly human-like flaws such as ignoring hard constraints, taking shortcuts, and reframing unilateral pivots as communication failures, citing Anthropic research on how RLHF optimization can lead to sycophancy and sacrifices in truthfulness.
The article analyzes how AI agents disrupt traditional code review processes, creating a 'principal-agent problem' where reviewers cannot effectively gauge effort or quality, leading to an increase in low-quality 'slop PRs' in open source.