@DeRonin_: Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model." 41% mistake rate without a CLA…


Summary

The post quotes Andrej Karpathy as saying that 90% of Claude's mistakes stem from missing context, not model weakness, and lists 12 CLAUDE.md rules that it reports cut the mistake rate from 41% (no CLAUDE.md) to 11% (4-rule baseline) to 3% (12-rule version).


Cached at: 05/18/26, 10:30 AM

Andrej Karpathy: “90% of Claude’s mistakes come from missing context, not a weak model.”

41% mistake rate without a CLAUDE.md. 11% with the 4-rule baseline. 3% with the 12-rule version below

here are the 12 rules senior engineers settled on:

  1. think before coding: state assumptions, don’t guess. the model can’t read your mind, stop hoping it will

  2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add “for future flexibility,” you’ve added 200 lines you’ll delete next quarter

  3. surgical changes: touch only what you must. don’t let it improve adjacent code, that’s how PRs blow up

  4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early

  5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers

  6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5

  7. surface conflicts, don’t average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice

  8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read

  9. tests verify intent, not just behavior: a test that can’t fail when business logic changes is wrong. all 12 of Claude’s tests can pass while the function returns a constant

  10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour

  11. match the codebase conventions: class components? don’t fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing

  12. fail loud: “completed successfully” with 14% of records silently skipped is the worst class of bug. surface uncertainty, don’t hide it
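
Rule 12 as a minimal Python sketch. The names `run_batch`, `parse_amount`, `PartialFailure` and the `max_skip_ratio` parameter are hypothetical, made up for illustration; the point is only that skipped records are counted and raised, not hidden behind a success message.

```python
from dataclasses import dataclass, field


class PartialFailure(Exception):
    """Raised when a batch finishes with more skipped records than allowed."""


@dataclass
class BatchResult:
    processed: int = 0
    skipped: list = field(default_factory=list)  # (record, reason) pairs


def parse_amount(record: dict) -> float:
    # Hypothetical per-record step; raises on missing or malformed input.
    return float(record["amount"])


def run_batch(records: list[dict], max_skip_ratio: float = 0.0) -> BatchResult:
    result = BatchResult()
    for record in records:
        try:
            parse_amount(record)
            result.processed += 1
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the record and the reason instead of dropping it silently.
            result.skipped.append((record, repr(exc)))

    total = len(records)
    ratio = len(result.skipped) / total if total else 0.0
    if ratio > max_skip_ratio:
        # Fail loud: the caller cannot mistake this for "completed successfully".
        raise PartialFailure(
            f"{len(result.skipped)}/{total} records skipped ({ratio:.0%})"
        )
    return result
```

With the default `max_skip_ratio=0.0`, a single bad record stops the run with an explicit error; a caller who accepts some loss has to say so by raising the threshold.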

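Rule 9 can be sketched in a few lines of Python. The discount rule and the function names here are invented for illustration, not taken from the post. `weak_test` checks one happy-path value, so an implementation that returns a constant passes it; `intent_test` pins the rule from several sides and fails as soon as the logic is wrong.

```python
def apply_discount(total: float, is_member: bool) -> float:
    """Hypothetical business rule: members get 10% off orders of 100 or more."""
    if is_member and total >= 100:
        return round(total * 0.9, 2)
    return total


def constant_stub(total: float, is_member: bool) -> float:
    """A broken implementation that ignores its inputs."""
    return 90.0


def weak_test(fn) -> bool:
    # Only one input is checked, so a constant return value satisfies it.
    return fn(100.0, True) == 90.0


def intent_test(fn) -> bool:
    # Covers the discount, the threshold, and the membership condition.
    return (
        fn(100.0, True) == 90.0
        and fn(200.0, True) == 180.0
        and fn(99.99, True) == 99.99
        and fn(200.0, False) == 200.0
    )
```

Both tests accept `apply_discount`, but only `weak_test` accepts `constant_stub`, which is the failure mode the rule describes.
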
what actually compounds instead of the next framework:

  • the CLAUDE.md file as institutional memory across sessions
  • eval-driven changes, not vibe-driven
  • checkpoints over speed
  • explicit conflicts over silent blending
  • discipline over framework, every time
  • one repo, one rules file, no exceptions
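
"checkpoints over speed" (and rule 10) as a rough Python sketch, assuming each step can be paired with a check on the state it produces. `run_with_checkpoints`, `CheckpointFailed` and the `(name, action, check)` step shape are hypothetical names chosen for this example.

```python
from typing import Callable

# A step is (name, action, check): the action returns a new state,
# and the check decides whether that state is safe to build on.
Step = tuple[str, Callable[[dict], dict], Callable[[dict], bool]]


class CheckpointFailed(Exception):
    """Raised when a step leaves the state in a condition its check rejects."""


def run_with_checkpoints(state: dict, steps: list[Step]) -> dict:
    for name, action, check in steps:
        candidate = action(dict(state))  # work on a copy of the last good state
        if not check(candidate):
            # Stop here so later steps never run on top of a broken state.
            raise CheckpointFailed(
                f"step '{name}' failed its check; later steps were not run"
            )
        state = candidate  # this becomes the new checkpoint
    return state
```

If step 4 breaks something, the run stops at step 4 and steps 5 and 6 never execute, instead of finishing on a broken base and being noticed an hour later.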

be a few rules ahead of AI twitter before this becomes mass-opinion

study this

Ronin (@DeRonin_): anybody who uses or learns agentic systems, SHOULD READ THIS

the install order I run before any new agentic project:

  1. PRIVACY: direnv + a real secrets manager

install direnv, then plug it into your team’s password manager (1Password CLI via op run, doppler, infisical, vault,
