@DeRonin_: Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model." 41% mistake rate without a CLA…

X AI KOLs Following 05/18/26, 09:07 AM News

claude best-practices prompting context agentic-systems andrej-karpathy rules

Summary

Andrej Karpathy states that 90% of Claude's mistakes stem from missing context, not model weakness, and provides a set of 12 rules that reduced error rates from 41% to 3% in experiments.

Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model." 41% mistake rate without a CLAUDE.md. 11% with the 4-rule baseline. 3% with the 12-rule version below here are the 12 rules senior engineers settled on: 1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will 2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter 3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up 4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early 5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers 6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5 7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice 8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read 9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant 10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour 11. match the codebase conventions: class components? don't fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing 12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it what actually compounds instead of the next framework: - the CLAUDE.md file as institutional memory across sessions - eval-driven changes, not vibe-driven - checkpoints over speed - explicit conflicts over silent blending - discipline over framework, every time - one repo, one rules file, no exceptions be a few rules ahead of AI twitter before this becomes mass-opinion study this

Original Article

View Cached Full Text

Cached at: 05/18/26, 10:30 AM

Andrej Karpathy: “90% of Claude’s mistakes come from missing context, not a weak model.”

41% mistake rate without a CLAUDE.md. 11% with the 4-rule baseline. 3% with the 12-rule version below

here are the 12 rules senior engineers settled on:

think before coding: state assumptions, don’t guess. the model can’t read your mind, stop hoping it will
simplicity first: minimum code, no speculative abstractions. the moment you let Claude add “for future flexibility,” you’ve added 200 lines you’ll delete next quarter
surgical changes: touch only what you must. don’t let it improve adjacent code, that’s how PRs blow up
goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
surface conflicts, don’t average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
tests verify intent, not just behavior: a test that can’t fail when business logic changes is wrong. all 12 of Claude’s tests can pass while the function returns a constant
checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
match the codebase conventions: class components? don’t fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing
fail loud: “completed successfully” with 14% of records silently skipped is the worst class of bug. surface uncertainty, don’t hide it

what actually compounds instead of the next framework:

the CLAUDE.md file as institutional memory across sessions
eval-driven changes, not vibe-driven
checkpoints over speed
explicit conflicts over silent blending
discipline over framework, every time
one repo, one rules file, no exceptions

be a few rules ahead of AI twitter before this becomes mass-opinion

study this

Ronin (@DeRonin_): anybody who uses or learns agentic systems, SHOULD READ THIS

the install order I run before any new agentic project:

PRIVACY: direnv + a real secrets manager

install direnv, then plug it into your team’s password manager (1Password CLI via op run, doppler, infisical, vault,

@DeRonin_: Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model." 41% mistake rate without a CLA…

Similar Articles

@PrajwalTomar_: Claude isn't broken. Your CLAUDE .md is. Most people think Claude Code makes mistakes because the model is bad. Wrong. …

@DeRonin_: Andrej Karpathy: "90% of your AI coding bill is paying for context you didn't need to send" Here are 10 things senior A…

@PrajwalTomar_: Karpathy complained about Claude making mistakes, someone turned it into 4 rules, and it became the fastest-growing sin…

@_avichawla: A smarter Claude model burns more tokens, not fewer! And it's not a minor 3-5% difference. But 54% higher token usage. …

@aakashgupta: This guy literally broke down everything you need to master Claude: 6:07 - Why to Stop Using Chat 9:56 - Cowork vs Code…

Submit Feedback

Similar Articles

@PrajwalTomar_: Claude isn't broken. Your CLAUDE .md is. Most people think Claude Code makes mistakes because the model is bad. Wrong. …

@DeRonin_: Andrej Karpathy: "90% of your AI coding bill is paying for context you didn't need to send" Here are 10 things senior A…

@PrajwalTomar_: Karpathy complained about Claude making mistakes, someone turned it into 4 rules, and it became the fastest-growing sin…

@_avichawla: A smarter Claude model burns more tokens, not fewer! And it's not a minor 3-5% difference. But 54% higher token usage. …

@aakashgupta: This guy literally broke down everything you need to master Claude: 6:07 - Why to Stop Using Chat 9:56 - Cowork vs Code…