@DivyanshT91162: Andrej Karpathy built his whole reputation on one idea: "You don't really understand it until you can build it from scr…

X AI KOLs Timeline 06/28/26, 11:42 AM News

deep-learning education from-scratch andrej-karpathy claude anthropic

Summary

A Twitter thread points out the irony that Andrej Karpathy, whose teaching philosophy is learning by building from scratch, now reportedly lets AI write most of his code. The author then shares nine prompts used over three weeks with Claude Opus 4.8 to learn deep learning the old way.



Cached at: 06/29/26, 10:27 AM

Andrej Karpathy built his whole reputation on one idea:

“You don’t really understand it until you can build it from scratch.”

No libraries. No magic. Just you, a blank file, and the raw math.

Here’s the irony: Karpathy himself joined Anthropic in 2026 and now says he barely writes code by hand anymore — he lets the model do it.

So I closed the loop. I made Claude Opus 4.8 teach me deep learning the OLD Karpathy way: backprop by hand, micrograd, GPT from zero.

3 weeks of this beat 2 years of tutorials. Here are the 9 prompts:

Prompt 1: Backprop By Hand

Karpathy says everything clicks once you can do backpropagation manually, on paper. This forces that moment.

“You are Andrej Karpathy teaching me backpropagation from absolute first principles, the way you do in micrograd.

Do NOT use any libraries. Start with a single neuron.

Walk me through a forward pass with real numbers I can follow.
Now do the backward pass BY HAND — compute every gradient step by step, explaining the chain rule each time as if I’ve never trusted it before.
Show me where the gradient ‘flows’ and why.
Then give me a ~30 line pure-Python Value class that implements autograd for +, *, and tanh, with comments explaining WHY each line exists.

End by giving me one tiny exercise to test if I actually understood it.”
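For orientation, here is a rough sketch of the kind of Value class this prompt asks for. It is not Karpathy's micrograd code and not what Claude will necessarily return. The attribute names (`_parents`, `_backward`) and the example numbers are assumptions chosen for illustration.

```python
import math

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data                # number produced in the forward pass
        self.grad = 0.0                 # d(output)/d(self), filled in by backward()
        self._parents = parents         # the Values this one was computed from
        self._backward = lambda: None   # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1, so the gradient passes through unchanged
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a (chain rule: local slope times out.grad)
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            # d(tanh(x))/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node is handled only after everything that uses it.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0                 # d(output)/d(output) = 1
        for v in reversed(order):
            v._backward()

# A single neuron with made-up numbers: out = tanh(w*x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.25)
out = (w * x + b).tanh()
out.backward()
print(out.data, w.grad, x.grad, b.grad)
```

Working the same neuron on paper first, then comparing against the printed gradients, is one way to do the "tiny exercise" the prompt ends with.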

Prompt 2: Build Micrograd From Zero

Karpathy’s micrograd is a full autograd engine in ~100 lines. Rebuilding it is the rite of passage. This guides you through it without handing you the answer.

“Act as my Karpathy-style mentor. I want to build micrograd myself, not copy it.

Guide me to construct a minimal automatic differentiation engine in pure Python, step by step:

First, make me design the Value object’s interface before showing yours — ask me what it needs to store.
Then implement operations one at a time (+, *, pow, tanh/relu), and after EACH, make me predict what the backward function should be before you reveal it.
Build the topological sort for .backward() and explain why order matters.
Finally, train a tiny 2-layer MLP on a handful of points so I SEE the loss drop.

Push back if my reasoning is lazy or hand-wavy; that’s the point.”
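The topological sort step is the one most people gloss over, so here is a small sketch of it on its own. The graph is a hypothetical one written as a plain dict (node name to the nodes it was computed from), not micrograd's actual data structure.

```python
def topo_order(root, parents):
    """Return nodes so that every node appears after the nodes it was computed from."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for p in parents.get(node, ()):
            visit(p)
        order.append(node)  # appended only once all of its inputs are in the list

    visit(root)
    return order

# Hypothetical graph: h = x*w, then loss = (h*b) + (h*c). Note that h is used twice.
parents = {
    "loss": ("hb", "hc"),
    "hb": ("h", "b"),
    "hc": ("h", "c"),
    "h": ("x", "w"),
}

forward = topo_order("loss", parents)
backward = list(reversed(forward))  # the order in which .backward() would visit nodes
print(backward)
```

In the reversed order, `h` comes after both `hb` and `hc`, so its gradient has received both contributions before it is passed on to `x` and `w`. Visiting `h` any earlier would propagate only part of the sum, which is why the order matters.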

Prompt 3: The makemore Ladder

Karpathy teaches language models by slowly climbing from a bigram counting model to a neural net. This recreates that exact ladder for you.

“Teach me language modeling the makemore way — starting embarrassingly simple and adding ONE idea at a time.

Use a small dataset (like a list of names).

Level 1: build a pure bigram model using nothing but a count matrix. Show me it ‘works.’
Level 2: reframe the SAME bigram model as a tiny neural net with one weight matrix + softmax, and prove they’re equivalent.
Level 3: extend the context beyond one character and explain what breaks.

For each level, give runnable code, tell me what new concept it introduces, and what limitation forces the jump to the next level. Don’t skip rungs.”
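To make Levels 1 and 2 concrete, here is a minimal pure-Python sketch, not makemore's actual code. The three-name dataset, the add-one smoothing, and the function names are assumptions for illustration. It builds the count matrix, then shows that a single weight matrix followed by softmax can reproduce the same probabilities exactly.

```python
import math

names = ["emma", "ava", "mia"]                 # hypothetical toy dataset
chars = ["."] + sorted(set("".join(names)))    # "." marks the start and end of a name
stoi = {c: i for i, c in enumerate(chars)}
V = len(chars)

# Level 1: count how often each character follows each other character.
counts = [[1] * V for _ in range(V)]           # start at 1 (add-one smoothing) to avoid zeros
for name in names:
    seq = ["."] + list(name) + ["."]
    for prev, nxt in zip(seq, seq[1:]):
        counts[stoi[prev]][stoi[nxt]] += 1

# Normalizing each row turns counts into P(next char | previous char).
probs = [[c / sum(row) for c in row] for row in counts]

# Level 2: the same model written as one weight matrix + softmax.
# Setting W = log(counts) is the choice that makes the two views agree.
W = [[math.log(c) for c in row] for row in counts]

def softmax(logits):
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def net_forward(i):
    # Multiplying a one-hot vector by W just selects row i of W.
    return softmax(W[i])

max_diff = max(
    abs(p - q) for i in range(V) for p, q in zip(probs[i], net_forward(i))
)
print(max_diff)
```

Here `W` is filled in directly from the counts. In the neural-net framing the same matrix would be learned by gradient descent on the negative log-likelihood, which is the step that generalizes when the context grows beyond one character.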

Prompt 4: GPT From Scratch

Karpathy’s “let’s build GPT” is legendary. This makes Claude reconstruct a minimal transformer with you, component by component.

“Be Karpathy walking me through building a minimal GPT from scratch in PyTorch — but make me understand every block, not just paste it.

Go in this order, and STOP after each to explain the intuition before the code:

Token + position embeddings — why both?
A single self-attention head — derive why we use query, key, value, and what the softmax is doing.
Multi-head attention — why split into heads at all?
The feed-forward + residual + layernorm sandwich — what each part rescues.
Stack into blocks and train on a tiny text file.

Keep the model small enough to run on a laptop. After the build, ask me 3 questions to check I actually get attention.”
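The prompt asks for PyTorch, but the core of a single self-attention head fits in plain Python, which makes the mechanics easier to inspect. The following is a sketch under assumptions: the sizes, the random weights, and the helper names are made up, and a real implementation would use tensors and learned parameters.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]   # exp(-inf) is 0.0, so masked slots vanish
    s = sum(exps)
    return [e / s for e in exps]

def attention_head(x, Wq, Wk, Wv):
    """x is T x C (one vector per token). Returns (T x head_size outputs, T x T weights)."""
    q, k, v = matmul(x, Wq), matmul(x, Wk), matmul(x, Wv)
    T, head_size = len(x), len(q[0])
    weights = []
    for t in range(T):
        scores = []
        for s in range(T):
            if s > t:
                scores.append(float("-inf"))   # causal mask: no peeking at future tokens
            else:
                dot = sum(qi * ki for qi, ki in zip(q[t], k[s]))
                scores.append(dot / math.sqrt(head_size))  # scale to keep softmax from saturating
        weights.append(softmax(scores))        # each row becomes a distribution over past tokens
    out = matmul(weights, v)                   # weighted average of value vectors
    return out, weights

def rand_matrix(rows, cols, scale):
    return [[random.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
T, C, head_size = 4, 8, 4                      # hypothetical sizes
x = rand_matrix(T, C, 1.0)
Wq = rand_matrix(C, head_size, 0.1)
Wk = rand_matrix(C, head_size, 0.1)
Wv = rand_matrix(C, head_size, 0.1)
out, weights = attention_head(x, Wq, Wk, Wv)
```

A useful sanity check on any attention implementation: every row of `weights` sums to 1, everything above the diagonal is 0, and the first token's output equals its own value vector because it has nothing else to attend to.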

Prompt 5: The “Explain It Like A Diagram” Drill

Karpathy’s gift is turning math into mental pictures. This makes Claude convert any concept you’re stuck on into one.

“I’m stuck on this concept: [PASTE THE CONCEPT — e.g. attention, layernorm, KV cache, gradient clipping].

Explain it the Karpathy way:

Start with the dumbest possible version of the idea and why someone would even want it.
Build it up one addition at a time, each motivated by a concrete problem.
Describe it as a picture/diagram in words — what’s moving, what’s being multiplied, what’s flowing where.
Give me the smallest possible code snippet that demonstrates it in isolation.
End with the one sentence I should remember forever.

No jargon unless you immediately unpack it.”
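As an example of what "the smallest possible code snippet" might look like for one of the concepts named in the prompt, here is a sketch of gradient clipping by global norm. The function name and the flat list of gradients are assumptions; frameworks operate on tensors and have their own utilities for this.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Shrink the whole gradient vector if its length exceeds max_norm."""
    total = math.sqrt(sum(g * g for g in grads))  # length of the gradient vector
    if total <= max_norm:
        return list(grads)                        # small enough: leave it alone
    scale = max_norm / total                      # same direction, shorter step
    return [g * scale for g in grads]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

The picture in words: the gradient is an arrow, and clipping shortens the arrow without turning it.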

Prompt 6: Read The Code Like Karpathy

Karpathy reads real implementations line by line. This turns any intimidating repo or snippet into a guided tour.

“Here’s a chunk of ML code I don’t fully understand: [PASTE CODE].

Walk me through it the way Karpathy reads code on stream:

Give me the 1-sentence ‘what is this even trying to do’ before any detail.
Go through it in logical chunks (not line-by-line trivia) — for each, explain the INTENT, not just the syntax.
Flag anything that’s a common idiom or trick worth memorizing.
Point out where the ‘interesting’ part is vs. boilerplate.
Tell me what would break if I deleted or changed each key piece.

Treat me as someone who can code but hasn’t internalized ML patterns yet.”

Prompt 7: The From-Scratch Challenge

Karpathy believes you cement knowledge by reimplementing, not re-reading. This generates a personalized build challenge.

“Based on what I say I just learned: [DESCRIBE — e.g. ‘I just understood self-attention’], design me a from-scratch challenge to prove I actually own it.

Give me:

A small, concrete build task that forces me to use the concept (not a toy I can fake).
A blank ‘skeleton’ with function signatures and TODOs — but NO implementations.
A list of 4-5 checkpoints so I know if I’m on track.
The specific bug or misunderstanding most people hit here, so I recognize it.

Do NOT give me the solution yet. When I come back with my attempt, then critique it like Karpathy reviewing a student’s PR.”

Prompt 8: Debug My Tiny Model

When a from-scratch model won’t learn, the bug is usually conceptual, not syntactic. This debugs the way Karpathy does — by reasoning about the signal.

“My from-scratch model is misbehaving: [DESCRIBE — loss is NaN / loss flat / it won’t overfit even one batch]. Here’s my code: [PASTE].

Debug it the Karpathy way, not by random tweaking:

First rule: can it overfit a single batch? Walk me through checking that and what it tells us.
Reason about whether the gradients can even flow given my code — point to the exact suspect lines.
Rank the 3 most likely root causes for THIS symptom, most probable first.
For each, give me the one-line check that confirms or kills it.

Only then give the fix, and explain the underlying principle so I never make it again.”
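The "overfit a single batch" check can be sketched in a few lines. This example uses a hypothetical one-weight linear model with hand-derived gradients, so there is no framework to hide a bug in. The batch, learning rate, and step count are made-up values.

```python
def mse(w, b, batch):
    """Mean squared error of the model y_hat = w*x + b on one batch."""
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)

def overfit_one_batch(batch, steps=500, lr=0.05):
    """Train on the SAME batch every step; a healthy setup drives the loss toward zero."""
    w, b = 0.0, 0.0
    n = len(batch)
    losses = []
    for _ in range(steps):
        losses.append(mse(w, b, batch))
        # Gradients of the mean squared error, derived by hand with the chain rule.
        dw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
        db = sum(2 * (w * x + b - y) for x, y in batch) / n
        w -= lr * dw
        b -= lr * db
    losses.append(mse(w, b, batch))
    return w, b, losses

batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1
w, b, losses = overfit_one_batch(batch)
print(losses[0], losses[-1], w, b)
```

If the final loss is not close to zero on a batch the model can represent exactly, the problem lies in the model, the loss, or the update step rather than in the data or the hyperparameters, which narrows the search before any tuning begins.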

Prompt 9: The Spaced Mastery Map

Karpathy’s courses build in a deliberate order so nothing is hand-waved. This turns your scattered learning into that kind of path.

“Here’s everything I’ve learned so far and where I feel shaky: [LIST TOPICS + YOUR CONFIDENCE ON EACH].

Act as Karpathy designing my personal curriculum from here:

Diagnose which ‘foundational’ gaps are secretly causing my shaky spots downstream.
Order my next 6 learning steps so each one is built ONLY on things I’ve already truly understood — no leaps.
For each step, give me the from-scratch artifact I should build to prove it.
Tell me the ONE concept that, once it clicks, will make several others fall into place.

Be honest if I’m trying to run before I can walk.”


