@dwarkesh_sp: New blackboard lecture w/ @ericjang11


Summary

A blackboard lecture by Eric Jang walks through building AlphaGo from scratch with modern AI tools, covering RL, MCTS, self-play, and connecting to LLM training, along with a discussion on automated AI research.

New blackboard lecture w/ @ericjang11. He walks through how to build AlphaGo from scratch, but with modern AI tools. Sometimes you understand the future better by stepping backward: AlphaGo is still the cleanest worked example of the primitives of intelligence — search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to discuss how RL works in LLMs and how it could work better. Naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo's MCTS suggests a strictly better action at every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project, and it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). This is useful context for all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn't MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers
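The credit-assignment contrast above can be sketched numerically. This is a minimal toy illustration, not code from the lecture: the trajectory length, action count, and MCTS visit counts are all made-up placeholders. It shows how REINFORCE smears one scalar reward across every step's gradient, while an AlphaGo-style setup trains each move toward its own per-move target distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative sizes, not from the lecture):
# a policy over A actions at each of T steps, parameterized by logits.
T, A = 5, 4
logits = rng.normal(size=(T, A))

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

probs = softmax(logits)
actions = np.array([rng.choice(A, p=p) for p in probs])

# (1) Naive policy gradient (REINFORCE): one scalar reward for the whole
# trajectory multiplies every step's log-prob gradient, so the same blunt
# signal is spread over all T decisions -- that's the credit assignment
# problem for a 100k-token trajectory.
reward = 1.0  # e.g. "won the game" / "got the right answer"
pg_grad = np.zeros_like(logits)
for t, a in enumerate(actions):
    one_hot = np.eye(A)[a]
    # ascent direction of log pi(a_t) w.r.t. the step-t logits
    pg_grad[t] = reward * (one_hot - probs[t])

# (2) AlphaGo-style target: MCTS produces a full action distribution at
# EVERY move (faked here with random visit counts), and the network is
# pushed toward that per-move target -- each step gets its own supervised
# signal, sidestepping credit assignment.
visit_counts = rng.integers(1, 100, size=(T, A)).astype(float)
mcts_targets = visit_counts / visit_counts.sum(axis=-1, keepdims=True)
# ascent direction of -cross_entropy(target, probs) w.r.t. logits
ce_grad = mcts_targets - probs

print(pg_grad.shape, ce_grad.shape)
```

Note that both gradients have the same shape; the difference is informational. In (1) every row carries the same scalar `reward`, while in (2) each row encodes a distinct, move-specific improvement direction supplied by search.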

Similar Articles

Building AlphaGo from scratch – Eric Jang

Reddit r/singularity

Eric Jang rebuilt AlphaGo from scratch, explaining in detail how Monte Carlo Tree Search and deep learning apply to Go and demonstrating that a strong Go AI can now be reproduced at low cost.