Tag
AlphaTransit combines Monte Carlo Tree Search with neural policy-value networks to optimize bus route design, using the network to predict downstream route quality in place of simulator rollouts. It reports significant service-rate improvements on a Bloomington transit benchmark.
A detailed discussion of reinforcement learning and its connection to modern AI, using a reconstruction of AlphaGo with modern tools as a clear example of search and self-play. Key takeaways include how neural networks amortize search, credit assignment challenges in LLMs compared with AlphaGo, and implications for automated research.
Eric Jang rebuilt AlphaGo from scratch and explains in detail how Monte Carlo Tree Search and deep learning are applied to Go, demonstrating that a strong Go AI can now be reproduced at low cost.
In a blackboard lecture, Eric Jang walks through building AlphaGo from scratch with modern AI tools, covering RL, MCTS, and self-play, drawing connections to LLM training, and discussing automated AI research.