@NFTCPS: If you work in AI, take this UCLA course! Theory + practice: a deep dive into RL and LLM training from scratch. Covers MDPs, the PPO algorithm, the full RLHF process, and hands-on Jupyter coding. Taught by a UCLA professor, with videos and assignments, so you can put it to use immediately after completion. Course URL: https://ernestryu.com/courses/RL-LLM.html…
Summary
This article recommends a UCLA online course, Reinforcement Learning of Large Language Models, covering theory, algorithms like PPO, the RLHF pipeline, and practical coding exercises.
Cached at: 05/10/26, 10:25 AM
If you’re into AI, this UCLA course is a must! It breaks down RL and LLM training from zero to one with theory plus hands-on practice. Covers MDPs, the PPO algorithm, RLHF workflows, and Jupyter coding exercises. Taught by a UCLA professor, it includes videos and assignments so you get real hands-on experience. Course link: https://ernestryu.com/courses/RL-LLM.html… Stop just reading papers; this course will actually teach you RL+LLM training. Otherwise, you won’t even know how ChatGPT was trained!
Reinforcement Learning of Large Language Models
Source: https://ernestryu.com/courses/RL-LLM.html
Lecture slides
- Chapter 0: Prologue (https://ernestryu.com/courses/RL-LLM/chapter0.pdf).
- Chapter 1: Deep Reinforcement Learning (https://ernestryu.com/courses/RL-LLM/chapter1.pdf).
- Chapter 2: Large Language Models (https://ernestryu.com/courses/RL-LLM/chapter2.pdf).
- Chapter 3: Reinforcement Learning of Large Language Models (https://ernestryu.com/courses/RL-LLM/chapter3.pdf).
Lecture videos
- Chapter 0: Prologue (https://youtu.be/q9972BRoXzQ).
- Chapter 1.1: MDP foundations, imitation learning, and value iteration (https://youtu.be/R2oT9Tcv0eU).
- Chapter 1.2: Deep policy evaluation (https://youtu.be/KwNs7AT3UcY).
- Chapter 1.3: Deep policy gradient methods (A3C) (https://youtu.be/iWOJpNr-kcI).
- Chapter 1.4: Deep policy gradient methods (PPO, GRPO) (https://youtu.be/qzaX7DBloZc).
- Chapter 1.5: AlphaGo, test-time compute, and expert iteration (https://youtu.be/8ZnVAu1tlYw).
- Chapter 2.1: NLP foundations, language modeling, RNNs (https://youtu.be/dhu_ZYUsBnw).
- Chapter 2.2: Transformers I (BERT, GPT-1) (https://youtu.be/q5Sl4bO-wBk).
- Chapter 2.3: Transformers II (modern transformer updates and sampling methods) (https://youtu.be/88HtzKoSzSE).
- Chapter 2.4: In-context learning and instruction fine-tuning (https://youtu.be/dPHrgBv4c9s).
- Chapter 3.1: Reinforcement Learning from Human Feedback (PPO, DPO) (https://youtu.be/IijXgwZJarU).
- Chapter 3.2: Reinforcement Learning with Verifiable Rewards (RLVR) (https://youtu.be/QKgVPTC_M1Q).
Course Information
Instructor
Ernest K. Ryu (http://www.math.snu.ac.kr/~ernestryu/), Assistant Professor of Mathematics, UCLA
Prerequisites
Students are expected to have basic familiarity with deep learning at the level of image classification. No prior experience with reinforcement learning (RL) or large language models (LLMs) is assumed. For the deep RL lectures, students should be familiar with conditional expectations and the tower property (law of total expectation).
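As a quick refresher on the one probability fact the prerequisites call out (standard material, not course-specific): the tower property says that averaging a conditional expectation recovers the unconditional expectation,

```latex
\mathbb{E}\big[\,\mathbb{E}[X \mid Y]\,\big] = \mathbb{E}[X].
```

In policy-gradient derivations this is what lets one condition on a state (or a trajectory prefix), compute an expectation there, and then average over states to get the full expectation.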
Similar Articles
@wsl8297: UCLA's open course on Reinforcement Learning for LLMs uses a 'theory + practice' approach to thoroughly explain key AI training techniques from the ground up, helping you systematically build a complete framework spanning RL through LLM training. The comprehensive curriculum comes with complete resources: lecture slides, full videos, and practical exercises are all provided so you can start implementing right away…
Assistant Professor Ernest K. Ryu at UCLA offers the open course 'Reinforcement Learning of Large Language Models,' comprehensively analyzing key LLM training techniques like RLHF, PPO, and DPO through a blend of theory and practice. The course provides developers and researchers with a systematic learning path from foundational algorithms to practical application.
@ickma2311: CMU Advanced NLP: Reinforcement Learning I had been curious about how RL works on top of LLMs, and this CMU lecture mad…
CMU Advanced NLP lecture clarifies how reinforcement learning optimizes whole-output rewards (correctness, helpfulness, safety) rather than next-token prediction used in pretraining/fine-tuning.
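To make that contrast concrete, here is a minimal, self-contained Python sketch with toy numbers (illustrative, not from the lecture): pretraining/fine-tuning minimizes per-token negative log-likelihood, while RLHF-style RL scores the whole output with one scalar reward and weights the sequence log-probability by it, REINFORCE-style.

```python
import math

# Toy per-token probabilities a stand-in "policy" assigns to its own output
# (hypothetical numbers for illustration).
token_probs = [0.6, 0.3, 0.8]

# Next-token objective (pretraining/fine-tuning): average per-token
# negative log-likelihood; each token is scored independently.
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Whole-output objective (RLHF-style RL): one scalar reward judges the
# entire completion (correctness, helpfulness, safety) and scales the
# sequence log-probability, as in REINFORCE.
reward = 1.0
seq_log_prob = sum(math.log(p) for p in token_probs)
reinforce_objective = reward * seq_log_prob

print(round(nll, 4), round(reinforce_objective, 4))
```

The key difference is where the training signal attaches: the first objective supervises every token against a reference, while the second only sees a single judgment of the finished output.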
@jiqizhixin: Awesome blog! State of RL for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
A comprehensive blog post reviewing the state of reinforcement learning for reasoning LLMs, covering methods from REINFORCE and PPO to GRPO and beyond, with connections to key models like InstructGPT and DeepSeek-R1.
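Among the methods such surveys cover, PPO's clipped surrogate objective is the one most worth internalizing. A minimal Python sketch of the per-sample clipped term (function name and toy values are illustrative, not taken from the post):

```python
def ppo_clipped_term(ratio, advantage, eps=0.2):
    """PPO per-sample surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A.
    Clipping removes the incentive to push the ratio outside [1-eps, 1+eps].
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
print(ppo_clipped_term(1.5, 1.0))   # capped at 1.2
# Negative advantage: the outer min keeps the larger penalty, so a policy
# that drifts far from the old one is still penalized.
print(ppo_clipped_term(0.5, -1.0))  # -0.8
```

The asymmetry in the two printed cases is the whole point of the clip: improvements from moving the ratio far from 1 are capped, while penalties are not.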
@NFTCPS: Want to master Reinforcement Learning? Keep dreaming, bro. Online courses just teach you how to call APIs, leaving you utterly confused after finishing. Reading papers? Mountains of formulas will scare you off instantly. Trying to systematically understand the principles? The barrier to entry feels like climbing to heaven, and the learning path is as tangled as a maze. Recently, I stumbled upon an open-source book, 'Mathematical Foundations of Reinforcement Learning,' that pierces right through this fog. It provides a crystal-clear roadmap: starting from mathematics…
Introduces an open-source book, 'Mathematical Foundations of Reinforcement Learning,' which offers a rigorous yet accessible mathematical approach to RL, using grid world examples to clarify algorithmic logic.
@Ai_Tech_tool: ANDREJ KARPATHY COULD HAVE CHARGED $2,000 FOR THIS COURSE. He put it on YouTube. The full training stack. Tokenization.…
Highlights Andrej Karpathy's free three-hour YouTube course covering LLM fundamentals, including tokenization, neural network internals, RLHF, and reinforcement learning. Emphasizes that understanding these core architectural principles offers major career advantages over simply knowing how to use off-the-shelf AI tools.