@NFTCPS: If you work in AI, take this UCLA course! Theory + practice: a deep dive into RL and LLM training from scratch. Covers MDP, PPO algorithms, the full RLHF process, and hands-on Jupyter coding. Taught by a UCLA professor with videos and assignments, ready to apply immediately after completion. Course URL: https://ernestryu.com/courses/RL-LLM.html…
Summary
This post recommends Reinforcement Learning of Large Language Models, an online course taught by a UCLA professor. It covers theory, algorithms such as PPO, the RLHF pipeline, and practical coding exercises.
Cached at: 05/10/26, 10:25 AM
If you’re into AI, this UCLA course is a must! It breaks down RL and LLM training from zero to one with theory + hands-on practice. It covers MDPs, the PPO algorithm, RLHF workflows, and Jupyter code exercises. Taught by a UCLA professor, it includes videos and assignments and gives you hands-on experience. Course link: https://ernestryu.com/courses/RL-LLM.html… Stop just reading papers: this course will actually teach you RL+LLM training. Otherwise, you won’t even know how ChatGPT was trained!
Reinforcement Learning of Large Language Models
Source: https://ernestryu.com/courses/RL-LLM.html
Lecture slides
- Chapter 0: Prologue (https://ernestryu.com/courses/RL-LLM/chapter0.pdf).
- Chapter 1: Deep Reinforcement Learning (https://ernestryu.com/courses/RL-LLM/chapter1.pdf).
- Chapter 2: Large Language Models (https://ernestryu.com/courses/RL-LLM/chapter2.pdf).
- Chapter 3: Reinforcement Learning of Large Language Models (https://ernestryu.com/courses/RL-LLM/chapter3.pdf).
Lecture videos
- Chapter 0: Prologue (https://youtu.be/q9972BRoXzQ).
- Chapter 1.1: MDP foundations, imitation learning, and value iteration (https://youtu.be/R2oT9Tcv0eU).
- Chapter 1.2: Deep policy evaluation (https://youtu.be/KwNs7AT3UcY).
- Chapter 1.3: Deep policy gradient methods (A3C) (https://youtu.be/iWOJpNr-kcI).
- Chapter 1.4: Deep policy gradient methods (PPO, GRPO) (https://youtu.be/qzaX7DBloZc).
- Chapter 1.5: AlphaGo, test-time compute, and expert iteration (https://youtu.be/8ZnVAu1tlYw).
- Chapter 2.1: NLP foundations, language modeling, RNNs (https://youtu.be/dhu_ZYUsBnw).
- Chapter 2.2: Transformers I (BERT, GPT-1) (https://youtu.be/q5Sl4bO-wBk).
- Chapter 2.3: Transformers II (modern transformer updates and sampling methods) (https://youtu.be/88HtzKoSzSE).
- Chapter 2.4: In-context learning and instruction fine-tuning (https://youtu.be/dPHrgBv4c9s).
- Chapter 3.1: Reinforcement Learning from Human Feedback (PPO, DPO) (https://youtu.be/IijXgwZJarU).
- Chapter 3.2: Reinforcement Learning with Verifiable Rewards (RLVR) (https://youtu.be/QKgVPTC_M1Q).
Course Information
Instructor
Ernest K. Ryu (http://www.math.snu.ac.kr/~ernestryu/), Assistant Professor of Mathematics, UCLA
Prerequisites
Students are expected to have basic familiarity with deep learning at the level of image classification. No prior experience with reinforcement learning (RL) or large language models (LLMs) is assumed. For the deep RL lectures, students should be familiar with conditional expectations and the tower property (law of total expectation).
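For readers who want to check the probability prerequisite, here is a minimal sketch of the tower property, E[X] = E[E[X | Y]], verified on a tiny discrete example. The joint distribution and the function names are made up for illustration and are not taken from the course materials.

```python
from fractions import Fraction

# Hypothetical joint distribution P(Y = y, X = x); the values are illustrative only.
joint = {
    ("a", 0): Fraction(1, 8),
    ("a", 4): Fraction(3, 8),
    ("b", 2): Fraction(1, 4),
    ("b", 6): Fraction(1, 4),
}


def expectation(joint):
    """E[X], computed directly from the joint distribution."""
    return sum(p * x for (_, x), p in joint.items())


def marginal_y(joint):
    """P(Y = y) for every y, obtained by summing out X."""
    marginal = {}
    for (y, _), p in joint.items():
        marginal[y] = marginal.get(y, Fraction(0)) + p
    return marginal


def conditional_expectation(joint, y):
    """E[X | Y = y] = sum_x x * P(Y = y, X = x) / P(Y = y)."""
    p_y = marginal_y(joint)[y]
    return sum(p * x for (yy, x), p in joint.items() if yy == y) / p_y


def tower(joint):
    """E[E[X | Y]]: average the conditional expectations over the law of Y."""
    return sum(
        p_y * conditional_expectation(joint, y)
        for y, p_y in marginal_y(joint).items()
    )


if __name__ == "__main__":
    print("E[X]        =", expectation(joint))
    print("E[E[X | Y]] =", tower(joint))
```

Both printed values agree, which is the law of total expectation on this example. Exact fractions are used so the comparison is not blurred by floating-point rounding.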