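@NFTCPS: Everyone working in AI, go take this University of California course! Theory plus practice, breaking RL and LLM training down from zero to one. It teaches MDPs, the PPO algorithm, the full RLHF pipeline, plus hands-on Jupyter coding. Taught by a UCLA professor, with both videos and assignments; finish it and you can get hands-on right away. Course link: https://ernestr…
Summary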
This article recommends a UCLA-led online course on Reinforcement Learning for Large Language Models, covering theory, algorithms like PPO and RLHF, and practical coding exercises.
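Cached: 2026/05/10 10:25
Everyone working in AI, go take this University of California course! Theory plus practice, breaking RL and LLM training down from zero to one. It teaches MDPs, the PPO algorithm, the full RLHF pipeline, plus hands-on Jupyter coding. Taught by a UCLA professor, with both videos and assignments; finish it and you can get hands-on right away. Course link: https://ernestryu.com/courses/RL-LLM.html… Stop just staring at papers; this course will actually teach you RL + LLM training. Otherwise you won't even know how ChatGPT was trained into shape!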
Reinforcement Learning of Large Language Models
Source: https://ernestryu.com/courses/RL-LLM.html
Lecture slides
- Chapter 0: Prologue.
- Chapter 1: Deep Reinforcement Learning.
- Chapter 2: Large Language Models.
- Chapter 3: Reinforcement Learning of Large Language Models.
Lecture videos
- Chapter 0: Prologue.
- Chapter 1.1: MDP foundations, imitation learning, and value iteration.
- Chapter 1.2: Deep policy evaluation.
- Chapter 1.3: Deep policy gradient methods (A3C).
- Chapter 1.4: Deep policy gradient methods (PPO, GRPO).
- Chapter 1.5: AlphaGo, test-time compute, and expert iteration.
- Chapter 2.1: NLP foundations, language modeling, RNNs.
- Chapter 2.2: Transformers I (BERT, GPT-1).
- Chapter 2.3: Transformers II (modern transformer updates and sampling methods).
- Chapter 2.4: In-context learning and instruction fine-tuning.
- Chapter 3.1: Reinforcement learning from human feedback (PPO, DPO).
- Chapter 3.2: Reinforcement learning with verifiable rewards (RLVR).
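Chapter 1.4 lists PPO and GRPO, and Chapter 3.2 covers RL with verifiable rewards. The Python below is a minimal sketch, not the course's own code. It computes the PPO clipped surrogate objective, fed with GRPO-style group-normalized advantages, for a batch of responses sampled for a single prompt. It rests on several assumptions. Each response's log-probability is collapsed into one number, whereas real implementations work per token. The KL penalty that GRPO usually adds is omitted. The group is normalized with the population standard deviation. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: score each response sampled for the same prompt
    relative to the group's mean reward, divided by the group's std
    (population std here, an assumption; eps avoids division by zero)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate (to be maximized).

    logp_new / logp_old: log-probability of each sampled response under the
    current policy and under the policy that generated the samples (one
    number per response, a simplification of the usual per-token form).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: no extra credit for pushing the ratio outside [1-eps, 1+eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical example: 4 responses to one prompt, verifier reward 1 = correct, 0 = wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_advantages(rewards)
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
print(round(ppo_clip_objective([0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], adv), 3))  # 0.05
```

In the example, the first response's ratio is e^0.5 ≈ 1.65, but it is credited only up to 1.2 because of the clip. This is how PPO-style updates keep each step close to the sampling policy.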
Course Information
Instructor
Ernest K. Ryu, Assistant Professor of Mathematics, UCLA

Prerequisites
Students are expected to have basic familiarity with deep learning at the level of image classification. No prior experience with reinforcement learning (RL) or large language models (LLMs) is assumed. For the deep RL lectures, students should be familiar with conditional expectations and the tower property (law of total expectation).
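Here is a concrete instance of the tower property, E[X] = E[E[X | Y]]. The sketch below checks it exactly on a small, made-up discrete joint distribution, using Python's `fractions` module so that no rounding is involved. In deep RL derivations, this identity is what lets one replace a sampled return with its conditional expectation given the state and action (a Q-value). The distribution and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y); probabilities sum to 1.
joint = {
    (0, "a"): F(1, 8), (1, "a"): F(3, 8),
    (0, "b"): F(1, 4), (2, "b"): F(1, 4),
}


def expectation_x(joint):
    """E[X] computed directly from the joint distribution."""
    return sum(p * x for (x, _), p in joint.items())


def marginal_y(joint, y):
    """P(Y = y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)


def conditional_expectation_x(joint, y):
    """E[X | Y = y] = sum_x x P(X = x, Y = y) / P(Y = y)."""
    return sum(p * x for (x, yy), p in joint.items() if yy == y) / marginal_y(joint, y)


def tower_expectation(joint):
    """E[E[X | Y]]: average the conditional expectations over the law of Y."""
    ys = {y for (_, y) in joint}
    return sum(marginal_y(joint, y) * conditional_expectation_x(joint, y) for y in ys)


print(expectation_x(joint), tower_expectation(joint))  # 7/8 7/8
```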
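Similar articles
@wsl8297: The University of California has opened the course "Reinforcement Learning of Large Language Models", which takes a "theory + practice" approach to explain the key techniques of AI training from zero to one, helping you systematically build a complete framework from reinforcement learning to LLM training. The course content is comprehensive and the supporting resources are complete: lecture slides, full videos, and practical exercises are all included; after finishing…
UCLA Assistant Professor Ernest K. Ryu has launched the open course "Reinforcement Learning of Large Language Models", which combines theory and practice to cover key LLM training techniques such as RLHF and PPO/DPO, along with supporting resources. The course offers developers and researchers a systematic learning path from foundational algorithms to practical deployment.
@ickma2311: CMU Advanced NLP: Reinforcement Learning. I had always been curious how RL works on large models, and this CMU course made it click…
The CMU Advanced NLP course explains how reinforcement learning optimizes a reward over the entire output (correctness, helpfulness, safety), rather than the next-token prediction objective used in pretraining and fine-tuning.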
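The contrast between next-token prediction and a whole-output reward can be sketched under a strong simplifying assumption: a hypothetical toy "policy" whose logits at each position do not depend on earlier tokens. For softmax logits z, the gradient of log softmax(z)[k] with respect to z is onehot(k) - softmax(z). Next-token training ascends that direction for each fixed target token on its own. A REINFORCE-style policy-gradient update, named here as one standard choice, uses the same direction for each sampled token but scales every position by a single sequence-level (reward - baseline). All function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def grad_log_prob(logits, token):
    """d/dz log softmax(z)[token] = onehot(token) - softmax(z)."""
    p = softmax(logits)
    return [(1.0 if i == token else 0.0) - pi for i, pi in enumerate(p)]


def sft_gradient(position_logits, target_tokens):
    """Next-token (supervised) ascent direction: each target token is scored on its own."""
    return [grad_log_prob(z, y) for z, y in zip(position_logits, target_tokens)]


def reinforce_gradient(position_logits, sampled_tokens, reward, baseline=0.0):
    """REINFORCE ascent direction: one sequence-level (reward - baseline)
    scales the log-prob gradient of every sampled token."""
    scale = reward - baseline
    return [[scale * g for g in grad_log_prob(z, y)]
            for z, y in zip(position_logits, sampled_tokens)]


# Hypothetical 2-position output over a 2-token vocabulary, uniform logits.
logits = [[0.0, 0.0], [0.0, 0.0]]
sampled = [0, 1]
print(sft_gradient(logits, sampled))                                   # [[0.5, -0.5], [-0.5, 0.5]]
print(reinforce_gradient(logits, sampled, reward=1.0, baseline=0.5))  # [[0.25, -0.25], [-0.25, 0.25]]
print(reinforce_gradient(logits, sampled, reward=0.0, baseline=0.5))  # [[-0.25, 0.25], [0.25, -0.25]]
```

A high-reward output pushes up every token it contains, and a low-reward output pushes all of them down. A single score for the whole response therefore decides the update, rather than a per-token target.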
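@DanKornas: "Stanford CS229 | Machine Learning | Building Large Language Models (LLMs)" (Stanford Online)... What you will learn: …
A Stanford CS229 online course announcement covering building large language models, deep neural networks, TensorFlow, Keras, OpenCV, and natural language processing with spaCy.
@jiqizhixin: Excellent! The state of reinforcement learning for reasoning LLMs https://aweers.de/blog/2026/rl-for-llms/…
A blog post that comprehensively reviews the state of reinforcement learning for reasoning LLMs, covering methods from REINFORCE and PPO to GRPO and beyond, and connecting them to key models such as InstructGPT and DeepSeek-R1.
@tan_maty: Hugely recommend this course, a must for CS majors: CS336, a course that has recently reached legendary status in the AI community: Language Modeling from Scratch. The course is offered by Stanford and taught by top NLP figures Percy Liang and Tatsunori Hashim…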
A thread promoting Stanford's CS336 course on building language models from scratch, taught by NLP experts Percy Liang and Tatsunori Hashimoto, emphasizing hands-on understanding.