Want to truly master Reinforcement Learning? Dream on, bro. Online courses just teach you how to import libraries, leaving you completely clueless once they’re over. Reading papers? Mountains of formulas that instantly turn you off. Trying to systematically understand the principles? The barrier to entry feels like climbing to the sky, and the learning path is as confusing as a maze.
Recently, I discovered an open-source book, Mathematical Foundations of Reinforcement Learning, that finally breaks through this barrier. It provides a crystal-clear roadmap: starting from mathematics, it breaks down the core logic of RL and feeds it straight to you. The entire book revolves around one classic “Grid World” case study, with algorithm derivations explained step-by-step—no skipping steps, no detours. The mathematical depth is perfectly balanced: rigorous where it needs to be, simplified where appropriate, with the goal of ensuring you truly digest every concept. It’s beginner-friendly yet suitable for AI developers looking to solidify their foundation. Combine it with the lecture videos, and your efficiency will double.
Link here: GitHub: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning…
MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
Source: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
About the LaTeX source code of my slides
If you are a professor preparing a course and would like to use any content from my slides, feel free to reach out by email. I can share the source code with you. The slides were created with LaTeX/Beamer.
Regarding reader feedback and questions in the discussion section, please note that due to a high volume of commitments, there may be significant delays in my response. Your understanding would be greatly appreciated.
Why a new book on reinforcement learning?
This book aims to provide a mathematical but friendly introduction to the fundamental concepts, basic problems, and classic algorithms in reinforcement learning. Some essential features of this book are highlighted as follows.
- The book introduces reinforcement learning from a mathematical point of view. Hopefully, readers will not only know the procedure of an algorithm but also understand why it was designed in the first place and why it works effectively.
- The depth of the mathematics is carefully controlled to an adequate level. The mathematics is also presented in a carefully designed manner to ensure that the book is friendly to read. Readers can selectively read the materials presented in gray boxes according to their interests.
- Many illustrative examples are given to help readers better understand the topics. All the examples in this book are based on a grid world task, which is easy to understand and helpful for illustrating concepts and algorithms.
- When introducing an algorithm, the book aims to separate its core idea from complications that may be distracting. In this way, readers can better grasp the core idea of an algorithm.
- The contents of the book are coherently organized. Each chapter is built on the preceding chapter and lays a necessary foundation for the subsequent one.
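To make the grid world idea concrete, here is a minimal sketch in Python. The grid size, action set, and reward values below are my own illustrative choices, not the book's exact specification:

```python
# Minimal grid world sketch: a 3x3 grid, four moves, one goal cell.
# Illustrative only; the book's grid world has its own layout and rewards.

GRID = 3                      # 3x3 grid; states are (row, col) pairs
GOAL = (2, 2)                 # reaching the goal yields reward +1
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid,
    otherwise stay in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    next_state = (r, c) if 0 <= r < GRID and 0 <= c < GRID else state
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward
```

Calling `step((2, 1), "right")` lands on the goal and returns reward 1.0; a move that would leave the grid keeps the agent in place with reward 0.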
Book cover (https://link.springer.com/book/9789819739431)
Contents
The topics addressed in the book are shown in the figure below. This book contains ten chapters, which can be classified into two parts: the first part is about basic tools, and the second part is about algorithms. The ten chapters are highly correlated. In general, it is necessary to study the earlier chapters first before the later ones.
The map of this book
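To give a flavor of the "basic tools plus algorithms" split, the sketch below runs value iteration, one of the algorithms the book covers, on a hypothetical two-state MDP. The transition probabilities and rewards are illustrative inventions, not taken from the book:

```python
# Hedged sketch of value iteration on a hypothetical 2-state MDP.
# P[s][a] is a list of (probability, next_state, reward) triples.

GAMMA = 0.9

P = {
    0: {0: [(1.0, 0, 0.0)],                      # "stay" action
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},      # risky move toward state 1
    1: {0: [(1.0, 0, 0.0)],                      # move back to state 0
        1: [(1.0, 1, 1.0)]},                     # keep collecting reward 1
}

def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality operator until the update is tiny."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {
            s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration()
```

Since the Bellman optimality operator is a contraction, the loop converges; here the fixed point is V(1) = 1/(1 - 0.9) = 10 and V(0) = 8/0.82 ≈ 9.76.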
Readership
This book is designed for senior undergraduate students, graduate students, researchers, and practitioners interested in reinforcement learning.
It does not require readers to have any background in reinforcement learning because it starts by introducing the most basic concepts. If the reader already has some background in reinforcement learning, I believe the book can help them understand some topics more deeply or provide different perspectives.
This book, however, requires the reader to have some knowledge of probability theory and linear algebra. Some basics of the required mathematics are also included in the appendix of this book.
About the author
You can find my info on my homepage https://www.shiyuzhao.net (GoogleSite) and my research group website https://shiyuzhao.westlake.edu.cn
I have been teaching a graduate-level course on reinforcement learning since 2019. Along with teaching, I have been preparing this book as the lecture notes for my students.
I sincerely hope this book can help readers smoothly enter the exciting field of reinforcement learning.
Citation
@book{zhao2025RLBook,
  title     = {Mathematical Foundations of Reinforcement Learning},
  author    = {S. Zhao},
  year      = {2025},
  publisher = {Springer Press}
}
Lecture videos
The lecture videos have received 2,100,000+ views across the Internet, along with very positive feedback! I believe that combining the book with my lecture videos will help you study more effectively.
- Chinese lecture videos: You can check the Bilibili channel (https://space.bilibili.com/2044042934) or the YouTube channel (https://www.youtube.com/channel/UCztGtS5YYiNv8x3pj9hLVgg/playlists).
- English lecture videos: The English lecture videos have been uploaded to YouTube: link here (https://youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=D1T4pcyHsMxj6CzB)
- Overview of Reinforcement Learning in 30 Minutes (https://www.youtube.com/watch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1)
- L1: Basic Concepts (P1-State, action, policy, …) (https://www.youtube.com/watch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)
- L1: Basic Concepts (P2-Reward, return, Markov decision process) (https://www.youtube.com/watch?v=repVl3_GYCI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=3)
- L2: Bellman Equation (P1-Motivating examples) (https://www.youtube.com/watch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)
- L2: Bellman Equation (P2-State value) (https://www.youtube.com/watch?v=DSvi3xEN13I&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=5)
- L2: Bellman Equation (P3-Bellman equation-Derivation) (https://www.youtube.com/watch?v=eNtId8yPWkA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=6)
- L2: Bellman Equation (P4-Matrix-vector form and solution) (https://www.youtube.com/watch?v=EtCfBG_eP2w&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=7)
- L2: Bellman Equation (P5-Action value) (https://www.youtube.com/watch?v=zJo2sLDzfcU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=8)
- L3: Bellman Optimality Equation (P1-Motivating example) (https://www.youtube.com/watch?v=lXKY_Hyg4SQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=9)
- L3: Bellman Optimality Equation (P2-Optimal policy) (https://www.youtube.com/watch?v=BxyjdHhK8a8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=10)
- L3: Bellman Optimality Equation (P3-More on BOE) (https://www.youtube.com/watch?v=FXftTCKotC8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=11)
- L3: Bellman Optimality Equation (P4-Interesting properties) (https://www.youtube.com/watch?v=a--bck2ow9s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=12)
- L4: Value Iteration and Policy Iteration (P1-Value iteration) (https://www.youtube.com/watch?v=wMAVmLDIvQU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=13)
- L4: Value Iteration and Policy Iteration (P2-Policy iteration) (https://www.youtube.com/watch?v=Pka6Om0nYQ8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=14)
- L4: Value Iteration and Policy Iteration (P3-Truncated policy iteration) (https://www.youtube.com/watch?v=tUjPFPD3Vc8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=15)
- L5: Monte Carlo Learning (P1-Motivating examples) (https://www.youtube.com/watch?v=DO1yXinAV_Q&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=16)
- L5: Monte Carlo Learning (P2-MC Basic-introduction) (https://www.youtube.com/watch?v=6ShisunU0zs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=17)
- L5: Monte Carlo Learning (P3-MC Basic-examples) (https://www.youtube.com/watch?v=axA0yns9FxU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=18)
- L5: Monte Carlo Learning (P4-MC Exploring Starts) (https://www.youtube.com/watch?v=Qt8OMHPkLqg&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=19)
- L5: Monte Carlo Learning (P5-MC Epsilon-Greedy-introduction) (https://www.youtube.com/watch?v=dM3fYE630pY&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=20)
- L5: Monte Carlo Learning (P6-MC Epsilon-Greedy-examples) (https://www.youtube.com/watch?v=x6X_5ePT9gQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=21)
- L6: Stochastic Approximation and SGD (P1-Motivating example) (https://www.youtube.com/watch?v=1bMgejvWoAo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=22)
- L6: Stochastic Approximation and SGD (P2-RM algorithm: introduction) (https://www.youtube.com/watch?v=1FTGcNUUnCE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=23)
- L6: Stochastic Approximation and SGD (P3-RM algorithm: convergence) (https://www.youtube.com/watch?v=juNDoAFEre4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=24)
- L6: Stochastic Approximation and SGD (P4-SGD algorithm: introduction) (https://www.youtube.com/watch?v=EZO7Iadp5m4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=25)
- L6: Stochastic Approximation and SGD (P5-SGD algorithm: examples) (https://www.youtube.com/watch?v=BsxU_4qvvNA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=26)
- L6: Stochastic Approximation and SGD (P6-SGD algorithm: properties) (https://www.youtube.com/watch?v=fWxX9YuEHjE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=27)
- L6: Stochastic Approximation and SGD (P7-SGD algorithm: comparison) (https://www.youtube.com/watch?v=yNEV2cLKuzU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=28)
- L7: Temporal-Difference Learning (P1-Motivating example) (https://www.youtube.com/watch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)
- L7: Temporal-Difference Learning (P2-TD algorithm: introduction) (https://www.youtube.com/watch?v=XiCUsc7CCE0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=30)
- L7: Temporal-Difference Learning (P3-TD algorithm: convergence) (https://www.youtube.com/watch?v=faWg8M91-Oo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=31)
- L7: Temporal-Difference Learning (P4-Sarsa) (https://www.youtube.com/watch?v=jYwQufkBUPo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=32)
- L7: Temporal-Difference Learning (P5-Expected Sarsa & n-step Sarsa) (https://www.youtube.com/watch?v=0kKzQbWZOlk&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=33)
- L7: Temporal-Difference Learning (P6-Q-learning: introduction) (https://www.youtube.com/watch?v=4BvYR2hm730&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=34)
- L7: Temporal-Difference Learning (P7-Q-learning: pseudo code) (https://www.youtube.com/watch?v=I0YhlOIFF4s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=35)
- L7: Temporal-Difference Learning (P8-Unified viewpoint and summary) (https://www.youtube.com/watch?v=3t74lvk1GBM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=36)
- L8: Value Function Approximation (P1-Motivating example–curve fitting) (https://www.youtube.com/watch?v=uJXcI8fcdWc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=37)
- L8: Value Function Approximation (P2-Objective function) (https://www.youtube.com/watch?v=Z3HI1TfpJP0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=38)
- L8: Value Function Approximation (P3-Optimization algorithm) (https://www.youtube.com/watch?v=piBDwrKt0uU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=39)
- L8: Value Function Approximation (P4-illustrative examples and analysis) (https://www.youtube.com/watch?v=VFyBNEZxMMs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=40)
- L8: Value Function Approximation (P5-Sarsa and Q-learning) (https://www.youtube.com/watch?v=C-HtY4-W_zw&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=41)
- L8: Value Function Approximation (P6-DQN–basic idea) (https://www.youtube.com/watch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)
- L8: Value Function Approximation (P7-DQN–experience replay) (https://www.youtube.com/watch?v=rynEdAdebi0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=43)
- L8: Value Function Approximation (P8-DQN–implementation and example) (https://www.youtube.com/watch?v=vQHuCHjd6hA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=44)
- L9: Policy Gradient Methods (P1-Basic idea) (https://www.youtube.com/watch?v=mtFHOj83QSo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=45)
- L9: Policy Gradient Methods (P2-Metric 1–Average value) (https://www.youtube.com/watch?v=la8jQc3hX1M&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=46)
- L9: Policy Gradient Methods (P3-Metric 2–Average reward) (https://www.youtube.com/watch?v=8RZ_rQFe69E&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=47)
- L9: Policy Gradient Methods (P4-Gradients of the metrics) (https://www.youtube.com/watch?v=MvmtPXur3Ls&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=48)
- L9: Policy Gradient Methods (P5-Gradient-based algorithms & REINFORCE) (https://www.youtube.com/watch?v=1DQnnUC8ng8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=49)
- L10: Actor-Critic Methods (P1-The simplest Actor-Critic) (https://www.youtube.com/watch?v=kjCZAT5Wh80&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=50)
- L10: Actor-Critic Methods (P2-Advantage Actor-Critic) (https://www.youtube.com/watch?v=vZVXJJcZNEM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=51)
- L10: Actor-Critic Methods (P3-Importance sampling & off-policy Actor-Critic) (https://www.youtube.com/watch?v=TfO5mnsiGKc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=52)
- L10: Actor-Critic Methods (P4-Deterministic Actor-Critic) (https://www.youtube.com/watch?v=dTjz1RNtic4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=53)
- L10: Actor-Critic Methods (P5-Summary and goodbye!) (https://www.youtube.com/watch?v=npvnnKcXoBs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=54)
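To complement the L7 lectures on temporal-difference methods, here is a hedged sketch of tabular Q-learning on a hypothetical five-state chain. The environment, learning rate, and exploration settings are my own illustrative choices, not the book's grid world:

```python
import random

# Hypothetical tiny environment (NOT from the book): a 5-state chain,
# action 0 = left, action 1 = right; reward 1 only on reaching state 4.
N_STATES, GOAL = 5, 4

def env_step(s, a):
    """Deterministic chain dynamics with walls at both ends."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy, with random tie-breaking
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = env_step(s, a)
            # Off-policy TD target: max over next actions, zero at terminal
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy_policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES)]
```

Because the target bootstraps with the max over next-state actions regardless of the action actually taken, this is the off-policy update the L7 videos derive; Sarsa would instead bootstrap with the action its own epsilon-greedy policy selects. After training, the greedy policy prefers "right" in states 0 through 3.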
Some comments from YouTube and Amazon:
Third-party code and materials
Many enthusiastic readers have sent me the source code or notes they developed while studying this book. If you create any materials based on the course, you are welcome to email me. I am happy to share the links here in the hope that they may help other readers. I must emphasize that I have not verified the code; if you have any questions, please contact the developers directly.
Code
Python:
- https://github.com/AstonDky/Math_in_RL_Visual (May 2026, by Keyan Dong)
- https://github.com/Ronchy2000/Multi-agent-RL/tree/master/RL_Learning-main (Oct 2025, by Rongqi Lu)
- https://github.com/zhoubay/Code-for-Mathematical-Foundations-of-Reinforcement-Learning (Mar 2025, by Xibin ZHOU)
- https://github.com/10-OASIS-01/minrl (Feb 2025)
- https://github.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning (Aug 2024, by Zehong Cao)
- https://github.com/ziwenhahaha/Code-of-RL-Beginning (Mar 2024, by RLGamer)
  - Videos for code explanation: https://www.bilibili.com/video/BV1fW421w7NH
- https://github.com/jwk1rose/RL_Learning (Feb 2024, by Wenkang Ji)
Matlab:
- https://github.com/EveryDayIsaSong/MATLAB-Code-for-Mathematical-Foundation-of-Reinforcement-Learning (by Yucheng Mao, Jan 2026)
R:
- https://github.com/NewbieToEverything/Code-Mathmatical-Foundation-of-Reinforcement-Learning
C++:
- https://github.com/purundong/test_rl
Study notes
English:
- https://lyk-love.cn/tags/reinforcement-learning/ by a graduate student from UC Davis
Chinese:
- https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main (Jan 2026)
- https://zhuanlan.zhihu.com/p/692207843
- https://blog.csdn.net/qq_64671439/category_12540921.html
- http://t.csdnimg.cn/EH4rj
- https://blog.csdn.net/LvGreat/article/details/135454738
- https://xinzhe.blog.csdn.net/article/details/129452000
- https://blog.csdn.net/v20000727/article/details/136870879?spm=1001.2014.3001.5502
- https://blog.csdn.net/m0_64952374/category_12883361.html
There are also many other notes made by other readers on the Internet; I am not able to list them all here. You are welcome to recommend one to me if you find a good one.
Bilibili videos based on my course
- https://www.bilibili.com/video/BV1DMBYB6Edo (Jan 2026)
- https://www.bilibili.com/video/BV1fW421w7NH
- https://www.bilibili.com/video/BV1Ne411m7GX
- https://www.bilibili.com/video/BV1HX4y1H7uR
- https://www.bilibili.com/video/BV1TgzsYDEnP
- https://www.bilibili.com/video/BV1CQ4y1J7zu