Want to truly master Reinforcement Learning? Dream on, bro. Online courses just teach you how to import libraries, leaving you completely clueless once they’re over. Reading papers? Mountains of formulas that instantly turn you off. Trying to systematically understand the principles? The barrier to entry feels like climbing to the sky, and the learning path is as confusing as a maze.
Recently, I discovered an open-source book, Mathematical Foundations of Reinforcement Learning, that finally breaks through this barrier. It provides a crystal-clear roadmap: starting from mathematics, it breaks down the core logic of RL and feeds it straight to you. The entire book revolves around one classic “Grid World” case study, with algorithm derivations explained step-by-step—no skipping steps, no detours. The mathematical depth is perfectly balanced: rigorous where it needs to be, simplified where appropriate, with the goal of ensuring you truly digest every concept. It’s beginner-friendly yet suitable for AI developers looking to solidify their foundation. Combine it with the lecture videos, and your efficiency will double.
Link here: GitHub: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning…
MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
Source: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
About the LaTeX source code of my slides
If you are a professor preparing a course and would like to use any content from my slides, feel free to reach out by email. I can share the source code with you. The slides were created with LaTeX/Beamer.
Regarding reader feedback and questions in the discussion section, please note that due to a high volume of commitments, there may be significant delays in my response. Your understanding would be greatly appreciated.
Why a new book on reinforcement learning?
This book aims to provide a mathematical but friendly introduction to the fundamental concepts, basic problems, and classic algorithms in reinforcement learning. Some essential features of this book are highlighted as follows.
- The book introduces reinforcement learning from a mathematical point of view. Hopefully, readers will not only know the procedure of an algorithm but also understand why it was designed in the first place and why it works effectively.
- The depth of the mathematics is carefully controlled to an adequate level. The mathematics is also presented in a carefully designed manner to ensure that the book is friendly to read. Readers can selectively read the materials presented in gray boxes according to their interests.
- Many illustrative examples are given to help readers better understand the topics. All the examples in this book are based on a grid world task, which is easy to understand and helpful for illustrating concepts and algorithms.
- When introducing an algorithm, the book aims to separate its core idea from complications that may be distracting. In this way, readers can better grasp the core idea of an algorithm.
- The contents of the book are coherently organized. Each chapter is built on the preceding chapter and lays a necessary foundation for the subsequent one.
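To make the grid world idea concrete, here is a minimal sketch in Python. The grid size, action set, and reward values below are my own illustrative choices, not the book's exact specification:

```python
# Minimal grid world sketch: a 3x3 grid, four moves, one goal cell.
# Illustrative only; the book's grid world has its own layout and rewards.

GRID = 3                      # 3x3 grid; states are (row, col) pairs
GOAL = (2, 2)                 # reaching the goal yields reward +1
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition: move if the target cell is inside the grid,
    otherwise stay in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    next_state = (r, c) if 0 <= r < GRID and 0 <= c < GRID else state
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward
```

Calling `step((2, 1), "right")` lands on the goal and returns reward 1.0; a move that would leave the grid keeps the agent in place with reward 0.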
Book cover (https://link.springer.com/book/9789819739431)
Contents
The topics addressed in the book are shown in the figure below. This book contains ten chapters, which can be classified into two parts: the first part is about basic tools, and the second part is about algorithms. The ten chapters are highly correlated. In general, it is necessary to study the earlier chapters first before the later ones.
The map of this book
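To give a flavor of the "basic tools plus algorithms" split, the sketch below runs value iteration, one of the algorithms the book covers, on a hypothetical two-state MDP. The transition probabilities and rewards are illustrative inventions, not taken from the book:

```python
# Hedged sketch of value iteration on a hypothetical 2-state MDP.
# P[s][a] is a list of (probability, next_state, reward) triples.

GAMMA = 0.9

P = {
    0: {0: [(1.0, 0, 0.0)],                      # "stay" action
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},      # risky move toward state 1
    1: {0: [(1.0, 0, 0.0)],                      # move back to state 0
        1: [(1.0, 1, 1.0)]},                     # keep collecting reward 1
}

def value_iteration(tol=1e-8):
    """Iterate the Bellman optimality operator until the update is tiny."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {
            s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration()
```

Since the Bellman optimality operator is a contraction, the loop converges; here the fixed point is V(1) = 1/(1 - 0.9) = 10 and V(0) = 8/0.82 ≈ 9.76.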
Readership
This book is designed for senior undergraduate students, graduate students, researchers, and practitioners interested in reinforcement learning.
It does not require readers to have any background in reinforcement learning because it starts by introducing the most basic concepts. If the reader already has some background in reinforcement learning, I believe the book can help them understand some topics more deeply or provide different perspectives.
This book, however, requires the reader to have some knowledge of probability theory and linear algebra. Some basics of the required mathematics are also included in the appendix of this book.
About the author
You can find my info on my homepage https://www.shiyuzhao.net (GoogleSite) and my research group website https://shiyuzhao.westlake.edu.cn
I have been teaching a graduate-level course on reinforcement learning since 2019. Along with teaching, I have been preparing this book as the lecture notes for my students.
I sincerely hope this book can help readers smoothly enter the exciting field of reinforcement learning.
Citation
@book{zhao2025RLBook,
  title     = {Mathematical Foundations of Reinforcement Learning},
  author    = {S. Zhao},
  year      = {2025},
  publisher = {Springer Press}
}
Lecture videos
The lecture videos have received 2,100,000+ views across the Internet, along with very positive feedback! I believe that combining the book with my lecture videos will help you study more effectively.
- Chinese lecture videos: You can check the Bilibili channel (https://space.bilibili.com/2044042934) or the YouTube channel (https://www.youtube.com/channel/UCztGtS5YYiNv8x3pj9hLVgg/playlists).
- English lecture videos: The English lecture videos have been uploaded to YouTube: link here (https://youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=D1T4pcyHsMxj6CzB)
- Overview of Reinforcement Learning in 30 Minutes (https://www.youtube.com/watch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1)
- L1: Basic Concepts (P1-State, action, policy, …) (https://www.youtube.com/watch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)
- L1: Basic Concepts (P2-Reward, return, Markov decision process) (https://www.youtube.com/watch?v=repVl3_GYCI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=3)
- L2: Bellman Equation (P1-Motivating examples) (https://www.youtube.com/watch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)
- L2: Bellman Equation (P2-State value) (https://www.youtube.com/watch?v=DSvi3xEN13I&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=5)
- L2: Bellman Equation (P3-Bellman equation-Derivation) (https://www.youtube.com/watch?v=eNtId8yPWkA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=6)
- L2: Bellman Equation (P4-Matrix-vector form and solution) (https://www.youtube.com/watch?v=EtCfBG_eP2w&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=7)
- L2: Bellman Equation (P5-Action value) (https://www.youtube.com/watch?v=zJo2sLDzfcU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=8)
- L3: Bellman Optimality Equation (P1-Motivating example) (https://www.youtube.com/watch?v=lXKY_Hyg4SQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=9)
- L3: Bellman Optimality Equation (P2-Optimal policy) (https://www.youtube.com/watch?v=BxyjdHhK8a8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=10)
- L3: Bellman Optimality Equation (P3-More on BOE) (https://www.youtube.com/watch?v=FXftTCKotC8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=11)
- L3: Bellman Optimality Equation (P4-Interesting properties) (https://www.youtube.com/watch?v=a--bck2ow9s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=12)
- L4: Value Iteration and Policy Iteration (P1-Value iteration) (https://www.youtube.com/watch?v=wMAVmLDIvQU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=13)
- L4: Value Iteration and Policy Iteration (P2-Policy iteration) (https://www.youtube.com/watch?v=Pka6Om0nYQ8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=14)
- L4: Value Iteration and Policy Iteration (P3-Truncated policy iteration) (https://www.youtube.com/watch?v=tUjPFPD3Vc8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=15)
- L5: Monte Carlo Learning (P1-Motivating examples) (https://www.youtube.com/watch?v=DO1yXinAV_Q&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=16)
- L5: Monte Carlo Learning (P2-MC Basic-introduction) (https://www.youtube.com/watch?v=6ShisunU0zs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=17)
- L5: Monte Carlo Learning (P3-MC Basic-examples) (https://www.youtube.com/watch?v=axA0yns9FxU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=18)
- L5: Monte Carlo Learning (P4-MC Exploring Starts) (https://www.youtube.com/watch?v=Qt8OMHPkLqg&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=19)
- L5: Monte Carlo Learning (P5-MC Epsilon-Greedy-introduction) (https://www.youtube.com/watch?v=dM3fYE630pY&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=20)
- L5: Monte Carlo Learning (P6-MC Epsilon-Greedy-examples) (https://www.youtube.com/watch?v=x6X_5ePT9gQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=21)
- L6: Stochastic Approximation and SGD (P1-Motivating example) (https://www.youtube.com/watch?v=1bMgejvWoAo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=22)
- L6: Stochastic Approximation and SGD (P2-RM algorithm: introduction) (https://www.youtube.com/watch?v=1FTGcNUUnCE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=23)
- L6: Stochastic Approximation and SGD (P3-RM algorithm: convergence) (https://www.youtube.com/watch?v=juNDoAFEre4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=24)
- L6: Stochastic Approximation and SGD (P4-SGD algorithm: introduction) (https://www.youtube.com/watch?v=EZO7Iadp5m4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=25)
- L6: Stochastic Approximation and SGD (P5-SGD algorithm: examples) (https://www.youtube.com/watch?v=BsxU_4qvvNA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=26)
- L6: Stochastic Approximation and SGD (P6-SGD algorithm: properties) (https://www.youtube.com/watch?v=fWxX9YuEHjE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=27)
- L6: Stochastic Approximation and SGD (P7-SGD algorithm: comparison) (https://www.youtube.com/watch?v=yNEV2cLKuzU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=28)
- L7: Temporal-Difference Learning (P1-Motivating example) (https://www.youtube.com/watch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)
- L7: Temporal-Difference Learning (P2-TD algorithm: introduction) (https://www.youtube.com/watch?v=XiCUsc7CCE0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=30)
- L7: Temporal-Difference Learning (P3-TD algorithm: convergence) (https://www.youtube.com/watch?v=faWg8M91-Oo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=31)
- L7: Temporal-Difference Learning (P4-Sarsa) (https://www.youtube.com/watch?v=jYwQufkBUPo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=32)
- L7: Temporal-Difference Learning (P5-Expected Sarsa & n-step Sarsa) (https://www.youtube.com/watch?v=0kKzQbWZOlk&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=33)
- L7: Temporal-Difference Learning (P6-Q-learning: introduction) (https://www.youtube.com/watch?v=4BvYR2hm730&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=34)
- L7: Temporal-Difference Learning (P7-Q-learning: pseudo code) (https://www.youtube.com/watch?v=I0YhlOIFF4s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=35)
- L7: Temporal-Difference Learning (P8-Unified viewpoint and summary) (https://www.youtube.com/watch?v=3t74lvk1GBM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=36)
- L8: Value Function Approximation (P1-Motivating example–curve fitting) (https://www.youtube.com/watch?v=uJXcI8fcdWc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=37)
- L8: Value Function Approximation (P2-Objective function) (https://www.youtube.com/watch?v=Z3HI1TfpJP0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=38)
- L8: Value Function Approximation (P3-Optimization algorithm) (https://www.youtube.com/watch?v=piBDwrKt0uU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=39)
- L8: Value Function Approximation (P4-illustrative examples and analysis) (https://www.youtube.com/watch?v=VFyBNEZxMMs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=40)
- L8: Value Function Approximation (P5-Sarsa and Q-learning) (https://www.youtube.com/watch?v=C-HtY4-W_zw&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=41)
- L8: Value Function Approximation (P6-DQN–basic idea) (https://www.youtube.com/watch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)
- L8: Value Function Approximation (P7-DQN–experience replay) (https://www.youtube.com/watch?v=rynEdAdebi0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=43)
- L8: Value Function Approximation (P8-DQN–implementation and example) (https://www.youtube.com/watch?v=vQHuCHjd6hA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=44)
- L9: Policy Gradient Methods (P1-Basic idea) (https://www.youtube.com/watch?v=mtFHOj83QSo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=45)
- L9: Policy Gradient Methods (P2-Metric 1–Average value) (https://www.youtube.com/watch?v=la8jQc3hX1M&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=46)
- L9: Policy Gradient Methods (P3-Metric 2–Average reward) (https://www.youtube.com/watch?v=8RZ_rQFe69E&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=47)
- L9: Policy Gradient Methods (P4-Gradients of the metrics) (https://www.youtube.com/watch?v=MvmtPXur3Ls&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=48)
- L9: Policy Gradient Methods (P5-Gradient-based algorithms & REINFORCE) (https://www.youtube.com/watch?v=1DQnnUC8ng8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=49)
- L10: Actor-Critic Methods (P1-The simplest Actor-Critic) (https://www.youtube.com/watch?v=kjCZAT5Wh80&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=50)
- L10: Actor-Critic Methods (P2-Advantage Actor-Critic) (https://www.youtube.com/watch?v=vZVXJJcZNEM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=51)
- L10: Actor-Critic Methods (P3-Importance sampling & off-policy Actor-Critic) (https://www.youtube.com/watch?v=TfO5mnsiGKc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=52)
- L10: Actor-Critic Methods (P4-Deterministic Actor-Critic) (https://www.youtube.com/watch?v=dTjz1RNtic4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=53)
- L10: Actor-Critic Methods (P5-Summary and goodbye!) (https://www.youtube.com/watch?v=npvnnKcXoBs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=54)
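To complement the L7 lectures on temporal-difference methods, here is a hedged sketch of tabular Q-learning on a hypothetical five-state chain. The environment, learning rate, and exploration settings are my own illustrative choices, not the book's grid world:

```python
import random

# Hypothetical tiny environment (NOT from the book): a 5-state chain,
# action 0 = left, action 1 = right; reward 1 only on reaching state 4.
N_STATES, GOAL = 5, 4

def env_step(s, a):
    """Deterministic chain dynamics with walls at both ends."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy behavior policy, with random tie-breaking
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = env_step(s, a)
            # Off-policy TD target: max over next actions, zero at terminal
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy_policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES)]
```

Because the target bootstraps with the max over next-state actions regardless of the action actually taken, this is the off-policy update the L7 videos derive; Sarsa would instead bootstrap with the action its own epsilon-greedy policy selects. After training, the greedy policy prefers "right" in states 0 through 3.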
Some comments from YouTube and Amazon:
Third-party code and materials
Many enthusiastic readers have sent me the source code or notes they developed while studying this book. If you create any materials based on the course, you are welcome to email me. I am happy to share the links here in the hope that they may help other readers. I must emphasize that I have not verified the code; if you have any questions, please contact the developers directly.
Code
Python:
- https://github.com/AstonDky/Math_in_RL_Visual (May 2026, by Keyan Dong)
- https://github.com/Ronchy2000/Multi-agent-RL/tree/master/RL_Learning-main (Oct 2025, by Rongqi Lu)
- https://github.com/zhoubay/Code-for-Mathematical-Foundations-of-Reinforcement-Learning (Mar 2025, by Xibin ZHOU)
- https://github.com/10-OASIS-01/minrl (Feb 2025)
- https://github.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning (Aug 2024, by Zehong Cao)
- https://github.com/ziwenhahaha/Code-of-RL-Beginning (Mar 2024, by RLGamer)
  - Videos for code explanation: https://www.bilibili.com/video/BV1fW421w7NH
- https://github.com/jwk1rose/RL_Learning (Feb 2024, by Wenkang Ji)
Matlab:
- https://github.com/EveryDayIsaSong/MATLAB-Code-for-Mathematical-Foundation-of-Reinforcement-Learning (by Yucheng Mao, Jan 2026)
R:
- https://github.com/NewbieToEverything/Code-Mathmatical-Foundation-of-Reinforcement-Learning
C++:
- https://github.com/purundong/test_rl
Study notes
English:
- https://lyk-love.cn/tags/reinforcement-learning/ by a graduate student from UC Davis
Chinese:
- https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main (Jan 2026)
- https://zhuanlan.zhihu.com/p/692207843
- https://blog.csdn.net/qq_64671439/category_12540921.html
- http://t.csdnimg.cn/EH4rj
- https://blog.csdn.net/LvGreat/article/details/135454738
- https://xinzhe.blog.csdn.net/article/details/129452000
- https://blog.csdn.net/v20000727/article/details/136870879?spm=1001.2014.3001.5502
- https://blog.csdn.net/m0_64952374/category_12883361.html
There are also many other notes made by other readers on the Internet; I am not able to list them all here. You are welcome to recommend one to me if you find a good one.
Bilibili videos based on my course
- https://www.bilibili.com/video/BV1DMBYB6Edo (Jan 2026)
- https://www.bilibili.com/video/BV1fW421w7NH
- https://www.bilibili.com/video/BV1Ne411m7GX
- https://www.bilibili.com/video/BV1HX4y1H7uR
- https://www.bilibili.com/video/BV1TgzsYDEnP
- https://www.bilibili.com/video/BV1CQ4y1J7zu