@Jolyne_AI: To systematically study reinforcement learning, the most discouraging materials are often of two types: one only talks about concepts, leaving you unable to implement after finishing; the other is filled with formulas on every page, making it impossible to read through two chapters. The open-source textbook 'Mathematical Foundations of Reinforcement Learning' hits the sweet spot: it explains clearly, with rigorous but not intimidating derivations, and comes with numerous videos that thoroughly cover classic algorithms from definition to implementation...


Summary

This post introduces the open-source textbook 'Mathematical Foundations of Reinforcement Learning', which explains reinforcement learning in an accessible yet mathematically rigorous manner. It comes with extensive lecture videos and code implementations, and suits learners with a basic background in probability theory and linear algebra.


Cached at: 06/29/26, 04:23 AM

If you want to learn reinforcement learning systematically, two types of material tend to be the most discouraging: one covers only concepts, leaving you unable to implement anything after studying it; the other packs every page with formulas, making it hard to get through even two chapters.

The open-source textbook Mathematical Foundations of Reinforcement Learning hits the sweet spot: clear explanations, rigorous but not intimidating derivations, and a wealth of accompanying videos that walk through classic algorithms step by step from definition to implementation.

It uses a mathematical perspective to neatly organize the core framework of RL, with plenty of examples to ground abstract concepts, turning those “mysterious-looking” update formulas into something you can understand, reproduce, and code.

GitHub: http://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning…

What you’ll gain:

  • Core concepts fully connected: state, action, policy, value function, etc., integrated into a coherent system
  • Classic algorithms broken down to the details: MC / TD / Q-learning, from principle to derivation, unfolded step by step
  • 50+ Chinese/English video lectures: watch and code simultaneously, understanding and implementation in sync
  • Abundant grid-world examples: align formulas, update rules, and results with intuitive experiments
  • Derivations are rigorous but not convoluted: few skipped steps, no hand-waving, difficulty well-balanced
  • Multi-language code implementations: Python, R, C++, etc., directly usable for reproduction

Ideal for those who want to truly solidify their theoretical understanding of RL and be able to implement it. It is recommended to have basic knowledge of probability theory and linear algebra before starting.
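
To give a sense of what turning an update formula into code can look like, here is a minimal sketch of tabular Q-learning with an epsilon-greedy behavior policy. It is not taken from the book or from any of the linked repositories. The 3x3 grid layout, the reward values (+1 for entering the goal, -1 for entering the forbidden cell or hitting the boundary), the hyperparameters, and every name in it (`step`, `q_learning`, `GOAL`, `FORBIDDEN`) are assumptions made for illustration.

```python
import random

# Hypothetical 3x3 grid world; layout and rewards are illustrative assumptions.
N = 3
GOAL = (2, 2)
FORBIDDEN = {(1, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < N and 0 <= col < N):
        return state, -1.0  # hitting the boundary keeps the agent in place
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 1.0
    if nxt in FORBIDDEN:
        return nxt, -1.0
    return nxt, 0.0


def q_learning(episodes=2000, steps=30, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from sampled transitions only (no model access)."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(N) for c in range(N)]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = rng.choice(states)  # random start state so every cell gets visited
        for _ in range(steps):
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(s, i)])
            s2, reward = step(s, ACTIONS[a])
            # TD target uses the best action value at the next state.
            td_target = reward + gamma * max(q[(s2, i)] for i in range(len(ACTIONS)))
            q[(s, a)] += alpha * (td_target - q[(s, a)])
            s = s2
    return q
```

Calling `q_learning()` returns a dictionary keyed by `(state, action_index)`; taking the action with the largest value in each state gives the learned greedy policy. The task is treated as continuing, so the agent can keep collecting the goal reward by staying at the goal.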


MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

Source: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

About the LaTeX source code of my slides

If you are a professor preparing a course and would like to use any content from my slides, feel free to reach out by email. I can share the source code with you. The slides were created using LaTeX/Beamer.

Regarding reader feedback and questions in the discussion section, please note that due to a high volume of commitments, there may be significant delays in my response. Your understanding would be greatly appreciated.



Why a new book on reinforcement learning?

This book aims to provide a mathematical but friendly introduction to the fundamental concepts, basic problems, and classic algorithms in reinforcement learning. Some essential features of this book are highlighted as follows.

  • The book introduces reinforcement learning from a mathematical point of view. Hopefully, readers will not only know the procedure of an algorithm but also understand why it was designed in the first place and why it works effectively.

  • The depth of the mathematics is carefully controlled to an adequate level. The mathematics is also presented in a carefully designed manner to ensure that the book is friendly to read. Readers can selectively read the materials presented in gray boxes according to their interests.

  • Many illustrative examples are given to help readers better understand the topics. All the examples in this book are based on a grid world task, which is easy to understand and helpful for illustrating concepts and algorithms.

  • When introducing an algorithm, the book aims to separate its core idea from complications that may be distracting. In this way, readers can better grasp the core idea of an algorithm.

  • The contents of the book are coherently organized. Each chapter is built based on the preceding chapter and lays a necessary foundation for the subsequent one.
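
As a rough illustration of how a grid world makes an algorithm concrete, the following minimal sketch runs value iteration on a hypothetical 2x2 grid. The layout, the reward values (+1 for entering the target, -1 for entering the forbidden cell or hitting the boundary), the discount rate, and all names (`transition`, `value_iteration`, `TARGET`, `FORBIDDEN`) are assumptions made for this example and are not taken from the book's own code.

```python
# Hypothetical 2x2 grid world; layout, rewards, and discount rate are
# illustrative assumptions rather than the book's exact settings.
ROWS, COLS = 2, 2
TARGET = (1, 1)
FORBIDDEN = {(0, 1)}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}
GAMMA = 0.9


def transition(state, move):
    """Deterministic model: return (next_state, reward) for a state-action pair."""
    row, col = state[0] + move[0], state[1] + move[1]
    if not (0 <= row < ROWS and 0 <= col < COLS):
        return state, -1.0  # hitting the boundary keeps the agent in place
    nxt = (row, col)
    if nxt == TARGET:
        return nxt, 1.0
    if nxt in FORBIDDEN:
        return nxt, -1.0
    return nxt, 0.0


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality update until the values stop changing."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    v = {s: 0.0 for s in states}
    while True:
        # Action values under the current value estimate.
        q = {}
        for s in states:
            for name, move in MOVES.items():
                nxt, reward = transition(s, move)
                q[(s, name)] = reward + GAMMA * v[nxt]
        # Value update: keep the best action value in each state.
        new_v = {s: max(q[(s, name)] for name in MOVES) for s in states}
        delta = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the final action values.
    policy = {s: max(MOVES, key=lambda name: q[(s, name)]) for s in states}
    return v, policy
```

Unlike a sample-based method, this sketch assumes the transition model is known, which is why it can sweep over every state and action directly. Printing the returned `policy` dictionary shows which move is greedy in each cell.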

Book cover (https://link.springer.com/book/9789819739431)

Contents

The topics addressed in the book are shown in the figure below. The book contains ten chapters in two parts: the first part covers basic tools, and the second covers algorithms. The chapters are closely connected, so the earlier chapters should generally be studied before the later ones.

The map of this book

Readership

This book is designed for senior undergraduate students, graduate students, researchers, and practitioners interested in reinforcement learning.

It does not require readers to have any background in reinforcement learning because it starts by introducing the most basic concepts. If the reader already has some background in reinforcement learning, I believe the book can help them understand some topics more deeply or provide different perspectives.

This book, however, requires the reader to have some knowledge of probability theory and linear algebra. Some basics of the required mathematics are also included in the appendix of this book.

About the author

You can find my info on my homepage https://www.shiyuzhao.net (GoogleSite) and my research group website https://shiyuzhao.westlake.edu.cn

I have been teaching a graduate-level course on reinforcement learning since 2019. Along with teaching, I have been preparing this book as the lecture notes for my students.

I sincerely hope this book can help readers smoothly enter the exciting field of reinforcement learning.

Citation

@book{zhao2025RLBook,
  title={Mathematical Foundations of Reinforcement Learning},
  author={S. Zhao},
  year={2025},
  publisher={Springer Press}
}

Lecture videos

The lecture videos have received more than 2,100,000 views online and very good feedback. I believe that combining the book with the lecture videos will help you study more effectively.

  • Chinese lecture videos: available on the Bilibili channel (https://space.bilibili.com/2044042934) or the YouTube channel (https://www.youtube.com/channel/UCztGtS5YYiNv8x3pj9hLVgg/playlists).
  • English lecture videos: available as a YouTube playlist (https://youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&si=D1T4pcyHsMxj6CzB). The individual lectures are listed below.

  • Overview of Reinforcement Learning in 30 Minutes (https://www.youtube.com/watch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=1)
  • L1: Basic Concepts (P1-State, action, policy, …) (https://www.youtube.com/watch?v=zJHtM5dN69g&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2)
  • L1: Basic Concepts (P2-Reward, return, Markov decision process) (https://www.youtube.com/watch?v=repVl3_GYCI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=3)
  • L2: Bellman Equation (P1-Motivating examples) (https://www.youtube.com/watch?v=XCzWrlgZCwc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=4)
  • L2: Bellman Equation (P2-State value) (https://www.youtube.com/watch?v=DSvi3xEN13I&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=5)
  • L2: Bellman Equation (P3-Bellman equation-Derivation) (https://www.youtube.com/watch?v=eNtId8yPWkA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=6)
  • L2: Bellman Equation (P4-Matrix-vector form and solution) (https://www.youtube.com/watch?v=EtCfBG_eP2w&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=7)
  • L2: Bellman Equation (P5-Action value) (https://www.youtube.com/watch?v=zJo2sLDzfcU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=8)
  • L3: Bellman Optimality Equation (P1-Motivating example) (https://www.youtube.com/watch?v=lXKY_Hyg4SQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=9)
  • L3: Bellman Optimality Equation (P2-Optimal policy) (https://www.youtube.com/watch?v=BxyjdHhK8a8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=10)
  • L3: Bellman Optimality Equation (P3-More on BOE) (https://www.youtube.com/watch?v=FXftTCKotC8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=11)
  • L3: Bellman Optimality Equation (P4-Interesting properties) (https://www.youtube.com/watch?v=a--bck2ow9s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=12)
  • L4: Value Iteration and Policy Iteration (P1-Value iteration) (https://www.youtube.com/watch?v=wMAVmLDIvQU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=13)
  • L4: Value Iteration and Policy Iteration (P2-Policy iteration) (https://www.youtube.com/watch?v=Pka6Om0nYQ8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=14)
  • L4: Value Iteration and Policy Iteration (P3-Truncated policy iteration) (https://www.youtube.com/watch?v=tUjPFPD3Vc8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=15)
  • L5: Monte Carlo Learning (P1-Motivating examples) (https://www.youtube.com/watch?v=DO1yXinAV_Q&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=16)
  • L5: Monte Carlo Learning (P2-MC Basic-introduction) (https://www.youtube.com/watch?v=6ShisunU0zs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=17)
  • L5: Monte Carlo Learning (P3-MC Basic-examples) (https://www.youtube.com/watch?v=axA0yns9FxU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=18)
  • L5: Monte Carlo Learning (P4-MC Exploring Starts) (https://www.youtube.com/watch?v=Qt8OMHPkLqg&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=19)
  • L5: Monte Carlo Learning (P5-MC Epsilon-Greedy-introduction) (https://www.youtube.com/watch?v=dM3fYE630pY&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=20)
  • L5: Monte Carlo Learning (P6-MC Epsilon-Greedy-examples) (https://www.youtube.com/watch?v=x6X_5ePT9gQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=21)
  • L6: Stochastic Approximation and SGD (P1-Motivating example) (https://www.youtube.com/watch?v=1bMgejvWoAo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=22)
  • L6: Stochastic Approximation and SGD (P2-RM algorithm: introduction) (https://www.youtube.com/watch?v=1FTGcNUUnCE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=23)
  • L6: Stochastic Approximation and SGD (P3-RM algorithm: convergence) (https://www.youtube.com/watch?v=juNDoAFEre4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=24)
  • L6: Stochastic Approximation and SGD (P4-SGD algorithm: introduction) (https://www.youtube.com/watch?v=EZO7Iadp5m4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=25)
  • L6: Stochastic Approximation and SGD (P5-SGD algorithm: examples) (https://www.youtube.com/watch?v=BsxU_4qvvNA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=26)
  • L6: Stochastic Approximation and SGD (P6-SGD algorithm: properties) (https://www.youtube.com/watch?v=fWxX9YuEHjE&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=27)
  • L6: Stochastic Approximation and SGD (P7-SGD algorithm: comparison) (https://www.youtube.com/watch?v=yNEV2cLKuzU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=28)
  • L7: Temporal-Difference Learning (P1-Motivating example) (https://www.youtube.com/watch?v=u1X-7XX3dtI&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=29)
  • L7: Temporal-Difference Learning (P2-TD algorithm: introduction) (https://www.youtube.com/watch?v=XiCUsc7CCE0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=30)
  • L7: Temporal-Difference Learning (P3-TD algorithm: convergence) (https://www.youtube.com/watch?v=faWg8M91-Oo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=31)
  • L7: Temporal-Difference Learning (P4-Sarsa) (https://www.youtube.com/watch?v=jYwQufkBUPo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=32)
  • L7: Temporal-Difference Learning (P5-Expected Sarsa & n-step Sarsa) (https://www.youtube.com/watch?v=0kKzQbWZOlk&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=33)
  • L7: Temporal-Difference Learning (P6-Q-learning: introduction) (https://www.youtube.com/watch?v=4BvYR2hm730&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=34)
  • L7: Temporal-Difference Learning (P7-Q-learning: pseudo code) (https://www.youtube.com/watch?v=I0YhlOIFF4s&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=35)
  • L7: Temporal-Difference Learning (P8-Unified viewpoint and summary) (https://www.youtube.com/watch?v=3t74lvk1GBM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=36)
  • L8: Value Function Approximation (P1-Motivating example–curve fitting) (https://www.youtube.com/watch?v=uJXcI8fcdWc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=37)
  • L8: Value Function Approximation (P2-Objective function) (https://www.youtube.com/watch?v=Z3HI1TfpJP0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=38)
  • L8: Value Function Approximation (P3-Optimization algorithm) (https://www.youtube.com/watch?v=piBDwrKt0uU&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=39)
  • L8: Value Function Approximation (P4-illustrative examples and analysis) (https://www.youtube.com/watch?v=VFyBNEZxMMs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=40)
  • L8: Value Function Approximation (P5-Sarsa and Q-learning) (https://www.youtube.com/watch?v=C-HtY4-W_zw&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=41)
  • L8: Value Function Approximation (P6-DQN–basic idea) (https://www.youtube.com/watch?v=lZCcbZbqVSQ&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=42)
  • L8: Value Function Approximation (P7-DQN–experience replay) (https://www.youtube.com/watch?v=rynEdAdebi0&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=43)
  • L8: Value Function Approximation (P8-DQN–implementation and example) (https://www.youtube.com/watch?v=vQHuCHjd6hA&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=44)
  • L9: Policy Gradient Methods (P1-Basic idea) (https://www.youtube.com/watch?v=mtFHOj83QSo&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=45)
  • L9: Policy Gradient Methods (P2-Metric 1–Average value) (https://www.youtube.com/watch?v=la8jQc3hX1M&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=46)
  • L9: Policy Gradient Methods (P3-Metric 2–Average reward) (https://www.youtube.com/watch?v=8RZ_rQFe69E&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=47)
  • L9: Policy Gradient Methods (P4-Gradients of the metrics) (https://www.youtube.com/watch?v=MvmtPXur3Ls&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=48)
  • L9: Policy Gradient Methods (P5-Gradient-based algorithms & REINFORCE) (https://www.youtube.com/watch?v=1DQnnUC8ng8&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=49)
  • L10: Actor-Critic Methods (P1-The simplest Actor-Critic) (https://www.youtube.com/watch?v=kjCZAT5Wh80&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=50)
  • L10: Actor-Critic Methods (P2-Advantage Actor-Critic) (https://www.youtube.com/watch?v=vZVXJJcZNEM&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=51)
  • L10: Actor-Critic Methods (P3-Importance sampling & off-policy Actor-Critic) (https://www.youtube.com/watch?v=TfO5mnsiGKc&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=52)
  • L10: Actor-Critic Methods (P4-Deterministic Actor-Critic) (https://www.youtube.com/watch?v=dTjz1RNtic4&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=53)
  • L10: Actor-Critic Methods (P5-Summary and goodbye!) (https://www.youtube.com/watch?v=npvnnKcXoBs&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=54)


Third-party code and materials

Many enthusiastic readers have sent me the source code or notes they developed while studying this book. If you create any materials based on the course, you are welcome to email me. I am happy to share the links here in the hope that they help other readers. Please note that I have not verified the code. If you have any questions, contact the developers directly.

Code

Python:

  • https://github.com/AstonDky/Math_in_RL_Visual (May 2026, by Keyan Dong)

  • https://github.com/Ronchy2000/Multi-agent-RL/tree/master/RL_Learning-main (Oct 2025, by Rongqi Lu)

  • https://github.com/zhoubay/Code-for-Mathematical-Foundations-of-Reinforcement-Learning (Mar 2025, by Xibin ZHOU)

  • https://github.com/10-OASIS-01/minrl (Feb 2025)

  • https://github.com/SupermanCaozh/The_Coding_Foundation_in_Reinforcement_Learning (Aug 2024, by Zehong Cao)

  • https://github.com/ziwenhahaha/Code-of-RL-Beginning (Mar 2024, by RLGamer)

    • Videos for code explanation: https://www.bilibili.com/video/BV1fW421w7NH

  • https://github.com/jwk1rose/RL_Learning (Feb 2024, by Wenkang Ji)

Matlab:

  • https://github.com/EveryDayIsaSong/MATLAB-Code-for-Mathematical-Foundation-of-Reinforcement-Learning (Jan 2026, by Yucheng Mao)

R:

  • https://github.com/NewbieToEverything/Code-Mathmatical-Foundation-of-Reinforcement-Learning

C++:

  • https://github.com/purundong/test_rl

Notes and others

English:

  • https://lyk-love.cn/tags/reinforcement-learning/ by a graduate student from UC Davis

Chinese:

  • RL knowledge graph: https://hanfei-hz.github.io/assets/files/rl_explorer.html (by Fei Han, May 2026)

  • https://github.com/Peanut-Study/Reinforcement-Learning-Study-Note/tree/main (Jan 2026)

  • https://zhuanlan.zhihu.com/p/692207843

  • https://blog.csdn.net/qq_64671439/category_12540921.html

  • http://t.csdnimg.cn/EH4rj

  • https://blog.csdn.net/LvGreat/article/details/135454738

  • https://xinzhe.blog.csdn.net/article/details/129452000

  • https://blog.csdn.net/v20000727/article/details/136870879?spm=1001.2014.3001.5502

  • https://blog.csdn.net/m0_64952374/category_12883361.html

There are also many other notes made by readers on the Internet. I am not able to list them all here. If you find a good one, you are welcome to recommend it to me.

