@wsl8297: Discovered a deep learning paper reading project on GitHub: paper-reading. The author, Mu Shen, reads classic and new deep learning papers paragraph by paragraph and records video explanations; the project has been updated for over three years. GitHub: https://github.com/mli/paper-reading...
Summary
Mu Shen's deep learning paper reading project on GitHub includes in-depth reading videos of major papers such as GPT-4, Llama 3.1, and Sora. Each video runs about an hour and suits AI researchers and developers who want to understand classic papers in depth.
Deep Learning Paper Reading Project: paper-reading
Discovered a deep learning paper reading project on GitHub: paper-reading. The author, Mu Shen, does paragraph-by-paragraph deep readings of classic and new deep learning papers, recording video explanations that have been updated for over three years. GitHub: https://github.com/mli/paper-reading… The project includes in-depth reading videos for major papers like GPT-4, Llama 3.1, Sora, DALL·E 2, InstructGPT, Whisper, and Chain of Thought. Each video is about an hour of deep explanation that breaks the paper down paragraph by paragraph. Videos are posted simultaneously to Bilibili and YouTube, and the project includes series such as a multimodal paper overview and a CLIP improvement work overview. Beyond paper readings, the author also shares research ideas for the era of large models, research methodology, and more. Suitable for AI researchers and developers who want to understand classic papers deeply and keep up with cutting-edge progress.
mli/paper-reading
Source: https://github.com/mli/paper-reading
Deep Learning Paper Reading
Recorded Papers
| Date | Title | Duration | Video |
|---|---|---|---|
| 1/10/25 | OpenAI Sora (https://openai.com/index/video-generation-models-as-world-simulators/) Part 1 (including Movie Gen and HunyuanVideo) | 1:04:18 | bilibili (https://www.bilibili.com/video/BV1VdcxesEAt/?share_source=copy_web&vd_source=5d037e935914fc22e2e978cdccf5cdfe) |
| 9/4/24 | Llama 3.1 Paper Reading · 5. Model Training Process | 10:41 | bilibili (https://www.bilibili.com/video/BV1c8HbeaEXi) |
| 8/28/24 | Llama 3.1 Paper Reading · 4. Training Infrastructure | 25:04 | bilibili (https://www.bilibili.com/video/BV1b4421f7fa) |
| 8/13/24 | Llama 3.1 Paper Reading · 3. Model | 26:14 | bilibili (https://www.bilibili.com/video/BV1Q4421Z7Tj) |
| 8/5/24 | Llama 3.1 Paper Reading · 2. Pre-training Data (https://arxiv.org/pdf/2407.21783) | 23:37 | bilibili (https://www.bilibili.com/video/BV1u142187S5) |
| 7/31/24 | Llama 3.1 Paper Reading · 1. Introduction | 18:53 | bilibili (https://www.bilibili.com/video/BV1WM4m1y7Uh) |
| 3/30/23 | GPT-4 (https://openai.com/research/gpt-4) | 1:20:38 | bilibili (https://www.bilibili.com/video/BV1vM4y1U7b5) |
| 3/23/23 | Four Research Ideas in the Era of Large Models | 1:06:29 | bilibili (https://www.bilibili.com/video/BV1oX4y1d7X6) |
| 3/10/23 | Anthropic LLM (https://arxiv.org/pdf/2204.05862.pdf) | 1:01:51 | bilibili (https://www.bilibili.com/video/BV1XY411B7nM) |
| 1/20/23 | HELM (https://arxiv.org/pdf/2211.09110.pdf): Comprehensive Language Model Evaluation | 1:23:37 | bilibili (https://www.bilibili.com/video/BV1z24y1B7uX) |
| 1/11/23 | Multimodal Paper Overview · Part 2 | 1:03:29 | bilibili (https://www.bilibili.com/video/BV1fA411Z772) |
| 12/29/22 | InstructGPT (https://arxiv.org/pdf/2203.02155.pdf) | 1:07:10 | bilibili (https://www.bilibili.com/video/BV1hd4y187CR) |
| 12/19/22 | Neural Corpus Indexer (https://arxiv.org/pdf/2206.02743.pdf) for Document Retrieval | 55:47 | bilibili (https://www.bilibili.com/video/BV1Se411w7Sn) |
| 12/12/22 | Multimodal Paper Overview · Part 1 | 1:12:27 | bilibili (https://www.bilibili.com/video/BV1Vd4y1v77v) |
| 11/14/22 | OpenAI Whisper (https://cdn.openai.com/papers/whisper.pdf) In-depth Reading | 1:12:16 | bilibili (https://www.bilibili.com/video/BV1VG4y1t74x) |
| 11/7/22 | Before Talking About OpenAI Whisper, I Made a Little Video Editing Tool | 23:39 | bilibili (https://www.bilibili.com/video/BV1Pe4y1t7de) |
| 10/23/22 | Chain of Thought (https://arxiv.org/pdf/2201.11903.pdf): Paper, Code, and Resources | 33:21 | bilibili (https://www.bilibili.com/video/BV1t8411e7Ug) |
| 9/17/22 | CLIP Improvement Work Overview (Part 2) | 1:04:26 | bilibili (https://www.bilibili.com/video/BV1gg411U7n4) |
| 9/2/22 | CLIP Improvement Work Overview (Part 1) | 1:14:43 | bilibili (https://www.bilibili.com/video/BV1FV4y1p7Lm) |
| 7/29/22 | ViLT (https://arxiv.org/pdf/2102.03334.pdf) Paper In-depth Reading | 1:03:26 | bilibili (https://www.bilibili.com/video/BV14r4y1j74y) |
| 7/22/22 | Reasons, Evidence, and Warrants [The Craft of Research (https://press.uchicago.edu/ucp/books/book/chicago/C/bo23521678.html) · 4] | 44:14 | bilibili (https://www.bilibili.com/video/BV1SB4y1a75c) |
| 7/15/22 | How to Tell a Good Story, and Arguments Within a Story [The Craft of Research (https://press.uchicago.edu/ucp/books/book/chicago/C/bo23521678.html) · 3] | 43:56 | bilibili (https://www.bilibili.com/video/BV1WB4y1v7ST) |
| 7/8/22 | DALL·E 2 (https://arxiv.org/pdf/2204.06125.pdf) Paragraph-by-Paragraph Reading | 1:27:54 | bilibili (https://www.bilibili.com/video/BV17r4y1u77B) |
| 7/1/22 | Understanding the Importance of the Problem [The Craft of Research (https://press.uchicago.edu/ucp/books/book/chicago/C/bo23521678.html) · 2] | 1:03:40 | bilibili (https://www.bilibili.com/video/BV11S4y1v7S2/) |
| 6/24/22 | Connecting with Readers [The Craft of Research (https://press.uchicago.edu/ucp/books/book/chicago/C/bo23521678.html) · 1] | 45:01 | bilibili (https://www.bilibili.com/video/BV1hY411T7vy/) |
| 6/17/22 | ZeRO (https://arxiv.org/pdf/1910.02054.pdf) Paragraph-by-Paragraph Reading | 52:21 | bilibili (https://www.bilibili.com/video/BV1tY411g7ZT/) |
| 6/10/22 | DETR (https://arxiv.org/pdf/2005.12872.pdf) Paragraph-by-Paragraph Reading | 54:22 | bilibili (https://www.bilibili.com/video/BV1GB4y1X72R/) |
| 6/3/22 | Megatron-LM (https://arxiv.org/pdf/1909.08053.pdf) Paragraph-by-Paragraph Reading | 56:07 | bilibili (https://www.bilibili.com/video/BV1nB4y1R7Yz/) |
| 5/27/22 | GPipe (https://proceedings.neurips.cc/paper/2019/file/093f65e080a295f8076b1c5722a46aa2-Paper.pdf) Paragraph-by-Paragraph Reading | 58:47 | bilibili (https://www.bilibili.com/video/BV1v34y1E7zu/) |
| 5/5/22 | Pathways (https://arxiv.org/pdf/2203.12533.pdf) Paragraph-by-Paragraph Reading | 1:02:13 | bilibili (https://www.bilibili.com/video/BV1xB4y1m7Xi/) |
| 4/28/22 | Video Understanding Paper Overview (https://arxiv.org/pdf/2012.06567.pdf) (Part 2) | 1:08:32 | bilibili (https://www.bilibili.com/video/BV11Y411P7ep/) |
| 4/21/22 | Parameter Server (https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf) Paragraph-by-Paragraph Reading | 1:37:40 | bilibili (https://www.bilibili.com/video/BV1YA4y197G8/) |
| 4/14/22 | Video Understanding Paper Overview (https://arxiv.org/pdf/2012.06567.pdf) (Part 1) | 51:15 | bilibili (https://www.bilibili.com/video/BV1fL4y157yA/) |
| 3/31/22 | I3D (https://arxiv.org/pdf/1705.07750.pdf) Paper In-depth Reading | 52:31 | bilibili (https://www.bilibili.com/video/BV1tY4y1p7hq/) |
| 3/24/22 | Stanford 2022 AI Index Report (https://aiindex.stanford.edu/wp-content/uploads/2022/03/2022-AI-Index-Report_Master.pdf) In-depth Reading | 1:19:56 | bilibili (https://www.bilibili.com/video/BV1s44y1N7eu/) |
| 3/17/22 | AlphaCode (https://storage.googleapis.com/deepmind-media/AlphaCode/competition_level_code_generation_with_alphacode.pdf) Paper In-depth Reading | 44:00 | bilibili (https://www.bilibili.com/video/BV1ab4y1s7rc/) |
| 3/10/22 | OpenAI Codex (https://arxiv.org/pdf/2107.03374.pdf) Paper In-depth Reading | 47:58 | bilibili (https://www.bilibili.com/video/BV1iY41137Zi/) zhihu (https://www.zhihu.com/zvideo/1490959755963666432) |
| 3/3/22 | GPT (https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf), GPT-2 (https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), GPT-3 (https://arxiv.org/abs/2005.14165) In-depth Reading | 1:29:58 | bilibili (https://www.bilibili.com/video/BV1AF411b7xQ/) |
| 2/24/22 | Two-Stream (https://proceedings.neurips.cc/paper/2014/file/00ec53c4682d36f5c4359f4ae7bd7ba1-Paper.pdf) Paragraph-by-Paragraph Reading | 52:57 | bilibili (https://www.bilibili.com/video/BV1mq4y1x7RU/) |
| 2/10/22 | CLIP (https://openai.com/blog/clip/) Paragraph-by-Paragraph Reading | 1:38:25 | bilibili (https://www.bilibili.com/video/BV1SL4y1s7LQ/) zhihu (https://www.zhihu.com/zvideo/1475706654562299904) |
| 2/6/22 | Have You Been Told (or Complained Yourself) That a Paper Isn't Novel Enough? (https://perceiving-systems.blog/en/post/novelty-in-science) | 14:11 | bilibili (https://www.bilibili.com/video/BV1ea41127Bq/) zhihu (https://www.zhihu.com/zvideo/1475719090198876161) |
| 1/23/22 | AlphaFold 2 (https://www.nature.com/articles/s41586-021-03819-2.pdf) In-depth Reading | 1:15:28 | bilibili (https://www.bilibili.com/video/BV1oR4y1K7Xr/) zhihu (https://www.zhihu.com/zvideo/1469132410537717760) |
| 1/18/22 | How to Judge the Value of (Your Own) Research Work | 9:59 | bilibili (https://www.bilibili.com/video/BV1oL411c7Us/) zhihu (https://www.zhihu.com/zvideo/1475716940051869696) |
| 1/15/22 | Swin Transformer (https://arxiv.org/pdf/2103.14030.pdf) In-depth Reading | 1:00:21 | bilibili (https://www.bilibili.com/video/BV13L4y1475U/) zhihu (https://www.zhihu.com/zvideo/1466282983652691968) |
| 1/7/22 | Guiding Mathematical Intuition (https://www.nature.com/articles/s41586-021-04086-x.pdf) | 52:51 | bilibili (https://www.bilibili.com/video/BV1YZ4y1S72j/) zhihu (https://www.zhihu.com/zvideo/1464060386375299072) |
| 1/5/22 | AlphaFold 2 Preview | 3:28 | bilibili (https://www.bilibili.com/video/BV1Eu411U7Te/) |
| 12/20/21 | Contrastive Learning Paper Survey | 1:32:01 | bilibili (https://www.bilibili.com/video/BV19S4y1M7hm/) zhihu (https://www.zhihu.com/zvideo/1460828005077164032) |
| 12/15/21 | MoCo (https://arxiv.org/pdf/1911.05722.pdf) Paragraph-by-Paragraph Reading | 1:24:11 | bilibili (https://www.bilibili.com/video/BV1C3411s7t9/) zhihu (https://www.zhihu.com/zvideo/1454723120678936576) |
| 12/9/21 | How to Find Research Ideas · 1 | 5:34 | bilibili (https://www.bilibili.com/video/BV1qq4y1z7F2/) |
| 12/8/21 | MAE (https://arxiv.org/pdf/2111.06377.pdf) Paragraph-by-Paragraph Reading | 47:04 | bilibili (https://www.bilibili.com/video/BV1sq4y1q77t/) zhihu (https://www.zhihu.com/zvideo/1452458167968251904) |
| 11/29/21 | ViT (https://arxiv.org/pdf/2010.11929.pdf) Paragraph-by-Paragraph Reading | 1:11:30 | bilibili (https://www.bilibili.com/video/BV15P4y137jb/) zhihu (https://www.zhihu.com/zvideo/1449195245754380288) |
| 11/18/21 | BERT (https://arxiv.org/abs/1810.04805) Paragraph-by-Paragraph Reading | 45:49 | bilibili (https://www.bilibili.com/video/BV1PL411M7eQ/) zhihu (https://www.zhihu.com/zvideo/1445340200976785408) |
| 11/9/21 | GAN (https://papers.nips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf) Paragraph-by-Paragraph Reading | 46:16 | bilibili (https://www.bilibili.com/video/BV1rb4y187vD/) zhihu (https://www.zhihu.com/zvideo/1442091389241159681) |
| 11/3/21 | A Beginner-Friendly, Richly Illustrated Explanation of Graph Neural Networks (https://distill.pub/2021/gnn-intro/) (GNN/GCN) | 1:06:19 | bilibili (https://www.bilibili.com/video/BV1iT4y1d7zP/) zhihu (https://www.zhihu.com/zvideo/1439540657619087360) |
| 10/27/21 | Transformer (https://arxiv.org/abs/1706.03762) Paragraph-by-Paragraph Reading (references mentioned in the video: see notes 1-3 below) | 1:27:05 | bilibili (https://www.bilibili.com/video/BV1pu411o7BE/) zhihu (https://www.zhihu.com/zvideo/1437034536677404672) |
| 10/22/21 | ResNet (https://arxiv.org/abs/1512.03385) Paragraph-by-Paragraph Reading | 53:46 | bilibili (https://www.bilibili.com/video/BV1P3411y7nn/) zhihu (https://www.zhihu.com/zvideo/1434795406001180672) |
| 10/21/21 | ResNet (https://arxiv.org/abs/1512.03385): The Backbone of Computer Vision | 11:50 | bilibili (https://www.bilibili.com/video/BV1Fb4y1h73E/) zhihu (https://www.zhihu.com/zvideo/1434787226101751808) |
| 10/15/21 | AlexNet (https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) Paragraph-by-Paragraph Reading | 55:21 | bilibili (https://www.bilibili.com/video/BV1hq4y157t1/) zhihu (https://www.zhihu.com/zvideo/1432354207483871232) |
| 10/14/21 | Rereading a Foundational Work of Deep Learning 9 Years Later: AlexNet (https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) | 19:59 | bilibili (https://www.bilibili.com/video/BV1ih411J7Kz/) zhihu (https://www.zhihu.com/zvideo/1432155856322920448) |
| 10/6/21 | How to Read a Paper | 6:39 | bilibili (https://www.bilibili.com/video/BV1H44y1t75x/) zhihu (https://www.zhihu.com/zvideo/1428973951632969728) |
Notes:
1. Stanford 200+ page survey with 100+ authors (https://arxiv.org/abs/2108.07258)
2. New research on LayerNorm (https://arxiv.org/pdf/1911.07013.pdf)
3. Research on the role of attention in Transformers (https://arxiv.org/abs/2103.03404)
All Papers
This list includes papers already recorded and papers to be covered later. The selection principle: influential (must-read) deep learning papers from the last ten years, or interesting recent work. Of course, there are far too many important works from these ten years to cover one by one, so when selecting I lean toward papers not covered in my previous live courses (https://c.d2l.ai/zh-v2/).
Feel free to provide suggestions (requests) in the discussion area (https://github.com/mli/paper-reading/discussions).
Total papers: 67, Recorded: 32. (Citation counts use Semantic Scholar because it provides an API (https://api.semanticscholar.org/api-docs/graph#operation/get_graph_get_paper) that lets the counts be fetched automatically, without manual updates.)
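As a rough illustration of that automated lookup (a minimal sketch, assuming Python with `requests`; the helper name and example arXiv ID are illustrative choices, not the repository's actual tooling), a citation count can be fetched from the Semantic Scholar Graph API like this:

```python
# Sketch only: automated citation-count lookup via the public
# Semantic Scholar Graph API. Helper name and example ID are
# illustrative, not the repository's actual script.
import requests

def citation_count(paper_id: str) -> int:
    """Return the current citation count for a paper.

    `paper_id` may be a Semantic Scholar ID or a prefixed external
    ID such as "arXiv:2010.11929" (ViT, one of the papers below).
    """
    url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}"
    resp = requests.get(url, params={"fields": "title,citationCount"}, timeout=10)
    resp.raise_for_status()
    return resp.json()["citationCount"]

if __name__ == "__main__":
    print(citation_count("arXiv:2010.11929"))
```

Keying lookups on stable IDs like arXiv IDs or DOIs is what lets the counts refresh without manual edits.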
Computer Vision - CNN
| Recorded | Year | Name | Description | Citations |
|---|---|---|---|---|
| ✅ | 2012 | AlexNet (https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf) | Foundational work of the deep learning boom | citation (https://www.semanticscholar.org/paper/ImageNet-classification-with-deep-convolutional-Krizhevsky-Sutskever/abd1c342495432171beb7ca8fd9551ef13cbd0ff) |
| | 2014 | VGG (https://arxiv.org/pdf/1409.1556.pdf) | Deeper networks using 3x3 convolutions | citation (https://www.semanticscholar.org/paper/Very-Deep-Convolutional-Networks-for-Large-Scale-Simonyan-Zisserman/eb42cf88027de515750f230b23b1a057dc782108) |
| | 2014 | GoogleNet (https://arxiv.org/pdf/1409.4842.pdf) | Deeper networks using parallel architectures | citation (https://www.semanticscholar.org/paper/Going-deeper-with-convolutions-Szegedy-Liu/e15cf50aa89fee8535703b9f9512fca5bfc43327) |
| ✅ | 2015 | ResNet (https://arxiv.org/pdf/1512.03385.pdf) | Residual connections essential for deep networks (see the sketch below this table) | citation (https://www.semanticscholar.org/paper/Deep-Residual-Learning-for-Image-Recognition-He-Zhang/2c03df8b48bf3fa39054345bafabfeff15bfd11d) |
| | 2017 | MobileNet (https://arxiv.org/pdf/1704.04861.pdf) | Small CNN suitable for mobile devices | citation (https://www.semanticscholar.org/paper/MobileNets%3A-Efficient-Convolutional-Neural-Networks-Howard-Zhu/3647d6d0f151dc05626449ee09cc7bce55be497e) |
| | 2019 | EfficientNet (https://arxiv.org/pdf/1905.11946.pdf) | CNN obtained through architecture search | citation (https://www.semanticscholar.org/paper/EfficientNet%3A-Rethinking-Model-Scaling-for-Neural-Tan-Le/4f2eda8077dc7a69bb2b4e0a1a086cf054adb3f9) |
| | 2021 | Non-deep networks (https://arxiv.org/pdf/2110.07641.pdf) | Achieving SOTA on ImageNet with shallow networks | citation (https://www.semanticscholar.org/paper/Non-deep-Networks-Goyal-Bochkovskiy/0d7f6086772079bc3e243b7b375a9ca1a517ba8b) |
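To make the ResNet row concrete, here is a minimal residual-block sketch (assuming PyTorch; it illustrates the shortcut idea, not the paper's exact block): the block learns a residual F(x) and adds the input back, so gradients can flow through the identity path even in very deep networks.

```python
# Minimal residual block sketch (PyTorch assumed; illustrative,
# not the paper's exact architecture).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.bn2(self.conv2(torch.relu(self.bn1(self.conv1(x)))))
        return torch.relu(out + x)  # identity shortcut: output = F(x) + x

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```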
Computer Vision - Transformer
| Recorded | Year | Name | Description | Citations |
|---|---|---|---|---|
| ✅ | 2020 | ViT (https://arxiv.org/pdf/2010.11929.pdf) | Transformer enters CV (see the sketch below this table) | citation (https://www.semanticscholar.org/paper/An-Image-is-Worth-16x16-Words%3A-Transformers-for-at-Dosovitskiy-Beyer/7b15fa1b8d413fbe14ef7a97f651f47f5aff3903) |
| ✅ | 2021 | Swin Transformer (https://arxiv.org/pdf/2103.14030.pdf) | Hierarchical Vision Transformer | citation (https://www.semanticscholar.org/paper/Swin-Transformer%3A-Hierarchical-Vision-Transformer-Liu-Lin/c8b25fab5608c3e033d34b4483ec47e68ba109b7) |
| | 2021 | MLP-Mixer (https://arxiv.org/pdf/2105.01601.pdf) | Replacing self-attention with MLPs | citation (https://www.semanticscholar.org/paper/MLP-Mixer%3A-An-all-MLP-Architecture-for-Vision-Tolstikhin-Houlsby/2def61f556f9a5576ace08911496b7c7e4f970a4) |
| ✅ | 2021 | MAE (https://arxiv.org/pdf/2111.06377.pdf) | BERT version for CV | citation (https://www.semanticscholar.org/paper/Masked-Autoencoders-Are-Scalable-Vision-Learners-He-Chen/c1962a8cf364595ed2838a097e9aa7cd159d3118) |
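As a quick illustration of the ViT row above (a sketch assuming PyTorch, with ViT-Base-style sizes of 16x16 patches and 768-dim tokens), the core trick is to cut an image into patches and linearly project each one into a token, so a standard Transformer encoder can consume the image like a sequence of words:

```python
# Sketch of ViT's patch embedding (PyTorch assumed; sizes follow
# common ViT-Base defaults, chosen here for illustration).
import torch
import torch.nn as nn

# A 16x16 conv with stride 16 applies one shared linear projection
# per non-overlapping patch.
patchify = nn.Conv2d(3, 768, kernel_size=16, stride=16)

img = torch.randn(1, 3, 224, 224)                  # one 224x224 RGB image
tokens = patchify(img).flatten(2).transpose(1, 2)  # (1, 196, 768): 14x14 patch tokens
print(tokens.shape)
```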
Generative Models
| Recorded | Year | Name | Description | Citations |
|---|---|---|---|---|
| ✅ | 2014 | GAN (https://papers.nips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf) | Pioneering work in generative models | citation (https://www.semanticscholar.org/pap |
Similar Articles
@QingQ77: 'Dive into Deep Learning' is an excellent introductory book, but its updates struggle to keep pace with the field's development. Since the Transformer, content like CLIP, Diffusion, and vLLM has proliferated. Online resources are abundant but highly fragmented: today you study Attention, tomorrow LoRA, the day after...
This project is a systematic deep learning notes repository covering PyTorch, Transformers, generative models, and more. It aims to address the fragmentation of learning materials and provides code implementations along with practical guides.
@VincentLogic: This video is essentially a 'must-watch' checklist for AI engineers! It clearly explains the 10 core papers that have shaped today's AI industry, ranging from the foundational Transformer architecture to LoRA fine-tuning, RAG, Agents, and even the latest MCP protocol. If you want to dive deeper into how…
This article recommends a video that systematically explains the 10 core papers shaping today's AI industry, covering Transformer, LoRA, RAG, Agents, and the MCP protocol, aiming to help engineers clarify the technological lineage.
@vista8: Last night I casually tested Knowly, developed by the @Ethan_Yang_AI team, trying it on YouTube videos and arXiv papers; the results were stunning, apart from a rather limited free quota and slightly slow vector processing. In both product interaction and interpretation quality, it's no less impressive than NotebookLM. Its Chrome extension has only a handful of users yet has already been selected by Google as a featured pick, which speaks to its strength. Official site in comments https://t.co/62NkT3pO4G
Introduces Knowly, an AI tool that interprets YouTube videos and arXiv papers with impressive results; its interaction and interpretation quality rival NotebookLM. It comes with a Chrome extension already featured by Google. Drawbacks: a limited free quota and slightly slow vector processing.
@wsl8297: When learning AI, the scariest part is getting stuck at "understanding the theory" and then freezing when it's time to write code: not knowing where to start and unable to find decent practice projects. I unearthed a practical treasure trove on GitHub: AI-Project-Gallery. It collects 30+ high-quality AI projects, covering classic topics like house price prediction and disease classification as well as hot applications like a Gemini chatbot and a document generator...
This post shares a curated GitHub repository containing over 30 practical AI projects, covering domains from regression to generative AI, with many end-to-end examples, suitable for learners and developers.
@nuannuan_share: If I wanted to land a $200K AI engineer job in 90 days, I wouldn't go back to school. I'd master these 10 GitHub repositories. 1. awesome-llm-apps — A production-grade AI guide covering RAG, agents, and multimodal apps with full code. 106K+ stars. Repo …
A Chinese social media post recommends 10 GitHub repositories, claiming that mastering them can help land a $200K AI engineer job within 90 days. The repos cover mainstream AI development frameworks and tools including LangChain, LangGraph, CrewAI, Ollama, and Qdrant.