@QingQ77: 'Dive into Deep Learning' is an excellent introductory book, but its update speed struggles to keep pace with the field's development. Since the Transformer, content like CLIP, Diffusion, vLLM, and more has proliferated. While online resources are abundant, they are highly fragmented—today you study Attention, tomorrow LoRA, the day after...

Summary

This project is a systematic deep learning notes repository covering PyTorch, Transformers, generative models, and more. It aims to address the fragmentation of learning materials and provides code implementations along with practical guides.

'Dive into Deep Learning' is an excellent introductory book, but its update speed struggles to keep pace with the field's development. Since the Transformer, content like CLIP, Diffusion, vLLM, and more has proliferated. While online resources are abundant, they are highly fragmented: today you study Attention, tomorrow LoRA, the day after Diffusion models; ultimately, what remains are often just fragments, making it difficult to form a coherent system.

This project is currently maintained and published primarily in Quarto Markdown and built as a static website. Quarto Markdown is a plain-text format based on Markdown, well suited to version control and continuous updates. Content mainly includes:

  • PyTorch core and engineering practices
  • Attention mechanisms and Transformer series models
  • Generative models, such as GAN, VAE, and Diffusion
  • Multimodal models, such as CLIP
  • The Hugging Face ecosystem and practical applications
  • Practical notes from data processing to training, inference, and deployment

https://github.com/jshn9515/deep-learning-notes…

jshn9515/deep-learning-notes

Source: https://github.com/jshn9515/deep-learning-notes

Deep Learning Notes

English | Simplified Chinese

For a long time, I struggled with how to learn deep learning effectively.

Dive into Deep Learning is an excellent introductory book, but its update pace has gradually fallen behind the speed of progress in this field. Since the rise of Transformers, topics like CLIP, Diffusion, and vLLM have become increasingly important. Although there is no shortage of online material, most of it is scattered. One day you study Attention, the next day LoRA, and the day after that diffusion models. In the end, what often remains are only fragments, and it is hard to build a truly coherent understanding.

So I decided to systematically organize what I have learned. From the fundamentals of PyTorch, to Attention and Transformers, and then to GANs, CLIP, Stable Diffusion, and SAM3, I try to explain the core ideas, mathematical derivations, code implementations, and common pitfalls of each topic as clearly as possible. This repository is the public version of those notes. If you are also learning deep learning on your own, I hope it can be helpful to you.

📌 About These Notes

This project is primarily maintained and published in Quarto Markdown, and built as a static website. Quarto Markdown is a plain-text format based on Markdown, which makes it well suited for version control and continuous updates.
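
For readers who have not used Quarto before, a .qmd file is ordinary Markdown plus a YAML header and executable code cells. A minimal sketch might look like this (the title and cell contents are hypothetical, not taken from this repository):

````markdown
---
title: "Attention Basics"   # hypothetical chapter title
format: html
jupyter: python3
---

Ordinary Markdown prose goes here.

```{python}
import torch

x = torch.randn(2, 3)
print(x.shape)
```
````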

The content mainly includes:

  • PyTorch fundamentals and engineering practice
  • Attention mechanisms and Transformer-based models (see the sketch after this list)
  • Generative models, such as GANs, VAEs, and diffusion models
  • Multimodal models, such as CLIP
  • The Hugging Face ecosystem and its practical use
  • Practical notes covering the full workflow from data processing to training, inference, and deployment
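
To give a sense of the code-level style of these topics, here is a minimal scaled dot-product attention sketch in PyTorch (my own illustrative example, not code taken from the notes):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); mask broadcastable to (batch, seq_len, seq_len)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v

q = k = v = torch.randn(1, 4, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```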

To make the material easier to use, I also periodically prepare corresponding Jupyter Notebook versions:

  • Monthly Releases: provide relatively stable packaged Notebook versions
  • GitHub Actions Artifacts: provide the latest build outputs

If you want a stable version, please check the Releases page. If you want the latest version, please check the Artifacts in GitHub Actions.

If you prefer generating Notebook files from the source yourself, you can also install Quarto locally and use the quarto convert command to convert .qmd files into Jupyter Notebooks. For example:

```bash
quarto convert path/to/file.qmd
```
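
To convert every chapter at once rather than one file at a time, a small loop along these lines should work (a sketch, assuming quarto is on your PATH; by default the .ipynb is written next to each source file):

```python
import subprocess
from pathlib import Path

# Convert every .qmd file under the current directory into a Jupyter Notebook.
for qmd in sorted(Path(".").rglob("*.qmd")):
    subprocess.run(["quarto", "convert", str(qmd)], check=True)
```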

🔧 Environment

All code in this repository has been tested in the following environment:

  • Python 3.14
  • PyTorch 2.11

See requirements.txt for the full list of dependencies.
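
Since subtle version mismatches often surface as confusing errors, it can be worth checking your environment against the tested versions before running the notebooks (a minimal sketch using only the versions listed above):

```python
import sys
import torch

# Versions the notes were tested against (see the list above).
if sys.version_info[:2] != (3, 14):
    print(f"Warning: tested on Python 3.14, you are running "
          f"{sys.version_info.major}.{sys.version_info.minor}")
if not torch.__version__.startswith("2.11"):
    print(f"Warning: tested on PyTorch 2.11, you have {torch.__version__}")
```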

Before running the related content, please first enter the dnnl directory and install the dnnl library according to the instructions in dnnl/README.md. This library contains some custom implementations and utility functions used throughout the notes, and many examples will not run properly without it.

This project uses Transformers v5. If you are following other repositories or tutorials based on v4, there may be significant API differences (such as tokenizers and quantization configurations). Please refer to the official migration guide (https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md) for adjustments.
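
Because the v4/v5 API differences can fail in non-obvious ways, a quick runtime guard catches a mismatched install early (a sketch that only checks the major version; it does not address any specific API change):

```python
import transformers

# These notes target Transformers v5; fail fast on an older install.
major = int(transformers.__version__.split(".")[0])
if major < 5:
    raise RuntimeError(
        f"Transformers v5 is required, found v{transformers.__version__}. "
        "See the migration guide linked above."
    )
```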

🤝 Contributions

If you find an explanation unclear, notice a problem in the code, or have topics you would like me to add, feel free to contribute through Issues or Pull Requests.

Possible contributions include, but are not limited to:

  • Pointing out errors or inaccuracies in the notes
  • Adding clearer explanations, derivations, or code comments
  • Suggesting improvements to structure, wording, or formatting
  • Recommending topics or practical cases for future coverage

Since this is a project I am building and refining while learning, there will inevitably be places where my understanding is incomplete or my explanations are not precise enough. I read all helpful feedback carefully and try to improve the notes whenever possible.

If you would like to make a larger change, it is recommended to open an Issue first with a brief description so that we can discuss it in advance.

🙏 Acknowledgements

While organizing these notes, I have benefited from many excellent resources. In particular, Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, as well as Professor Hung-yi Lee’s deep learning lecture series, have helped me greatly in understanding many core concepts in deep learning.

This project website is built with Quarto (https://quarto.org/).

📄 License

  • The notes in this repository are licensed under CC BY-NC 4.0.
  • The dnnl library is licensed under MIT.

Similar Articles

@VincentLogic: This video is essentially a 'must-watch' checklist for AI engineers! It clearly explains the 10 core papers that have shaped today's AI industry, ranging from the foundational Transformer architecture to LoRA fine-tuning, RAG, Agents, and even the latest MCP protocol. If you want to dive deeper into how…

This article recommends a video that systematically explains the 10 core papers shaping today's AI industry, covering Transformer, LoRA, RAG, Agents, and the MCP protocol, aiming to help engineers trace the field's technological lineage.

@wsl8297: Sharing an easy-to-read open-source book, 'Foundations of Large Models'. It moves from an introduction to large language models, through architectural evolution, to key technologies such as prompt engineering, parameter-efficient fine-tuning, model editing, and retrieval-augmented generation (RAG), all in one book. GitHub: https://github.com/ZJU-LLMs/…

The Zhejiang University team has open-sourced 'Foundations of Large Models', an accessible textbook covering everything from architectural evolution to key technologies like RAG, accompanied by the Agent-Kernel multi-agent framework.

@QingQ77: 30 runnable Jupyter notebooks that thoroughly cover LLM agent memory technologies from short-term to long-term, simple to production-grade. https://github.com/NirDiamant/Agent_Memory_Techniques… This repo covers L...

A GitHub repository containing 30 runnable Jupyter notebooks that comprehensively explain LLM agent memory technologies, from short-term context to production-grade patterns, covering methods like MemGPT, Zep, Graphiti, along with decision trees and comparison tables.

@bozhou_ai: Teaching yourself Vibe Coding? These three open-source projects are all you need; there is no need to buy courses. Many AI coding course materials are drawn from them, but the originals are more systematic. 1. Easy-Vibe: a systematic tutorial from DataWhale, 5k stars. Three stages: from AI programming small games…

This article recommends three high-star open-source GitHub projects to help developers systematically learn AI programming and Vibe Coding workflows at zero cost, covering structured tutorials, prompt skill libraries, and a comprehensive tool directory.