@GitHub_Daily: Want to understand the underlying principles of large language models? Most resources either cover only theory or hand you source code, which can still leave you confused. I stumbled upon EveryonesLLM, an open-source tutorial that guides you step by step through building a complete large language model from scratch on Google Colab, writing the code yourself throughout. The whole tutorial is divided into...

X AI KOLs Timeline 06/16/26, 04:00 AM Tools

open-source tutorial llm colab transformer fine-tuning education

Summary

EveryonesLLM is an open-source tutorial that provides 29 chapters of Colab notebooks. It teaches users step by step to build a complete large language model from scratch on Google Colab, including pre-training and instruction fine-tuning, and supports Chinese.

Want to understand the underlying principles of large language models? Most resources either cover only theory or hand you source code, which can still leave you confused. I stumbled upon EveryonesLLM, an open-source tutorial that guides you step by step through building a complete large language model from scratch on Google Colab, writing the code yourself throughout. The tutorial is divided into 29 chapters, starting with basic data loading and word embeddings, building up to the attention mechanism and Transformer modules, and finishing with pre-training and instruction fine-tuning. GitHub: http://github.com/HayatoHongo/EveryonesLLM… Each chapter is an independent Colab notebook: just open it in your browser and run it, with no local environment to set up. It uses a "practice + answer" format: you fill in the code yourself first, then check the answers, which helps the material stick. The tutorial is continuously updated and recently added chapters on Vision LLMs (vision large language models). After completing it, you can train a small conversational AI and even try it out online.

Cached at: 06/16/26, 01:38 PM

If you want to understand the underlying principles of large language models, most resources either cover only theory or hand you source code, which can still leave you confused.

I stumbled upon the open-source tutorial EveryonesLLM, which guides you step by step to build a complete large language model from scratch on Google Colab, writing all the code yourself.

The entire tutorial is divided into 29 chapters, starting from basic data loading and word embeddings, gradually building up to attention mechanisms, Transformer modules, and finally completing pretraining and instruction fine-tuning.
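
To give a feel for what the attention chapters build toward, here is a minimal sketch of single-head causal self-attention in plain Python. It is not taken from the tutorial's notebooks: the function names (`softmax`, `causal_attention`) and the toy input are assumptions made for illustration, and a real model would use learned query, key, and value projections and tensor operations instead of nested lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of vectors, one per token.
    Token t may only attend to tokens 0..t.
    """
    head_size = len(keys[0])
    outputs = []
    for t, query in enumerate(queries):
        # Score the query against the current and earlier tokens only;
        # skipping later tokens is what makes the attention causal.
        scores = [
            sum(q * k for q, k in zip(query, keys[s])) / math.sqrt(head_size)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy input: 3 tokens, head size 2 (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(x, x, x)
```

In this toy run the first token can only see itself, so its output equals its own value vector. Later tokens get a blend of everything up to and including their own position.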

GitHub: http://github.com/HayatoHongo/EveryonesLLM

Each chapter is an independent Colab notebook — just open it in your browser and run it; no need to fuss with a local environment.

It also adopts a “practice + answer” format, where you write the code yourself first and then check your answers. This helps you learn more thoroughly.

The tutorial is continuously updated and recently added beta chapters on Vision LLMs.

After completing the tutorial, you will be able to train a small conversational AI and even try it out online.

HayatoHongo/EveryonesLLM

Source: https://github.com/HayatoHongo/EveryonesLLM

🌐 Select Language / 日本語 🇯🇵 (https://github.com/HayatoHongo/EveryonesLLM/tree/ja) | 中文 🇨🇳

Build LLM on Google Colab from scratch

EveryonesLLM_demo

🎉 Try the AI you build in Chapter 29 (https://huggingface.co/spaces/HayatoHongoEveryonesAI/EveryonesGPT_SFT)

WebUI

Web app released (currently Japanese only) (https://EveryonesAI-v2.created.app/)

EveryonesLLM

Chapter	Estimated Time	Notebook
Chapter 00: Start Tutorial	1-2 hours	Open in Colab
Chapter 01: Dataloader	1-2 hours	Open in Colab
Chapter 02: TokenEmbedding	0.5-1 hour	Open in Colab
Chapter 03: PositionEmbedding	0.5-1 hour	Open in Colab
Chapter 04: EmbeddingModule	0.5-1 hour	Open in Colab
Chapter 05: LayerNorm	1-2 hours	Open in Colab
Chapter 06: AttentionHead	3-4 hours	Open in Colab
Chapter 07: MultiHeadAttention	1-2 hours	Open in Colab
Chapter 08: FeedForward	1-2 hours	Open in Colab
Chapter 09: TransformerBlock	0.5-1 hour	Open in Colab
Chapter 10: VocabularyLogits	0.5-1 hour	Open in Colab
Chapter 11: nanoGPT	1-2 hours	Open in Colab
Chapter 12: Trainer	1-2 hours	Open in Colab
Chapter 13: Tokens per second (CPU)	1-2 hours	Open in Colab
Chapter 14: Tokens per second (T4 GPU)	0.5-1 hour	Open in Colab
Chapter 15: Train nanoGPT with GPU	0.5-1 hour	Open in Colab
Chapter 16: Make only the model size bigger	0.5-1 hour (+ 1 hour model training)	Open in Colab
Chapter 17: Make the dataset bigger	1-2 hours (+ 1 hour model training)	Open in Colab
Chapter 18: tiktoken	1-2 hours (+ 1 hour model training)	Open in Colab
Chapter 19: Long Train	1-2 hours (+ 6 hours model training)	Open in Colab
Chapter 20: Learning rate	0.5-1 hour	Open in Colab
Chapter 21: Scaling Law	1-2 hours	Open in Colab
Chapter 22: TinyStories (Main)	1-2 hours	Open in Colab
Chapter 22: TinyStories (Model Training)	1 hour	Open in Colab
Chapter 23: RPE (OverSimplified)	2-3 hours	Open in Colab
Chapter 24: RPE (Simplified)	1-2 hours (+ 1 hour model training)	Open in Colab
Chapter 25: LR schedule	1 hour	Open in Colab
Chapter 26: Checkpoint	1 hour	Open in Colab
Chapter 27: Pretraining	0.5 hour (+ 20 hours model training)	Open in Colab
Chapter 28: Instruction Tuning	0.5 hour (+ 0.5 hour model training)	Open in Colab
Chapter 29: Magpie (Prompt mask)	1.5 hours (+ 2 hours model training)	Open in Colab
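
Chapter 29's title mentions a prompt mask. Assuming this refers to the common instruction-tuning practice of computing the loss only on response tokens, the idea can be sketched as below. This is an illustration, not the tutorial's code: `masked_next_token_loss` and the toy values are hypothetical.

```python
import math


def masked_next_token_loss(logits, targets, loss_mask):
    """Average cross-entropy over positions where loss_mask is 1.

    logits: one list of vocabulary scores per position.
    targets: the correct next-token id per position.
    loss_mask: 1 for response tokens, 0 for prompt tokens.
    """
    total, counted = 0.0, 0
    for scores, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue  # prompt position: contributes nothing to the loss
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # -log softmax(scores)[target]
        counted += 1
    return total / counted if counted else 0.0


# Toy example: vocabulary of 2 tokens, 4 positions; the first 2 are the prompt.
logits = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0], [0.0, 0.0]]
targets = [1, 0, 0, 1]
loss_mask = [0, 0, 1, 1]

loss = masked_next_token_loss(logits, targets, loss_mask)
unmasked_loss = masked_next_token_loss(logits, targets, [1, 1, 1, 1])
```

Here the model's poor predictions on the prompt positions are ignored when the mask is applied, so `loss` reflects only the response tokens, while `unmasked_loss` is pulled up by the prompt positions.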

2026/6/5 Vision LLM beta is now available!

Explanations and exercises are not available yet. Evaluation on major benchmarks is also not available yet.

Please treat it as an early preview. We plan to update it from time to time, so we recommend working through it after future updates.

Chapter	Estimated Time	Notebook
Chapter 30: Vision Pretraining (Beta)	3 hours model training	Open in Colab
Chapter 31: Vision Instruction Tuning (Beta)	2 hours model training	Open in Colab

EveryonesVLM_demo

Link to Web App (Vision LLM) (https://huggingface.co/spaces/HayatoHongoEveryonesAI/EveryonesGPT_Vision_Instruct_noRoPE)

Tensor Map (Full Tensor Overview)

Try making the tensor map below yourself!
Don't worry, I have prepared lots of hints for you.
View the full-resolution Tensor Map of the nanoGPT model on Canva

Everyones TensorMap

About the Development Environment

To keep setup easy, please try running all the samples on Google Colab.

However, Google Colab does not save the state of checkboxes.
If you want to track your progress, or work in short sessions of, say, 30 minutes at a time, I recommend VS Code.
In that case, fork this repository and clone it to your own PC.
With the Google Colab extension for VS Code, you can still use Colab's CPU and GPU runtimes.

Answers

Chapter	Estimated Time	Notebook
Chapter 00: Start Tutorial	1-2 hours	Open in Colab
Chapter 01: Dataloader	1-2 hours	Open in Colab
Chapter 02: TokenEmbedding	0.5-1 hour	Open in Colab
Chapter 03: PositionEmbedding	0.5-1 hour	Open in Colab
Chapter 04: EmbeddingModule	0.5-1 hour	Open in Colab
Chapter 05: LayerNorm	1-2 hours	Open in Colab
Chapter 06: AttentionHead	3-4 hours	Open in Colab
Chapter 07: MultiHeadAttention	1-2 hours	Open in Colab
Chapter 08: FeedForward	1-2 hours	Open in Colab
Chapter 09: TransformerBlock	0.5-1 hour	Open in Colab
Chapter 10: VocabularyLogits	0.5-1 hour	Open in Colab
Chapter 11: nanoGPT	1-2 hours	Open in Colab
Chapter 12: Trainer	1-2 hours	Open in Colab
Chapter 13: Tokens per second (CPU)	1-2 hours	Open in Colab
Chapter 14: Tokens per second (T4 GPU)	0.5-1 hour	Open in Colab
Chapter 15: Train nanoGPT with GPU	0.5-1 hour	Open in Colab
Chapter 16: Make only the model size bigger	0.5-1 hour (+ 1 hour model training)	Open in Colab
Chapter 17: Make the dataset bigger	1-2 hours (+ 1 hour model training)	Open in Colab
Chapter 18: tiktoken	1-2 hours (+ 1 hour model training)	Open in Colab
Chapter 19: Long Train	1-2 hours (+ 6 hours model training)	Open in Colab
Chapter 20: Learning rate	0.5-1 hour	Open in Colab
Chapter 21: Scaling Law	1-2 hours	Open in Colab
Chapter 22: TinyStories (Main)	1-2 hours	Open in Colab
Chapter 22: TinyStories (Model Training)	1 hour	Open in Colab
Chapter 23: RPE (OverSimplified)	2-3 hours	Open in Colab
Chapter 24: RPE (Simplified)	1-2 hours (+ 1 hour model training)	Open in Colab