I pretrained and post-trained a 500M-parameter LLM and a 330M-parameter image generator from scratch
Summary
The author details the process of pretraining and post-training a 500M-parameter language model and a 330M-parameter image generator entirely from scratch.
Similar Articles
I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size
The author trained a 75M-parameter LLM called KeyLM from scratch on 18B tokens, achieving instruction-following scores competitive with larger models while using fewer parameters and less data.
@tom_doerr: Trains billion-parameter LLMs from scratch on a single GPU https://github.com/FareedKhan-dev/train-llm-from-scratch…
A GitHub repository provides scripts to train billion-parameter language models from scratch on a single GPU using PyTorch, based on the Transformer architecture.
Making a vintage LLM from scratch
The author documents building a 340M-parameter LLM from scratch, trained exclusively on pre-1900 texts, covering the custom datasets and training scripts, and open-sources both the model and the code.
Me train LLM on 8GB from Scratch. Me happy
The author built a repository for training a tiny language model (25M parameters) from scratch on 8GB of VRAM, with support for MTP (multi-token prediction) and notes on the limitations of mHC and BitNet.
Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)
A developer shares progress on training a 7B-parameter open-source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.