I pretrained and post-trained a 500M parameter LLM and a 330M parameter image generator from scratch

Reddit r/LocalLLaMA Tools

Summary

The author details the process of pretraining and post-training a 500M parameter language model and a 330M parameter image generator entirely from scratch.

Similar Articles

Making a vintage LLM from scratch

Hacker News Top

The author documents their journey of building a 340M parameter LLM from scratch, trained exclusively on pre-1900 texts, including custom datasets, training scripts, and open-sourcing the model and code.

Me train LLM on 8GB from Scratch. Me happy

Reddit r/LocalLLaMA

The author built a repository for training a tiny language model (25M parameters) from scratch on 8GB of VRAM. It supports MTP, and the author notes limitations of mHC and BitNet.