Built an LLM training framework that actually runs on older GPUs without crashing
Summary
Introduces Picotron, a clean-room rewrite of Nanotron that eliminates mandatory GPU-specific dependencies, enabling LLM training on older GPUs such as the T4 and V100. It defaults to PyTorch's standard scaled dot-product attention (SDPA) but supports FlashAttention-2 at runtime.
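The SDPA-by-default, FlashAttention-2-when-available behavior can be illustrated with a minimal sketch. This is not Picotron's actual code: the function names and the returned backend labels are hypothetical. It assumes only that FlashAttention-2 is shipped as the optional `flash_attn` package and targets Ampere-or-newer GPUs (compute capability 8.0+), which excludes the T4 (7.5) and V100 (7.0).

```python
import importlib.util

# FlashAttention-2 targets Ampere or newer GPUs (compute capability 8.0+).
FLASH_ATTN_MIN_CAPABILITY = (8, 0)


def flash_attn_installed() -> bool:
    """Return True if the optional flash_attn package is importable."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attention_backend(capability, has_flash_attn, prefer_flash=True):
    """Choose an attention backend label (hypothetical helper).

    capability: (major, minor) tuple, e.g. the value returned by
        torch.cuda.get_device_capability() in a real training script.
    has_flash_attn: whether the flash_attn package is installed.
    prefer_flash: lets the user opt out of FlashAttention-2 entirely.
    """
    supported = tuple(capability) >= FLASH_ATTN_MIN_CAPABILITY
    if prefer_flash and has_flash_attn and supported:
        return "flash_attention_2"
    # Fallback: torch.nn.functional.scaled_dot_product_attention,
    # which needs no GPU-specific extension.
    return "sdpa"


if __name__ == "__main__":
    for name, cap in [("T4", (7, 5)), ("V100", (7, 0)), ("A100", (8, 0))]:
        print(name, pick_attention_backend(cap, flash_attn_installed()))
```

Because the check runs at runtime rather than at install time, the optional package never has to be present on older hardware; the SDPA path is always available.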
Similar Articles
@tom_doerr: Runs 70B LLMs on single 4GB GPU https://github.com/lyogavin/airllm
AirLLM is an open-source tool that optimizes inference memory usage, enabling 70B LLMs to run on a single 4GB GPU without quantization, and supports 405B models on 8GB VRAM.
@tom_doerr: Trains billion-parameter LLMs from scratch on a single GPU https://github.com/FareedKhan-dev/train-llm-from-scratch…
A GitHub repository provides scripts to train billion-parameter language models from scratch on a single GPU using PyTorch, based on the Transformer architecture.
Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)
A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.
235M param LLM from scratch on a single RTX 5080
A hobbyist trained a 235M-parameter LLM from scratch on a single RTX 5080, sharing the full PyTorch pipeline and open-sourcing Plasma 1.0.
Me train LLM on 8GB from Scratch. Me happy
A repository for training a tiny language model (25M parameters) from scratch on 8GB of VRAM. It supports MTP and notes the limitations of mHC and BitNet.