Tag
MegaTrain enables full-precision training of 100B+ LLMs on a single GPU by treating VRAM as a transient stateless cache, inverting the memory hierarchy.