Megaprop is a new library for efficient preconditioned optimization across multiple GPUs. Forked from Megatron and TransformerEngine, it adds FSDP support for Muon, FOOF, KFAC, and Newton-Muon, along with MuP support for both width and depth.
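To make "preconditioned" concrete: Muon-style optimizers replace a weight matrix's momentum with an approximately orthogonalized version before applying it. The sketch below is dependency-free Python for illustration only and is not Megaprop's API. The function names (`orthogonalize`, `muon_like_step`) and the hyperparameter defaults are hypothetical, and it assumes the classical cubic Newton-Schulz iteration rather than the tuned quintic variant commonly used in Muon implementations.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def orthogonalize(g, steps=30, eps=1e-7):
    """Approximate U V^T from the SVD g = U S V^T via Newton-Schulz.

    Assumption: classical cubic iteration X <- 1.5 X - 0.5 (X X^T) X.
    """
    # Dividing by the Frobenius norm makes every singular value at most 1,
    # which lies inside the iteration's convergence region (0, sqrt(3)).
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / (norm + eps) for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)]
             for rx, rq in zip(x, xxt_x)]
    return x


def muon_like_step(weight, momentum, grad, lr=0.02, beta=0.95):
    """One simplified Muon-like update for a single weight matrix.

    Hypothetical helper: heavy-ball momentum, then an orthogonalized step.
    Returns the new weight and the new momentum buffer.
    """
    momentum = [[beta * m + gr for m, gr in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum
```

For a symmetric positive-definite input such as `[[3.0, 1.0], [1.0, 2.0]]`, the orthogonal polar factor is the identity, so `orthogonalize` returns approximately the identity matrix. In a sharded (FSDP) setting the matrix products would additionally require gathering or distributing the shards, which this single-process sketch does not attempt to show.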