gram-matrix

#gram-matrix

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

Hacker News Top ↗ · 6d ago Cached

This blog post presents Gram Newton-Schulz, a hardware-aware optimization of the Newton-Schulz orthogonalization procedure used in the Muon optimizer, achieving significant speedups for training large language models while preserving model quality.

0 favorites 0 likes

gram-matrix

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

Submit Feedback