newton-schulz

#newton-schulz

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

Hacker News Top · 5d ago

This blog post presents Gram Newton-Schulz, a hardware-aware optimization of the Newton-Schulz orthogonalization procedure used in the Muon optimizer, achieving significant speedups for training large language models while preserving model quality.
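
For context, here is a minimal NumPy sketch of the standard quintic Newton-Schulz iteration that Muon uses to orthogonalize an update. This is the baseline that Gram Newton-Schulz speeds up, not the post's Gram variant.

The coefficients (3.4445, -4.7750, 2.0315) come from the widely used open-source Muon implementation and are an assumption here. The function name `newton_schulz` is hypothetical.

```python
import numpy as np

# Quintic coefficients from the widely used open-source Muon implementation
# (assumed here; the blog post may use different values or a different formulation).
A, B, C = 3.4445, -4.7750, 2.0315

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately map g = U S V^T to U V^T by pushing singular values toward 1."""
    # Frobenius normalisation guarantees every singular value is <= 1.
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # use the wide orientation so x @ x.T is the smaller Gram matrix
    for _ in range(steps):
        gram = x @ x.T                     # Gram matrix of the current iterate
        poly = B * gram + C * gram @ gram  # B*(x x^T) + C*(x x^T)^2
        x = A * x + poly @ x               # x <- A x + B (x x^T) x + C (x x^T)^2 x
    return x.T if transposed else x
```

Each step costs three matrix multiplications, and two of them involve only the smaller dimension after the optional transpose. That is the part a hardware-aware reformulation can target.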

Spectral Scaling Laws of Muon

arXiv cs.LG · 2026-06-04

This paper presents the first systematic study of the singular-value spectra of Muon momentum matrices during LLM training. It finds clean power-law scaling relationships across model sizes (77M–2.8B parameters). The findings give practitioners principled, layer-aware guidelines for configuring Newton-Schulz iterations so that orthonormalization quality holds at frontier scale without unnecessary computation.
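
The spectrum matters because Newton-Schulz acts on each singular value independently. An odd matrix polynomial maps X = U diag(σ) V^T to U diag(p(σ)) V^T.

The pure-Python sketch below tracks one normalized singular value through the same assumed quintic map. It counts steps until the value first reaches a hypothetical threshold, `lower = 0.6`. Small singular values grow by roughly A ≈ 3.44 per step, so each tenfold drop in the smallest normalized singular value costs about two extra iterations.

This only illustrates why spectrum-aware iteration counts can help; it is not the paper's procedure. The names and the threshold are illustrative.

```python
# Assumed quintic coefficients (from the open-source Muon implementation).
A, B, C = 3.4445, -4.7750, 2.0315

def ns_scalar(sigma):
    """One Newton-Schulz step applied to a single singular value."""
    return A * sigma + B * sigma**3 + C * sigma**5

def steps_to_reach(sigma0, lower=0.6, max_steps=50):
    """Iterations until a normalised singular value first reaches `lower`.

    `lower` is a hypothetical quality threshold; returns None if it is never reached.
    """
    sigma = sigma0
    for k in range(max_steps + 1):
        if sigma >= lower:
            return k
        sigma = ns_scalar(sigma)
    return None
```

Under these assumptions, `steps_to_reach(1e-2)` returns 4 and `steps_to_reach(1e-4)` returns 8.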

How Much Orthogonalization Does Muon Need?

arXiv cs.LG · 2026-06-02

This paper studies how much orthogonalization the Muon optimizer actually requires. It proposes a five-step cubic Newton-Schulz schedule that lowers computational cost while reaching training quality comparable to more expensive methods on GPT-2 Small and hybrid MoE/Mamba models.
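
Here is a minimal NumPy sketch of a five-step cubic Newton-Schulz iteration, x ← a·x + b·(x x^T)x, applied with one (a, b) pair per step. This summary does not give the paper's tuned schedule, so the classic fixed coefficients (1.5, -0.5) stand in for it. `cubic_newton_schulz` and `CLASSIC_CUBIC` are hypothetical names.

```python
import numpy as np

# Classic cubic coefficients, repeated for five steps. These are a stand-in for
# the paper's schedule, which is not reproduced here.
CLASSIC_CUBIC = [(1.5, -0.5)] * 5

def cubic_newton_schulz(g, schedule=CLASSIC_CUBIC, eps=1e-7):
    """Cubic Newton-Schulz with one (a, b) pair per step: x <- a*x + b*(x x^T) x."""
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalisation: all singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # keep x @ x.T as the smaller Gram matrix
    for a, b in schedule:
        x = a * x + b * (x @ x.T) @ x  # two matrix products per step
    return x.T if transposed else x
```

In this form a cubic step needs two matrix multiplications versus three for the quintic step, so five cubic steps cost 10 products instead of 15. With the classic coefficients, small singular values grow only by about 1.5x per step. That slow growth is one reason a tuned schedule can matter.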

MuCon: Clipped Muon Updates for LLM Training

arXiv cs.LG · 2026-05-27

This paper introduces MuCon, a clipped-Muon optimizer for LLM training. Instead of fully polarizing the update, it clips singular values: only the largest ones are clipped, and smaller ones are preserved. To avoid a full SVD, it explores approximations including polar/absolute-value formulas and rational Newton filters, and it notes numerical challenges for singular values near the clipping threshold.
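
To make the clipping rule concrete, here is a reference NumPy sketch that computes it exactly with a full SVD, which is the cost MuCon's approximations try to avoid.

It writes min(σ, τ) = (σ + τ - |σ - τ|)/2, one reading of the absolute-value formulation mentioned above. The kink of |σ - τ| at σ = τ is a plausible source of the difficulty near the threshold. The threshold `tau` and both function names are hypothetical.

```python
import numpy as np

def clip_singular_values(x, tau):
    """Reference singular-value clipping via a full SVD: sigma -> min(sigma, tau)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # min(s, tau) expressed with an absolute value; |s - tau| is not smooth at s == tau,
    # so smooth polynomial or rational approximations are least accurate there.
    s_clipped = 0.5 * (s + tau - np.abs(s - tau))
    return (u * s_clipped) @ vt  # scale column j of u by s_clipped[j]

def polarize(x):
    """Full polarization, the target of Muon's orthogonalization: every sigma -> 1."""
    u, _, vt = np.linalg.svd(x, full_matrices=False)
    return u @ vt
```

Unlike `polarize`, which sets every singular value to 1, `clip_singular_values` leaves singular values below `tau` untouched.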
