spectral-shaping

#spectral-shaping

How Much Orthogonalization Does Muon Need?

arXiv cs.LG · 2026-06-02

This paper studies how much orthogonalization the Muon optimizer requires, proposing a five-step cubic Newton-Schulz schedule that reduces computational cost while achieving training quality similar to more expensive methods across GPT-2 Small and hybrid MoE/Mamba models.
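For readers unfamiliar with the term, the classical cubic Newton-Schulz iteration can be sketched as follows. This is a minimal illustration, not the schedule proposed in the paper: the function name `newton_schulz_cubic`, the fixed textbook coefficients (1.5, -0.5), and the Frobenius-norm pre-scaling are all assumptions made here, and the summary above does not state which per-step coefficients the paper uses.

```python
import numpy as np

def newton_schulz_cubic(G, steps=5):
    """Approximately orthogonalize G with the textbook cubic Newton-Schulz map.

    Assumption: fixed coefficients (1.5, -0.5) at every step; the paper's
    five-step schedule may use different values per step.
    """
    # Scale so every singular value lies in (0, 1], where the iteration converges.
    X = G / (np.linalg.norm(G) + 1e-7)
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, pushing it toward 1.
        X = 1.5 * X - 0.5 * (A @ X)
    return X.T if transposed else X

if __name__ == "__main__":
    G = np.random.default_rng(0).standard_normal((4, 8))
    X = newton_schulz_cubic(G, steps=5)
    print(np.linalg.svd(X, compute_uv=False))  # singular values after 5 steps
```

Each step costs two matrix multiplications, so the number of steps directly sets the cost of orthogonalization. With only a few steps, small singular values are still well below 1, which is the accuracy-versus-cost trade-off the paper's question is about.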

#spectral-shaping

DynMuon: A Dynamic Spectral Shaping View of Muon

Hugging Face Daily Papers · 2026-05-16

This paper introduces DynMuon, a dynamic spectral shaping optimizer that schedules the update parameter p from positive to mildly negative during training, consistently achieving lower validation loss and requiring 10.6-26.5% fewer steps than the standard Muon optimizer.
