spectral

MuCon: Clipped Muon Updates for LLM Training

arXiv cs.LG · 2026-05-27

This paper introduces MuCon, a clipped variant of the Muon optimizer for LLM training. Where Muon fully polarizes the update, replacing it with its polar factor so that all nonzero singular values become 1, MuCon clips the singular values at a threshold: the largest ones are capped and the smaller ones are preserved. To avoid computing a full SVD, the paper explores approximations, including polar/absolute-value formulas and rational Newton filters, and it notes numerical challenges near the clipping threshold.
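To make the contrast between clipping and full polarization concrete, here is a minimal NumPy sketch that uses an explicit SVD as a reference. It is not the paper's method, since the paper's aim is to avoid the full SVD. The function names, the threshold name `tau`, and the test matrix are assumptions chosen for illustration. The scalar identity in `clip_via_abs` is one standard way an absolute value can express clipping; whether the paper's polar/absolute-value formulas take this exact form is not stated in the summary.

```python
import numpy as np


def clip_singular_values(update, tau):
    """Reference clip via full SVD: each singular value s becomes min(s, tau)."""
    u, s, vt = np.linalg.svd(update, full_matrices=False)
    return (u * np.minimum(s, tau)) @ vt


def polar_factor(update):
    """Full polarization: keep the singular vectors, set singular values to 1.

    Assumes the input has full column rank, as the demo matrix below does.
    """
    u, _, vt = np.linalg.svd(update, full_matrices=False)
    return u @ vt


def clip_via_abs(s, tau):
    """Scalar identity: min(s, tau) = (s + tau - |s - tau|) / 2."""
    return 0.5 * (s + tau - np.abs(s - tau))


# Hypothetical 4x3 update with known singular values 3.0, 1.0, 0.2.
rng = np.random.default_rng(0)
left, _ = np.linalg.qr(rng.standard_normal((4, 3)))   # orthonormal columns
right, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal matrix
sigma = np.array([3.0, 1.0, 0.2])
update = left @ np.diag(sigma) @ right.T

clipped = clip_singular_values(update, tau=1.0)
polarized = polar_factor(update)

# Singular values after each transform, largest first.
clipped_sv = np.linalg.svd(clipped, compute_uv=False)
polarized_sv = np.linalg.svd(polarized, compute_uv=False)
print("clipped:  ", clipped_sv)
print("polarized:", polarized_sv)
```

With `tau=1.0`, clipping caps only the singular value 3.0 and leaves 0.2 untouched, whereas polarization raises 0.2 all the way to 1. That difference in how small singular values are treated is the distinction the summary draws between MuCon and Muon.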
