Tag
This paper introduces MuCon, a clipped-Muon optimizer for LLM training that applies singular-value clipping instead of full polarization, preserving smaller singular values while clipping only the largest ones. It explores approximations to avoid full SVD, including polar/absolute-value formulas and rational Newton filters, noting numerical challenges near the threshold.
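To make the contrast between clipping and full polarization concrete, here is a minimal NumPy sketch that uses an explicit SVD as a reference. It is not the paper's implementation: the function names, the threshold name `tau`, and its value are assumptions chosen for illustration, and the SVD-free approximations the paper explores are not reproduced here.

```python
import numpy as np

def polarize(G):
    """Full polarization: set every singular value of G to 1 (returns U @ V^T)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def clip_singular_values(G, tau):
    """Singular-value clipping: cap singular values at tau, keep smaller ones as-is.

    `tau` is a hypothetical threshold parameter used only for this sketch.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Scale each column of U by its (possibly clipped) singular value.
    return (U * np.minimum(s, tau)) @ Vt

# Small random matrix standing in for a gradient or momentum matrix.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))
tau = 1.5  # illustrative value, not taken from the paper

clipped = clip_singular_values(G, tau)
polar = polarize(G)

s_in = np.linalg.svd(G, compute_uv=False)
s_clip = np.linalg.svd(clipped, compute_uv=False)
s_polar = np.linalg.svd(polar, compute_uv=False)
```

Comparing `s_in`, `s_clip`, and `s_polar` shows the difference: clipping leaves singular values below `tau` unchanged and caps the rest at `tau`, whereas polarization flattens all of them to 1.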
The Vergecast episode explores how social media feeds are dominated by clipped content and algorithmic brute force, reviews the new Fitbit Air fitness tracker, and discusses smart glasses as a product category.