Tag
This paper shows that the clipping mechanism in PPO and GRPO biases policy entropy in reinforcement learning with verifiable rewards (RLVR) for LLMs: clip-low increases entropy, while clip-high decreases it. The authors prove that standard clipping reduces entropy even under random rewards, and show that adjusting clip-low can prevent entropy collapse and promote exploration.
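To make the two clip bounds concrete, here is a minimal Python sketch of the per-token clipped surrogate used by PPO (and, with group-normalized advantages, by GRPO), with the lower and upper bounds decoupled. The names `clipped_surrogate`, `surrogate_grad`, `eps_low`, `eps_high` and the 0.2 defaults are illustrative assumptions, not the paper's code or settings. The sketch only shows which tokens each bound silences (zero gradient); it does not reproduce the paper's entropy analysis.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """Per-token objective (maximized): min(r*A, clip(r, 1-eps_low, 1+eps_high)*A)."""
    ratio = math.exp(logp_new - logp_old)  # importance ratio pi_new / pi_old
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)

def surrogate_grad(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.2):
    """d(objective)/d(logp_new): r*A on the unclipped branch, 0 when a clip bound is active."""
    ratio = math.exp(logp_new - logp_old)
    if advantage > 0 and ratio > 1.0 + eps_high:
        return 0.0  # clip-high: token already up-weighted, stop increasing its probability
    if advantage < 0 and ratio < 1.0 - eps_low:
        return 0.0  # clip-low: token already down-weighted, stop decreasing its probability
    return ratio * advantage  # dr/dlogp_new = r, so the gradient is r*A

# Positive advantage with ratio 1.5 > 1 + eps_high: clip-high caps the objective.
print(clipped_surrogate(math.log(1.5), 0.0, 1.0))   # prints 1.2
print(surrogate_grad(math.log(1.5), 0.0, 1.0))      # prints 0.0
# Negative advantage with ratio 0.5 < 1 - eps_low: clip-low zeroes the gradient.
print(surrogate_grad(math.log(0.5), 0.0, -1.0))     # prints 0.0
```

Because the two bounds act on disjoint cases (positive vs. negative advantage), tuning `eps_low` and `eps_high` separately changes which kind of update is suppressed.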
This paper introduces MuCon, a clipped variant of the Muon optimizer for LLM training. Instead of fully polarizing the update (setting every singular value to 1), it clips only the largest singular values at a threshold and preserves the smaller ones. To avoid a full SVD, it explores approximations including polar/absolute-value formulas and rational Newton filters, noting numerical challenges near the clipping threshold.
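The contrast between full polarization and singular-value clipping, and one way an absolute-value identity can avoid an explicit SVD, can be sketched in NumPy. This is an illustrative construction under our own assumptions, not MuCon's actual algorithm: it applies min(s, c) = (s + c - |s - c|) / 2 to the singular values, computes polar/sign factors with a plain cubic Newton-Schulz iteration (Muon itself uses a tuned quintic variant), and the names `newton_schulz_sign`, `clip_svd`, `clip_svd_free` and the threshold `c` are hypothetical. The SVD appears only in the reference function.

```python
import numpy as np

def newton_schulz_sign(X, steps=50):
    """Cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X (hypothetical helper).

    On each signed singular value s it applies the odd map s -> 1.5 s - 0.5 s^3,
    driving every nonzero s to sign(s). For G this gives the polar factor U V^T;
    for M = U (S - c) V^T it gives U sign(S - c) V^T.
    """
    X = X / (np.linalg.norm(X) + 1e-12)  # Frobenius scaling: all |s| <= 1 < sqrt(3)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def clip_svd(G, c):
    """Reference: singular-value clipping sigma -> min(sigma, c) via a full SVD."""
    U, S, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ np.diag(np.minimum(S, c)) @ Vt

def clip_svd_free(G, c, steps=50):
    """SVD-free clip from min(s, c) = (s + c - |s - c|) / 2 (illustrative only).

    With P = U V^T and M = G - c P = U (S - c) V^T, the term U |S - c| V^T
    equals sign(M) P^T M. Singular values near c make M nearly singular,
    so Newton-Schulz converges slowly in exactly those directions.
    """
    P = newton_schulz_sign(G, steps)
    M = G - c * P
    abs_term = newton_schulz_sign(M, steps) @ P.T @ M
    return 0.5 * (G + c * P - abs_term)

# Usage: a 6x4 matrix with known singular values [3, 2, 0.5, 0.2], threshold c = 1.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((6, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
G = U @ np.diag([3.0, 2.0, 0.5, 0.2]) @ V.T

polar = newton_schulz_sign(G)   # Muon-style: all singular values -> 1
clipped = clip_svd(G, 1.0)      # clipping: singular values -> [1, 1, 0.5, 0.2]
approx = clip_svd_free(G, 1.0)  # same result without calling an SVD
print(np.linalg.svd(polar, compute_uv=False))
print(np.linalg.svd(clipped, compute_uv=False))
print(np.max(np.abs(approx - clipped)))
```

The polar update discards the information that 0.5 and 0.2 were smaller than 3 and 2, while clipping keeps it. The test matrix deliberately keeps every singular value away from c; as one singular value approaches c, the corresponding direction of M shrinks toward zero and the Newton-Schulz sign of M needs more iterations, one plausible form of the numerical difficulty near the threshold noted above.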
This Vergecast episode explores how clipped content and algorithmic brute force have come to dominate social media feeds, reviews the new Fitbit Air fitness tracker, and discusses smart glasses as a product category.