dapo

#dapo

@johnschulman2: PPO had a second wave in the LLM era for reasons unanticipated by the original paper - the importance-ratio objective f…

X AI KOLs Following · yesterday Cached

This paper shows that the clipping mechanism in PPO and GRPO biases policy entropy in reinforcement learning with verifiable rewards (RLVR) for LLMs: clip-low increases entropy, while clip-high decreases it. The authors prove that, under standard clipping settings, the net effect is to reduce entropy even when rewards are random, and show that adjusting clip-low can prevent entropy collapse and promote exploration.
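To make the two clipping bounds concrete, here is a minimal per-token sketch of a PPO-style clipped surrogate with separate lower and upper ranges. It is an illustration only, not the paper's code: the function names, the parameter names `eps_low` and `eps_high`, and the numeric values are all assumptions chosen for this example.

```python
def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """Per-token PPO-style clipped objective with separate bounds.

    ratio     : importance ratio pi_new(token) / pi_old(token)
    advantage : advantage estimate for that token
    eps_low   : hypothetical name for the lower clip range (clip-low)
    eps_high  : hypothetical name for the upper clip range (clip-high)
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic choice between the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def is_clipped(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """True when clipping is active, so the token contributes no gradient."""
    if advantage > 0:
        # Clip-high: stops further probability increases on positive-advantage tokens.
        return ratio > 1.0 + eps_high
    if advantage < 0:
        # Clip-low: stops further probability decreases on negative-advantage tokens.
        return ratio < 1.0 - eps_low
    return False


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for ratio, adv in [(1.5, 1.0), (0.5, -1.0), (1.5, -1.0)]:
        print(ratio, adv, clipped_surrogate(ratio, adv), is_clipped(ratio, adv))
```

In this sketch, clip-high only ever binds on positive-advantage tokens and clip-low only on negative-advantage tokens, which is why the two bounds can be tuned independently, as the summary describes for clip-low.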
