This paper introduces Dynamic Rollout Editing (DRE), a training-time intervention that reduces overthinking in GRPO-style reinforcement learning for reasoning models. DRE edits successful trajectories: it preserves the solution-reachable prefix and prefers shorter edits that have been verified, which weakens the learned preference for unnecessary thinking.
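The editing idea can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a trajectory is a list of reasoning steps and that a caller-supplied `verify` callable stands in for whatever check decides that a prefix still reaches the correct solution. The names `shortest_verified_edit`, `toy_verify`, and `steps` are hypothetical.

```python
from typing import Callable, List, Sequence


def shortest_verified_edit(
    steps: Sequence[str],
    verify: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the shortest proper prefix of `steps` that passes `verify`.

    Hypothetical stand-in for a rollout edit: prefixes are tried from
    shortest to longest, so the first one that verifies is the shortest.
    If no proper prefix verifies, the trajectory is returned unchanged.
    """
    for cut in range(1, len(steps)):
        prefix = list(steps[:cut])
        if verify(prefix):
            return prefix
    return list(steps)


def toy_verify(prefix: Sequence[str]) -> bool:
    """Toy verifier (assumption): a prefix is accepted once it states the answer."""
    return any("answer=4" in step for step in prefix)


# A successful trajectory that keeps "thinking" after reaching the answer.
steps = ["2+2", "answer=4", "wait, let me double-check", "answer=4 again"]
edited = shortest_verified_edit(steps, toy_verify)
print(edited)
```

In this toy setting the edit keeps only the steps up to the first verified answer and drops the redundant re-checking. A real verifier would be task-specific, and how edited trajectories enter the GRPO update is not specified here.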