rollout-editing


Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

arXiv cs.CL · 2026-06-17

This paper introduces Dynamic Rollout Editing (DRE), a training-time intervention that reduces overthinking in GRPO-style reinforcement learning for reasoning models. DRE edits successful trajectories: it preserves the solution-reachable prefix and prefers verified shorter edits, which weakens the policy's preference for unnecessary thinking.
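The editing idea summarized above can be illustrated with a toy sketch. This is not the paper's implementation: the function `edit_rollout`, the step-list representation of a trajectory, and the string-matching verifier `toy_verify` are all hypothetical stand-ins chosen for illustration. The sketch assumes a successful rollout is a list of reasoning steps and that a verifier can decide whether a prefix already reaches the final answer; it then keeps the shortest prefix that verifies.

```python
from typing import Callable, List

# Hypothetical type: a verifier takes a candidate prefix and the known-correct
# answer, and reports whether the prefix already reaches that answer.
Verifier = Callable[[List[str], str], bool]


def edit_rollout(steps: List[str], answer: str, verify: Verifier) -> List[str]:
    """Return the shortest verified prefix of a successful rollout.

    Illustrative assumption: an "edit" is a truncation of the trajectory.
    If no strictly shorter prefix verifies, the original rollout is returned
    unchanged, so an unverified edit is never used.
    """
    for cut in range(1, len(steps)):
        prefix = steps[:cut]
        if verify(prefix, answer):
            return prefix
    return steps


def toy_verify(prefix: List[str], answer: str) -> bool:
    """Toy verifier (assumption): the last step states the answer verbatim."""
    return bool(prefix) and prefix[-1].rstrip().endswith(f"= {answer}")


rollout = [
    "Let x = 2 + 3",
    "So x = 5",
    "Wait, let me double check the addition",
    "Yes, x = 5",
]
edited = edit_rollout(rollout, "5", toy_verify)
print(edited)  # ['Let x = 2 + 3', 'So x = 5']
```

In a GRPO-style loop, an edited trajectory like this would presumably replace or accompany the original successful rollout before advantages are computed; how DRE does that exactly is specified in the paper, not here.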


