rollout-editing

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

arXiv cs.CL · 2026-06-17

This paper introduces Dynamic Rollout Editing (DRE), a training-time intervention to reduce overthinking in GRPO-style reinforcement learning for reasoning models. DRE edits successful trajectories by preserving the solution-reachable prefix and preferring verified shorter edits, weakening the preference for unnecessary thinking.
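The summary does not say how edits are built or verified, so the following is only a minimal sketch of one possible reading, not the paper's procedure. It assumes a successful trajectory can be represented as a list of reasoning steps and that a verifier can judge whether a prefix already reaches the solution. The names `shortest_verified_prefix`, `verify`, and `has_final_answer` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence


def shortest_verified_prefix(
    steps: Sequence[str],
    verify: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the shortest prefix of `steps` that `verify` accepts.

    Assumption: `verify` stands in for whatever check confirms that a
    shortened trajectory still reaches the solution. If no prefix is
    accepted, the trajectory is returned unedited.
    """
    for end in range(1, len(steps) + 1):
        prefix = list(steps[:end])
        if verify(prefix):
            return prefix
    return list(steps)


def has_final_answer(prefix: Sequence[str]) -> bool:
    # Toy verifier (assumption): a prefix counts as verified once it
    # contains a step that states the answer.
    return any(step.startswith("answer =") for step in prefix)


# A successful rollout that keeps "thinking" after the answer is reached.
steps = [
    "parse the problem",
    "compute 6 * 7",
    "answer = 42",
    "wait, let me double-check",
    "answer = 42",
]

edited = shortest_verified_prefix(steps, has_final_answer)
print(len(steps), "steps before editing,", len(edited), "after")
```

Under these assumptions, the edited trajectory keeps the prefix up to the first verified answer and drops the redundant re-checking, which is the kind of shorter successful sample the summary says DRE prefers.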
