critic-free

Rethinking Groups in Critic-Free RLVR

arXiv cs.LG · 2026-06-17

This paper rethinks the role of grouping in critic-free reinforcement learning for LLMs and proposes negative token filtering to enable stable training with a single rollout per prompt, achieving comparable or better performance on reasoning and agentic tasks.
