data-recipe

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

arXiv cs.CL · 2026-06-18

This paper shows that a carefully crafted data recipe for long-context reinforcement learning, trained with only minimal outcome-based GRPO, significantly improves reasoning across multiple models and benchmarks, and that the gains transfer to agentic tasks such as GAIA and BrowseComp.
