data-recipe

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

arXiv cs.CL · 2026-06-18

This paper shows that a carefully crafted data recipe for long-context reinforcement learning, using minimal outcome-based GRPO, significantly improves reasoning across multiple models and benchmarks, and transfers to agentic tasks like GAIA and BrowseComp.
