Tag
ExpRL is a new RL-based mid-training method that uses human-written reference solutions as dense reward scaffolds (never shown to the policy) to improve LLM reasoning, achieving significant gains on hard math benchmarks like AIME-2026.