@ClementDelangue: Paper of the day! https://huggingface.co/papers/2605.13301…
Summary
A paper introduces a unified recipe (SU-01) that combines reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling to achieve gold-medal-level performance on IMO and IPhO problems using a 30B-A3B backbone.
Cached at: 05/15/26, 09:08 PM
Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Source: https://huggingface.co/papers/2605.13301 Published on May 13
#1 Paper of the day
Abstract
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematical and physics competitions.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
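The abstract does not spell out how the reverse-perplexity curriculum orders data. A minimal sketch of one plausible reading, assuming the curriculum ranks SFT trajectories by their perplexity under the backbone and presents the most surprising ones first; `score_perplexity` and the toy trajectories below are stand-ins, not the paper's actual pipeline:

```python
import math

def score_perplexity(token_logprobs):
    """Perplexity = exp(-mean token log-probability) for one trajectory."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def reverse_perplexity_order(examples):
    """Sort (text, token_logprobs) pairs by descending perplexity,
    i.e. highest-perplexity (hardest) trajectories first."""
    return sorted(examples, key=lambda ex: score_perplexity(ex[1]), reverse=True)

# Toy trajectories with hypothetical per-token log-probs from the backbone.
examples = [
    ("easy proof step", [-0.1, -0.2, -0.1]),
    ("hard lemma",      [-2.0, -1.5, -1.8]),
    ("medium case",     [-0.8, -0.6, -0.7]),
]
for text, lps in reverse_perplexity_order(examples):
    print(text, round(score_perplexity(lps), 2))
```

If "reverse" instead means easiest-first, the same sketch applies with `reverse=False`; either way the ordering key is the backbone's perplexity on each trajectory.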
Get this paper in your agent:
hf papers read 2605.13301
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
Simplified-Reasoning/SU-01 (Reinforcement Learning) • 31B • Updated 2 days ago • 21 • 6
Datasets citing this paper: 0
No dataset linking this paper
Cite arxiv.org/abs/2605.13301 in a dataset README.md to link it from this page.
Spaces citing this paper: 0
No Space linking this paper
Cite arxiv.org/abs/2605.13301 in a Space README.md to link it from this page.
Collections including this paper: 4
Similar Articles
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
A paper presenting SU-01, a 30B-A3B reasoning model that achieves gold-medal-level performance on IMO and IPhO problems via reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling.
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
This paper presents a simple and unified recipe combining supervised fine-tuning, two-stage reinforcement learning, and test-time scaling to train a reasoning model (SU-01) that achieves gold-medal-level performance on International Mathematical and Physics Olympiad problems.
@stingning: We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluatio…
Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.
@optimalab1: Huge kudos to Barbara Su (Rice CS -> MSc Stanford): she led every part of this end-to-end: algorithm, GLUE/SQuAD pipelin…
Introduces AdaPaD, a parallel rank-1 deflation method for LoRA fine-tuning, enabling low-rank linear regression components to be computed concurrently instead of sequentially, improving efficiency.
Solving (some) formal math olympiad problems
OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.