Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Hugging Face Daily Papers

Summary

A paper presenting SU-01, a 30B-A3B reasoning model that achieves gold-medal-level performance on IMO and IPhO problems via a reverse-perplexity SFT curriculum, two-stage reinforcement learning, and test-time scaling.

Original Article

Cached at: 05/15/26, 04:23 AM


Source: https://huggingface.co/papers/2605.13301 · Published on May 13

#1 Paper of the day

Abstract

A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through a reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematics and physics competitions.

Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
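The abstract describes the recipe only at a high level. Purely as an illustration of the first two ingredients, the sketch below scores each SFT trajectory by the perplexity the backbone assigns it and orders the data by that score, then defines the kind of binary reward a verifiable-rewards RL stage consumes. The backbone name, the sort direction, and the reward check are all assumptions made for the sketch, not SU-01's documented procedure.

```python
# Minimal sketch of a reverse-perplexity SFT curriculum, assuming a
# Hugging Face causal LM as the backbone. The model name is a stand-in
# for "a 30B-A3B backbone"; the abstract above does not specify SU-01's
# exact scoring or ordering.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-30B-A3B"  # assumed stand-in backbone

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """Backbone perplexity on one trajectory (exp of mean token loss)."""
    ids = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=8192
    ).input_ids.to(model.device)
    loss = model(ids, labels=ids).loss  # labels=input_ids -> LM loss
    return math.exp(loss.item())

def reverse_perplexity_curriculum(trajectories: list[str]) -> list[str]:
    """Order SFT trajectories by backbone perplexity, highest first.

    Reading "reverse" as hardest-first is an assumption; pass the data
    through sorted(..., reverse=False) for an easy-to-hard schedule.
    """
    return sorted(trajectories, key=perplexity, reverse=True)

def verifiable_reward(final_answer: str, reference: str) -> float:
    """Binary reward of the kind a verifiable-rewards RL stage consumes:
    1.0 iff the extracted final answer matches the reference exactly
    (a deliberately crude check)."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0
```

An exact-match reward like this only applies where answers are mechanically checkable, which is presumably why the recipe then moves to the more delicate proof-level RL stage before adding test-time scaling.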


Get this paper in your agent:

hf papers read 2605.13301

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

Simplified-Reasoning/SU-01 • Reinforcement Learning • 31B • Updated 1 day ago • 9


Similar Articles

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

arXiv cs.CL

This paper introduces OmniThoughtVis, a scalable pipeline for distilling multimodal reasoning capabilities from large teacher models to smaller, deployment-oriented MLLMs. The method uses curated chain-of-thought data to significantly improve reasoning performance on benchmarks like MathVerse and MMMU-Pro for models ranging from 2B to 8B parameters.
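Purely as a hedged sketch of what such a distillation curation step can look like (every hook below is a hypothetical placeholder, not OmniThoughtVis's published pipeline): sample chain-of-thought traces from the teacher, keep those whose final answer checks out against a reference, and emit them as SFT targets for the small student.

```python
# Hedged sketch of chain-of-thought distillation data curation; the hooks
# teacher_generate and extract_answer are hypothetical stand-ins for a real
# teacher model and answer parser.
def distill_cot(problems, teacher_generate, extract_answer, n_samples=4):
    """problems: iterable of {"prompt": str, "answer": str} records."""
    sft_pairs = []
    for prob in problems:
        for _ in range(n_samples):
            trace = teacher_generate(prob["prompt"])  # sample a rationale
            if extract_answer(trace) == prob["answer"]:  # answer-checked filter
                sft_pairs.append({"prompt": prob["prompt"], "target": trace})
                break  # keep at most one verified trace per problem
    return sft_pairs
```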

TEMPO: Scaling Test-time Training for Large Reasoning Models

Hugging Face Daily Papers

TEMPO introduces a test-time training framework that alternates policy refinement with critic recalibration to prevent diversity collapse and sustain performance gains in large reasoning models, boosting AIME 2024 scores for Qwen3-14B from 42.3% to 65.8%.
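Again only as a hedged illustration of the alternation this summary describes, with every method below a hypothetical placeholder rather than TEMPO's actual algorithm: sample candidates from the policy, score them with the critic, refine the policy toward the best candidate, and re-fit the critic on the fresh samples so its scores do not collapse onto a single mode.

```python
# Hedged sketch of alternating test-time policy refinement and critic
# recalibration; policy and critic are hypothetical objects, not TEMPO's API.
def test_time_train(policy, critic, problem, rounds=4, k=8):
    best = None
    for _ in range(rounds):
        candidates = [policy.sample(problem) for _ in range(k)]  # explore
        scored = [(critic.score(problem, c), c) for c in candidates]
        best = max(scored, key=lambda sc: sc[0])[1]
        policy.update_toward(problem, best)       # policy refinement step
        critic.recalibrate(problem, candidates)   # re-fit critic on fresh
    return best                                   # samples against collapse
```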