@stingning: We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluatio…
Summary
Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.
Cached at: 05/15/26, 05:06 PM
We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluations: IPhO directly, and IMO/USAMO with test-time self-verification and refinement.
A simple, unified scaling recipe for proof search.
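The "test-time self-verification and refinement" mentioned above generally means looping generate → check → revise until a verifier accepts the solution or a budget runs out. A minimal sketch of that loop under assumed interfaces (`generate`, `verify`, and `refine` are hypothetical stand-ins, not the paper's actual API):

```python
def solve_with_self_verification(problem, generate, verify, refine, max_rounds=4):
    """Generate a candidate solution, then repeatedly self-check and
    refine it until the verifier accepts or the round budget is spent."""
    candidate = generate(problem)
    for _ in range(max_rounds):
        ok, feedback = verify(problem, candidate)
        if ok:
            return candidate
        candidate = refine(problem, candidate, feedback)
    return candidate  # best effort after exhausting the budget

# Toy stand-ins: "solving" means building the string the verifier wants.
gen = lambda p: "draft"
ver = lambda p, c: (c == "draft!!", "add '!'")
ref = lambda p, c, fb: c + fb[-2]
print(solve_with_self_verification("p", gen, ver, ref))  # → draft!!
```

The key design point is that the verifier's feedback flows back into the next refinement round, rather than simply resampling from scratch.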
https://t.co/yc2ZlLVbD2
Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Source: https://huggingface.co/papers/2605.13301 Published on May 13
#1 Paper of the day
Abstract
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematical and physics competitions.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
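The abstract's "reverse-perplexity curriculum" orders SFT data by how surprising each trajectory is to the base model. The page does not specify the exact scoring or schedule, so this is only one plausible reading, sketched with hypothetical names (`Trajectory`, `reverse_perplexity_curriculum`): train first on the trajectories the backbone finds least familiar.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    text: str
    perplexity: float  # perplexity of this trajectory under the base model

def reverse_perplexity_curriculum(trajectories):
    """Order SFT examples from highest to lowest base-model perplexity,
    i.e. the reverse of an easy-first perplexity curriculum (an assumed
    interpretation; the paper's actual schedule is not given here)."""
    return sorted(trajectories, key=lambda t: t.perplexity, reverse=True)

data = [
    Trajectory("easy algebra derivation", 4.2),
    Trajectory("hard olympiad geometry proof", 19.7),
    Trajectory("medium combinatorics argument", 9.1),
]

for t in reverse_perplexity_curriculum(data):
    print(f"{t.perplexity:5.1f}  {t.text}")
```

In practice the perplexity scores would come from a forward pass of the base model over each trajectory; the sorting step itself is the curriculum.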
Get this paper in your agent:
hf papers read 2605.13301
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
#### Simplified-Reasoning/SU-01
Reinforcement Learning • 31B • Updated 1 day ago • 21 • 2
Datasets citing this paper: 0
No dataset links to this paper yet.
Cite arxiv.org/abs/2605.13301 in a dataset README.md to link it from this page.
Spaces citing this paper: 0
No Space links to this paper yet.
Cite arxiv.org/abs/2605.13301 in a Space README.md to link it from this page.
Collections including this paper: 2
Similar Articles
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
A paper presenting SU-01, a 30B-A3B reasoning model that achieves gold-medal-level performance on IMO and IPhO problems via reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling.
@ClementDelangue: Paper of the day! https://huggingface.co/papers/2605.13301…
A paper introduces a unified recipe (SU-01) that combines reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling to achieve gold-medal-level performance on IMO and IPhO problems using a 30B-A3B backbone.
Introducing OpenAI o1
OpenAI released o1, a new series of reasoning-focused AI models that outperform previous models on complex tasks in science, coding, and mathematics. The preview model solved 83% of problems on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 13%, and reached the 89th percentile in competitive coding.
OpenAI o3-mini
OpenAI releases o3-mini, a cost-efficient reasoning model with strong STEM capabilities, available in ChatGPT and API with support for function calling, structured outputs, and three reasoning effort levels. The model matches o1 performance in math and coding while being faster and cheaper, with free plan users gaining access to a reasoning model for the first time.