@stingning on X


Summary

Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.

We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluations: IPhO directly, and IMO/USAMO with test-time self-verification and refinement. A simple, unified scaling recipe for proof search. https://t.co/yc2ZlLVbD2
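The "test-time self-verification and refinement" mentioned in the announcement can be pictured as a generate–verify–refine loop. A minimal sketch, where `generate`, `verify`, and `refine` are hypothetical stand-ins for model calls (the paper's actual procedure is not specified here):

```python
def solve_with_refinement(problem, generate, verify, refine, max_rounds=4):
    # Generate a candidate solution, self-verify it, and refine on failure.
    candidate = generate(problem)
    for _ in range(max_rounds):
        ok, critique = verify(problem, candidate)
        if ok:
            return candidate
        candidate = refine(problem, candidate, critique)
    return candidate  # best effort once the round budget is spent

# Toy usage with deterministic stand-ins for the model calls:
gen = lambda p: "draft"
ver = lambda p, c: (c == "draft v2", "needs more rigor")
ref = lambda p, c, crit: c + " v2"
print(solve_with_refinement("IMO P1", gen, ver, ref))  # draft v2
```

The loop terminates either when the verifier accepts a candidate or when the refinement budget is exhausted, which is the usual trade-off in this style of test-time scaling.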

Cached at: 05/15/26, 05:06 PM



Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Source: https://huggingface.co/papers/2605.13301 (published on May 13)

#1 Paper of the day

Abstract

A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematical and physics competitions.

Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematics and physics olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
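The abstract's "reverse-perplexity curriculum" is not defined on this page. A minimal sketch of one plausible reading, assuming it means ordering SFT trajectories from highest to lowest perplexity under the backbone; the `perplexity` helper and the example records are illustrative, not the paper's implementation:

```python
import math

def perplexity(log_probs):
    # Perplexity = exp(-mean per-token log-probability).
    return math.exp(-sum(log_probs) / len(log_probs))

def reverse_perplexity_order(examples):
    # Sort trajectories from highest to lowest perplexity under the
    # current backbone, so training starts on the least familiar data.
    return sorted(examples, key=lambda ex: perplexity(ex["log_probs"]), reverse=True)

batch = [
    {"id": "a", "log_probs": [-0.1, -0.2]},   # low perplexity (familiar)
    {"id": "b", "log_probs": [-2.0, -1.5]},   # high perplexity (unfamiliar)
]
print([ex["id"] for ex in reverse_perplexity_order(batch)])  # ['b', 'a']
```

Whether the curriculum runs hardest-first (as sketched) or easiest-first is an assumption; only the sorting key by model perplexity is suggested by the name.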


Get this paper in your agent: `hf papers read 2605.13301`

Don’t have the latest CLI? `curl -LsSf https://hf.co/cli/install.sh | bash`

Models citing this paper: 1

#### Simplified-Reasoning/SU-01

Reinforcement Learning • 31B • Updated 1 day ago


Similar Articles

Introducing OpenAI o1

OpenAI Blog

OpenAI released o1, a new series of reasoning-focused AI models that outperform previous models on complex tasks in science, coding, and mathematics. The preview model solved 83% of problems on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 13%, and reached the 89th percentile in competitive coding.

OpenAI o3-mini

OpenAI Blog

OpenAI releases o3-mini, a cost-efficient reasoning model with strong STEM capabilities, available in ChatGPT and API with support for function calling, structured outputs, and three reasoning effort levels. The model matches o1 performance in math and coding while being faster and cheaper, with free plan users gaining access to a reasoning model for the first time.