@ClementDelangue: Paper of the day! https://huggingface.co/papers/2605.13301…
Summary
A paper introduces a unified recipe (SU-01) that combines reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling to achieve gold-medal-level performance on IMO and IPhO problems using a 30B-A3B backbone.
Cached at: 05/15/26, 09:08 PM
Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Source: https://huggingface.co/papers/2605.13301 Published on May 13
#1 Paper of the day
Abstract
A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematical and physics competitions.
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
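The abstract does not spell out how the reverse-perplexity curriculum orders data. A minimal sketch of one plausible reading, assuming the curriculum ranks SFT trajectories by their perplexity under the backbone and presents the most surprising ones first; `score_perplexity` and the toy trajectories below are stand-ins, not the paper's actual pipeline:

```python
import math

def score_perplexity(token_logprobs):
    """Perplexity = exp(-mean token log-probability) for one trajectory."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def reverse_perplexity_order(examples):
    """Sort (text, token_logprobs) pairs by descending perplexity,
    i.e. highest-perplexity (hardest) trajectories first."""
    return sorted(examples, key=lambda ex: score_perplexity(ex[1]), reverse=True)

# Toy trajectories with hypothetical per-token log-probs from the backbone.
examples = [
    ("easy proof step", [-0.1, -0.2, -0.1]),
    ("hard lemma",      [-2.0, -1.5, -1.8]),
    ("medium case",     [-0.8, -0.6, -0.7]),
]
for text, lps in reverse_perplexity_order(examples):
    print(text, round(score_perplexity(lps), 2))
```

If "reverse" instead means easiest-first, the same sketch applies with `reverse=False`; either way the ordering key is the backbone's perplexity on each trajectory.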
Get this paper in your agent:
hf papers read 2605.13301
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
Simplified-Reasoning/SU-01 (Reinforcement Learning) • 31B • Updated 2 days ago • 21 • 6
Datasets citing this paper: 0
No dataset linking this paper
Cite arxiv.org/abs/2605.13301 in a dataset README.md to link it from this page.
Spaces citing this paper: 0
No Space linking this paper
Cite arxiv.org/abs/2605.13301 in a Space README.md to link it from this page.
Collections including this paper: 4
Similar Articles
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
A paper presenting SU-01, a 30B-A3B reasoning model that achieves gold-medal-level performance on IMO and IPhO problems via reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling.
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
This paper presents a simple and unified recipe combining supervised fine-tuning, two-stage reinforcement learning, and test-time scaling to train a reasoning model (SU-01) that achieves gold-medal-level performance on International Mathematical and Physics Olympiad problems.
@stingning: We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluatio…
Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.
@optimalab1: Huge kudos to Barbara Su (Rice CS -> MSc Stanford): she led every part of this end-to-end: algorithm, GLUE/SQuAD pipelin…
Introduces AdaPaD, a parallel rank-1 deflation method for LoRA fine-tuning, enabling low-rank linear regression components to be computed concurrently instead of sequentially, improving efficiency.
Solving (some) formal math olympiad problems
OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.