@ClementDelangue: Paper of the day! https://huggingface.co/papers/2605.13301…

X AI KOLs Following Papers

Summary

A paper introduces a unified recipe (SU-01) that combines reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling to achieve gold-medal-level performance on IMO and IPhO problems using a 30B-A3B backbone.


Cached at: 05/15/26, 09:08 PM

Paper page - Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Source: https://huggingface.co/papers/2605.13301 Published on May 13

#1 Paper of the day

Abstract

A systematic approach transforms post-trained reasoning models into rigorous olympiad-level solvers through reverse-perplexity curriculum, two-stage reinforcement learning, and test-time scaling, achieving gold-medal performance on mathematical and physics competitions.

Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physics olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
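The abstract does not spell out how the reverse-perplexity curriculum orders the SFT data. One plausible reading is that each trajectory is scored by its perplexity under the backbone and training proceeds from highest to lowest (least-familiar examples first). The sketch below illustrates only that ordering step; `fake_nll` is a hypothetical stand-in for the backbone's per-token negative log-likelihood, not anything from the paper.

```python
import math

def fake_nll(trajectory):
    # Hypothetical stand-in for the backbone's per-token NLL scores.
    # Here, longer trajectories get uniformly higher fake NLL so the
    # example is deterministic; a real pipeline would run the model.
    toks = trajectory.split()
    return [0.1 * len(toks)] * len(toks)

def perplexity(nlls):
    # Perplexity = exp(mean per-token negative log-likelihood).
    return math.exp(sum(nlls) / len(nlls))

def reverse_perplexity_order(trajectories, nll_fn):
    # Order SFT examples from highest to lowest perplexity under the
    # backbone, so training starts on the least-familiar trajectories.
    return sorted(trajectories, key=lambda t: perplexity(nll_fn(t)), reverse=True)
```

Whether the actual recipe sorts descending, ascending, or in difficulty bands is not stated in the abstract; this is just one interpretation of the name.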


Get this paper in your agent:

hf papers read 2605.13301

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper1

Simplified-Reasoning/SU-01 • Reinforcement Learning • 31B • Updated 2 days ago

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2605.13301 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2605.13301 in a Space README.md to link it from this page.

Collections including this paper4

Similar Articles

Solving (some) formal math olympiad problems

OpenAI Blog

OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.
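The iterate-and-retrain loop described above can be sketched abstractly: attempt every unsolved statement with the current prover, then "retrain" on the newly found proofs so that harder statements come into reach on later iterations. Everything below is a toy illustration, not OpenAI's method: `try_prove` replaces neural proof search with a deterministic skill-vs-difficulty check, and the skill update stands in for retraining.

```python
def try_prove(skill, difficulty):
    # Toy stand-in for neural proof search: succeed iff the prover's
    # current skill level covers the statement's difficulty.
    return skill >= difficulty

def statement_curriculum(difficulties, iterations=8, base_skill=0.2):
    # Iterate proof search and retraining: each round, attempt every
    # still-unsolved statement, then grow skill with the fraction of
    # statements solved, mimicking retraining on newly found proofs.
    solved = set()
    skill = base_skill
    for _ in range(iterations):
        for i, d in enumerate(difficulties):
            if i not in solved and try_prove(skill, d):
                solved.add(i)
        skill = base_skill + (1 - base_skill) * len(solved) / len(difficulties)
    return solved
```

The point of the sketch is the feedback loop: statements just beyond the current frontier become provable only after retraining on easier proofs, which is why the blog reports gains accumulating over 8 iterations.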