@0xLogicrw: MiniMax Developer Relations Lead Ryan Lee announced that MaxProof, a test-time scaling framework for large language model mathematical proofs, has been officially open-sourced, along with a companion technical paper. MaxProof restructures mathematical proof during inference into an evolutionary search system, enabling inference scaling through verification, repair, and elimination mechanisms.
Summary
MiniMax open-sourced MaxProof, a test-time scaling framework for LLM mathematical proofs, and released a companion paper. The framework uses an evolutionary search mechanism to enable the M3 model to achieve gold-medal scores on both the IMO 2025 and USAMO 2026 test sets.
Cached at: 06/12/26, 12:58 PM
Ryan Lee, Head of Developer Relations at MiniMax, announced that MaxProof, a test-time scaling framework for large language model mathematical proofs, has been officially open-sourced, along with an accompanying technical paper.
MaxProof reframes the mathematical proof process during inference as an evolutionary search system, achieving inference-time scaling through verification, repair, and elimination mechanisms.
With the MaxProof framework, the MiniMax-M3 model scored 35 and 36 out of 42 points on the International Mathematical Olympiad (IMO 2025) and United States of America Mathematical Olympiad (USAMO 2026) test sets, respectively, reaching the gold-medal threshold on both.
On the algorithmic side, the team built a layered verification defense by combining three expert capabilities: generation, verification, and repair. The generation expert is trained with long-horizon reinforcement learning, using the generative verifier's judgment as its primary reward signal. The verification expert is trained for explicit error detection, lowering the false-positive rate (flawed proofs accepted as correct). The repair expert corrects proofs that have been flagged as erroneous, through targeted fine-tuning conditioned on critiques of the flagged errors. These three capabilities are ultimately merged into the released M3 model.
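The layered generate/verify/repair design can be illustrated with a small sketch. This is a hypothetical Python example, not MaxProof's released code: the `Verdict` type, the binary `reward_from_verdict` rule, the `max_repairs` budget, the choice to pass the verifier's flagged issues into the repair step, and the toy stand-ins for the three roles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Hypothetical structured output of a generative verifier for one proof."""
    is_valid: bool
    issues: List[str] = field(default_factory=list)


def reward_from_verdict(verdict: Verdict) -> float:
    """Hypothetical binary RL reward: credit only proofs accepted with no flagged issues."""
    return 1.0 if verdict.is_valid and not verdict.issues else 0.0


def generate_verify_repair(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    repair: Callable[[str, str, List[str]], str],
    max_repairs: int = 2,
) -> Optional[str]:
    """Generate a proof, then alternate verification and repair.

    Returns the first proof the verifier accepts, or None if the repair budget runs out.
    """
    proof = generate(problem)
    for attempt in range(max_repairs + 1):
        verdict = verify(problem, proof)
        if reward_from_verdict(verdict) == 1.0:
            return proof
        if attempt < max_repairs:
            # Assumption: the repair step sees the flagged issues, not just a pass/fail bit.
            proof = repair(problem, proof, verdict.issues)
    return None


# Toy stand-ins for the three roles (hypothetical, for illustration only).
def toy_generate(problem: str) -> str:
    return "draft proof"


def toy_verify(problem: str, proof: str) -> Verdict:
    return Verdict(True) if "fixed" in proof else Verdict(False, ["step 2 unjustified"])


def toy_repair(problem: str, proof: str, issues: List[str]) -> str:
    return f"{proof} (fixed: {issues[0]})"


print(generate_verify_repair("P1", toy_generate, toy_verify, toy_repair))
# draft proof (fixed: step 2 unjustified)
```

Rejecting any proof with a flagged issue, even one the verifier otherwise marks valid, is one simple way to push against false positives; the reward shaping actually used in the paper may differ.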
During inference, MaxProof recasts proof derivation as an evolutionary search. The M3 model takes on four roles: generator, verifier, optimizer, and scorer. The system first builds a pool of candidate proofs as the population, mutates candidates through local repair patches and re-exploration rewrites, and finally selects the best derivation through a tournament. In this way, the evolutionary search converts the model's best@K capability on mathematical proofs (the best of K sampled proofs is correct) into more stable pass@1 performance (the single submitted proof is correct).
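The inference loop can be sketched as a small (μ+λ)-style evolutionary search. This is a hypothetical Python illustration, not the released implementation. The four roles are passed in as callables; the verifier is assumed to return a score in [0, 1]; the `patch_threshold` rule for choosing between a local patch and a re-exploration rewrite is invented for the example; elimination keeps the top `pop_size` proofs from parents plus children; and the final tournament is assumed to be a single-elimination bracket judged by pairwise scorer comparisons.

```python
import itertools
from typing import Callable


def evolve_proofs(
    problem: str,
    generate: Callable[[str], str],           # generator role
    verify: Callable[[str, str], float],      # verifier role: assumed score in [0, 1]
    patch: Callable[[str, str], str],         # optimizer role, mode 1: local repair patch
    rewrite: Callable[[str, str], str],       # optimizer role, mode 2: re-exploration rewrite
    prefer: Callable[[str, str, str], bool],  # scorer role: True if the first proof wins
    pop_size: int = 8,
    generations: int = 3,
    patch_threshold: float = 0.5,             # hypothetical switch between mutation modes
) -> str:
    # Seed the population with independent samples from the generator.
    population = [generate(problem) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutation: patch promising proofs locally, rewrite weak ones.
        children = [
            patch(problem, p) if verify(problem, p) >= patch_threshold else rewrite(problem, p)
            for p in population
        ]
        # Elimination: keep the pop_size highest-scoring proofs from parents plus children.
        pool = population + children
        pool.sort(key=lambda p: verify(problem, p), reverse=True)
        population = pool[:pop_size]
    # Tournament: single-elimination bracket decided by pairwise scorer comparisons.
    contenders = population
    while len(contenders) > 1:
        winners = [
            a if prefer(problem, a, b) else b
            for a, b in zip(contenders[0::2], contenders[1::2])
        ]
        if len(contenders) % 2 == 1:
            winners.append(contenders[-1])  # odd contender out gets a bye
        contenders = winners
    return contenders[0]


# Toy stand-ins (hypothetical): a proof "qN" has quality N.
_draws = itertools.count()


def quality(proof: str) -> int:
    return int(proof[1:])


def toy_generate(problem: str) -> str:
    return f"q{next(_draws) % 4}"


def toy_verify(problem: str, proof: str) -> float:
    return min(quality(proof), 10) / 10


def toy_patch(problem: str, proof: str) -> str:
    return f"q{quality(proof) + 1}"


def toy_rewrite(problem: str, proof: str) -> str:
    return f"q{quality(proof) + 3}"


def toy_prefer(problem: str, a: str, b: str) -> bool:
    return quality(a) >= quality(b)


best = evolve_proofs("P1", toy_generate, toy_verify, toy_patch, toy_rewrite, toy_prefer,
                     pop_size=4, generations=2, patch_threshold=0.2)
print(best)  # q5
```

Ranking the pool by verifier score before the tournament is what lets a single final answer benefit from many samples; the threshold rule and the elimination policy here are simple stand-ins, and the paper's actual mutation and selection policies may differ.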
RyanLee (@RyanLeeMiniMax): With the MaxProof framework, M3 exceeded the human gold-medal threshold on both sets. In this paper, we go deeper into the technical path behind our progress in mathematical proof: improving the base model, aligning a verifier, building refinement capability, and designing the
Similar Articles
MaxProof
MaxProof introduces a test-time scaling framework that combines proof generation, verification, and repair using generative-verifier RL, enabling the M3 model to exceed human gold-medal thresholds on IMO 2025 and USAMO 2026.
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
MaxProof is a test-time scaling framework that enhances mathematical proof generation using a generative verifier and population-level search, achieving scores exceeding human gold-medal thresholds on IMO 2025 and USAMO 2026.
@FinanceYF5: New Google paper: letting an LLM solve math competition problems, accuracy jumps from 10% to 70%. [LEAP framework] Instead of having the model write a complete proof at once, it breaks the problem down into a goal tree, learns step by step from Lean verifier feedback, and reuses proven lemmas. Result: all 12 problems of Putnam 2025 solved, IMO style…
A new Google paper proposes the LEAP framework, which decomposes math problems into goal trees, learns from Lean verifier feedback, and improves LLM accuracy on math competition problems from 10% to 70%. It solves all 12 problems of Putnam 2025 and surpasses dedicated gold-medal-level systems on IMO-style benchmarks.
MiniMaxAI/MiniMax-M2.7
MiniMaxAI releases MiniMax-M2.7, an open-weight model featuring self-evolution capabilities, advanced agent team support, and strong performance on software engineering benchmarks (56.22% on SWE-Pro, 66.6% medal rate on MLE Bench Lite), with notable applications in production incident recovery and professional work tasks.
@stingning: We’re releasing a 30B-A3B reasoning model that reaches gold-medal level across both physics and math Olympiad evaluatio…
Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.