Solving (some) formal math olympiad problems

OpenAI Blog Papers

Summary

OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.
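
A minimal sketch of that search-and-retrain loop, assuming hypothetical `search`, `verify`, and `finetune` callables (proof search with the current prover, proof checking by the formal system, and fine-tuning on verified proofs), could look like this; it is not OpenAI's implementation:

```python
# Hypothetical sketch of statement curriculum learning as an iterative
# search-and-retrain loop; search/verify/finetune are placeholder callables.

def statement_curriculum_learning(statements, prover, search, verify, finetune,
                                  iterations=8):
    """Alternate proof search and retraining on newly discovered proofs."""
    solved = {}  # statement -> verified proof found so far
    for _ in range(iterations):
        # Proof search: attempt every statement the prover has not yet solved.
        for stmt in statements:
            if stmt in solved:
                continue
            candidate = search(prover, stmt)
            if candidate is not None and verify(stmt, candidate):
                solved[stmt] = candidate
        # Retraining: fine-tune on all verified proofs discovered so far, so
        # harder statements in the curriculum become reachable next iteration.
        prover = finetune(prover, list(solved.items()))
    return prover
```

The point of the sketch is the bootstrapping: each round retrains on every proof found so far, so the easier statements in the curriculum help the prover reach the harder ones in later iterations.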

We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems adapted from the IMO.

# Solving (some) formal math olympiad problems

Source: [https://openai.com/index/formal-math/](https://openai.com/index/formal-math/)

We achieved a new state-of-the-art (41.2% vs 29.3%) on the [miniF2F](https://arxiv.org/abs/2109.00110) benchmark, a challenging collection of high-school olympiad problems. Our approach, which we call *statement curriculum learning*, consists of manually collecting a set of statements of varying difficulty levels (without proof), where the hardest statements are similar to the benchmark we target. Initially our neural prover is weak and can only prove a few of them. We iteratively search for new proofs and re-train our neural network on the newly discovered proofs, and after 8 iterations, our prover ends up being vastly superior when tested on miniF2F.

Formal mathematics is an exciting domain to study because of (i) its richness, letting you prove arbitrary theorems which require reasoning, creativity and insight, and (ii) its similarity to games, where AI has been spectacularly successful, in that it has an automated way of determining whether a proof is successful (i.e., verified by the formal system).

As demonstrated in the trivial example below, proving a formal statement requires generating a sequence of proof steps, each proof step consisting of a call to a tactic.[B](https://openai.com/index/formal-math/#citation-bottom-B) These tactics take mathematical terms as arguments, and each tactic call transforms the current statement to prove into statements that are easier to prove, until nothing is left to prove.
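
The post's own trivial example is not reproduced here; as a hedged stand-in, the following self-contained Lean snippet (Lean 4 syntax, whereas the original work targeted Lean 3) shows the same mechanic. The statement and the lemmas `Nat.add_zero` and `Nat.mul_one` are standard library facts chosen for illustration, not taken from the post.

```lean
-- Illustrative only: a trivial statement proved by a short sequence of tactics.
-- Each tactic transforms the current goal; the proof is done when no goals remain.
example (a b : Nat) (h : a = b) : a * 1 + 0 = b := by
  rw [Nat.add_zero]   -- goal becomes: a * 1 = b
  rw [Nat.mul_one]    -- goal becomes: a = b
  exact h             -- closes the goal using hypothesis h
```

Each `rw` step rewrites the current goal into a simpler statement, and `exact` discharges it, mirroring the description above of tactic calls reducing a statement until nothing is left to prove.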

Similar Articles

Solving math word problems

OpenAI Blog

OpenAI trained a system that uses verifiers to solve grade school math word problems at roughly 90% of child-level accuracy, nearly doubling fine-tuned GPT-3 performance. The approach addresses language models' weakness in multistep reasoning by training verifiers to evaluate candidate solutions and select the best one.

Our First Proof submissions

OpenAI Blog

OpenAI submitted proof attempts for the First Proof challenge, a research-level math competition testing whether AI can produce correct, checkable proofs. The company's internal model successfully solved at least five of the ten problems, demonstrating significant progress in sustained reasoning and rigorous mathematical thinking.

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

arXiv cs.CL

This paper introduces Discover and Prove (DAP), an open-source agentic framework for automated theorem proving in Lean 4 that tackles 'Hard Mode' problems where the answer must be discovered independently before formal proof construction. The work releases new Hard Mode benchmark variants and achieves state-of-the-art results while revealing a significant gap between LLM answer accuracy (>80%) and formal prover success (<10%).

Introducing OpenAI o1

OpenAI Blog

OpenAI released o1, a new series of reasoning-focused AI models that outperform previous models on complex tasks in science, coding, and mathematics. The preview model solved 83% of problems on an International Mathematics Olympiad qualifying exam, compared to GPT-4o's 13%, and reached the 89th percentile in competitive coding.