@rohanpaul_ai: Another great paper from Google. Shows general LLMs can solve formal math by planning proofs and checking each step. Ra…
Summary
A new Google paper introduces LEAP, an agentic framework that enables general LLMs to solve formal math problems by planning proofs and checking each step, raising performance from under 10% to 70% on the Lean IMO benchmark and solving all 2025 Putnam problems.
Cached at: 06/05/26, 07:11 AM
Another great paper from Google.
It shows that general LLMs can solve formal math by planning proofs and checking each step, raising general LLM performance on the Lean IMO benchmark from under 10% to 70%.
A general LLM failed badly when asked to write full formal proofs in one try, but became much stronger when it planned, split the work into smaller claims, reused past claims, and learned from Lean's feedback.
The paper shows the weakness was not just the model's math ability but the way it was being used: there was no structured interaction with a verifier.
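To make "structured interaction with a verifier" concrete, here is a minimal sketch of a propose-check-retry loop. It is not LEAP's implementation: `prove_with_feedback`, `toy_propose`, and `toy_verify` are hypothetical names, and the toy verifier only stands in for Lean (it accepts a "proof" when it is the goal string reversed).

```python
def prove_with_feedback(goal, propose, verify, max_attempts=5):
    """Propose a candidate, check it, and feed the checker's message back."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(goal, feedback)
        ok, message = verify(goal, candidate)
        if ok:
            return candidate, attempt
        feedback = message  # the next proposal can react to this error
    return None, max_attempts


# Toy stand-ins (assumptions for illustration, not a real prover or model).
def toy_verify(goal, candidate):
    if candidate == goal[::-1]:
        return True, ""
    return False, f"candidate {candidate!r} rejected"


def toy_propose(goal, feedback):
    # The first attempt is wrong; once feedback arrives, the proposal is fixed.
    return goal if feedback is None else goal[::-1]


proof, attempts = prove_with_feedback("abc", toy_propose, toy_verify)
```

In this toy run the one-shot attempt fails and the second attempt, made after seeing the checker's message, succeeds.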
The key idea is that the model does not try to write one giant perfect proof at once, because that usually fails on long and tricky problems.
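As a small illustration of splitting a statement into claims, the Lean 4 sketch below proves a tiny helper lemma once and then uses it twice in the main theorem. The statements and the names `double_eq` and `main_goal` are made up for this example and do not come from the paper.

```lean
-- A small claim, proved once.
theorem double_eq (n : Nat) : n + n = 2 * n :=
  (Nat.two_mul n).symm

-- The main goal reuses the claim for both summands instead of reproving it.
theorem main_goal (a b : Nat) : (a + a) + (b + b) = 2 * a + 2 * b := by
  rw [double_eq a, double_eq b]
```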
Instead, LEAP stores the proof as a graph of goals and subgoals, so useful lemmas can be reused instead of rediscovered every time.
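A graph of goals with reusable lemmas could be sketched roughly as follows. This is an assumption-laden toy, not the paper's data structure: the `GoalGraph` class, its fields, and the goal names are hypothetical, proofs are plain strings, and the graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class GoalGraph:
    subgoals: dict = field(default_factory=dict)  # goal -> list of subgoal names
    proved: dict = field(default_factory=dict)    # goal -> cached proof text
    leaf_attempts: int = 0                        # how often a leaf was proved from scratch

    def add_goal(self, goal, subgoals=()):
        self.subgoals[goal] = list(subgoals)

    def prove(self, goal):
        # Reuse a goal that is already proved instead of rediscovering it.
        if goal in self.proved:
            return self.proved[goal]
        deps = self.subgoals.get(goal, [])
        for dep in deps:
            self.prove(dep)
        if deps:
            proof = f"{goal} follows from {', '.join(deps)}"
        else:
            self.leaf_attempts += 1
            proof = f"{goal} proved directly"
        self.proved[goal] = proof
        return proof


graph = GoalGraph()
graph.add_goal("main", ["case_even", "case_odd"])
graph.add_goal("case_even", ["parity_lemma"])
graph.add_goal("case_odd", ["parity_lemma"])
graph.add_goal("parity_lemma")
result = graph.prove("main")
```

Both cases depend on `parity_lemma`, but it is proved only once; the second case picks it up from the cache.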
The authors tested LEAP on Putnam 2025 and a new Lean benchmark built from 60 IMO-style problems, where ordinary one-shot proof writing did very poorly.
LEAP solved all 12 Putnam 2025 problems and raised general LLM performance on the Lean IMO benchmark from under 10% to 70%.
Link: arxiv.org/abs/2606.03303
Title: “LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks”