Aleph Prover has formalized OpenAI's disproof of Paul Erdős's planar unit problem in Lean 4 and released the formalization as open source for independent validation. The release illustrates how AI can accelerate mathematical research while producing machine-checkable proof data.
This paper evaluates Claude Code in an agentic proving framework on the CLEVER benchmark for program verification. The agent achieves over 98% success in specification generation and end-to-end verification, a result suggesting that existing benchmarks may be insufficient for evaluating modern agentic provers.
OProver is a unified framework for agentic formal theorem proving in Lean 4 that iteratively improves proof generation through training with verified proofs and compiler feedback, achieving state-of-the-art results on multiple benchmarks.
TorchLean is a newly released Lean 4 framework that enables formal verification of neural network software, featuring typed tensors, verified autograd, PyTorch interoperability, and GPU execution. The release expands support to modern architectures like diffusion models, GPT-style transformers, and state-space models, bridging practical ML workflows with mathematical proof checking.
This paper introduces Discover and Prove (DAP), an open-source agentic framework for automated theorem proving in Lean 4 that tackles 'Hard Mode' problems where the answer must be discovered independently before formal proof construction. The work releases new Hard Mode benchmark variants and achieves state-of-the-art results while revealing a significant gap between LLM answer accuracy (>80%) and formal prover success (<10%).
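To make the 'Hard Mode' distinction concrete, here is a minimal Lean 4 sketch on a toy problem. The problem, the names `easy_mode`, `answer`, and `hard_mode`, and the statement layout are all hypothetical illustrations, not the format used by DAP or its benchmarks.

```lean
-- Hypothetical toy problem: "Find the natural number n with n + 3 = 5."

-- Easy mode: the answer (2) is already written into the statement,
-- so the prover only has to verify it.
theorem easy_mode : (2 : Nat) + 3 = 5 := rfl

-- Hard mode: the statement refers to an answer that is not given.
-- The prover must first discover a value to put here...
def answer : Nat := 2  -- discovered by the prover, not supplied by the problem

-- ...and only then can it attempt the formal proof.
theorem hard_mode : answer + 3 = 5 := rfl
```

In this framing, an incorrect guess for `answer` makes `hard_mode` unprovable, which is one way to see why answer discovery and formal proof are measured separately.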