Tag
Introduces Lean4Agent, a framework using Lean4 for formal modeling and verification of agent workflows and trajectories, demonstrating improved performance on SWE-Bench and ELAIP-Bench.
This paper describes a formally verified library of mathematical finance in Lean 4. It contains over 200 theorems, ranging from measure-theoretic foundations to derivative pricing, and includes a faithfulness audit that classifies each result by how its Lean statement relates to the claimed mathematics.
ATLAS is a large-scale Lean 4 library of textbook mathematics autoformalized by LLMs, covering 26 books with over 46,000 declarations. It provides reusable formal building blocks for both human- and machine-driven formalization.
ImProver 2 is a neurosymbolic framework for automated proof optimization in Lean 4. It uses an expert-iteration pipeline and a scaffold to train a 7B-parameter model that outperforms much larger models, showing that small models can effectively restructure research-level proofs.
This paper introduces the ⋆_G tensor algebra, a framework that makes equivariance an intrinsic algebraic property rather than an architectural constraint. It provides provably optimal symmetry-preserving tensor approximation, Kronecker factorization for composing multiple symmetries, and a Lean 4 formalization. Experiments on QM9 molecular geometry demonstrate data-driven discovery of physical symmetry selection rules.
Shares experience from working with AI in the loop, presented at a Yang Zhang lab group meeting. Topics include automated theorem proving, multi-machine collaboration, and distilling a private experience base, along with examples of Fields medalists using AI to solve mathematical problems.
Researchers introduce Self-Guided Self-Play (SGS), a self-play algorithm for LLMs that prevents reward hacking by using a Guide role to score synthetic problems. Applied to theorem proving in Lean4, SGS surpasses RL baselines and allows a 7B model to outperform a 671B model.
This paper introduces Formal Conjectures, an evolving benchmark of 2615 mathematical statements formalized in Lean 4, including open research conjectures for proof discovery and solved problems for auto-formalization, designed to evaluate automated reasoning systems with zero contamination.
A technical blog post introduces a Lean4-to-TileLang tensor program superoptimizer that automatically generates optimized GPU/TPU kernels and hyperparameter scaling laws, demonstrating performance gains over torch.compile.
The author developed a Lean4-to-TileLang tensor program superoptimizer that automatically generates optimized accelerator kernels and derives hyperparameter scaling laws, achieving a 1.8x speedup on A100 GPUs.
FormalSLT is a Lean 4 library that formally proves finite-sample statistical learning theory results (ERM, VC bounds, Rademacher bounds, PAC-Bayes, etc.) with explicit assumptions and no sorry placeholders, providing a machine-checked foundation for ML theory.