Tag
FormalSLT is a Lean 4 library that formally proves finite-sample statistical learning theory results (ERM, VC bounds, Rademacher bounds, PAC-Bayes, etc.) with explicit assumptions and zero sorry statements, providing a machine-checked foundation for ML theory.
This paper introduces the AI Co-Mathematician, a workbench that uses agentic AI to support mathematicians in open-ended research tasks like ideation and theorem proving. Early tests show the system achieving state-of-the-art results on hard problem-solving benchmarks, including a 48% score on FrontierMath Tier 4.
Researchers from Charles University introduce Bolzano, an open-source multi-agent LLM system that orchestrates prover and verifier agents to assist with mathematical research, reporting new results on six problems where four reached publishable quality and three were produced essentially autonomously.
Verus is a static verification tool for Rust that uses SMT solving to prove full functional correctness of low-level systems code without runtime checks.
This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.
Google DeepMind's advanced Gemini with Deep Think achieved gold-medal standard at the International Mathematical Olympiad 2025, solving 5 out of 6 problems for 35 points—a significant advance over last year's silver-medal performance, operating end-to-end in natural language within competition time limits.
OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.
OpenAI introduces GamePad, a learning environment for applying machine learning to theorem proving in the Coq proof assistant, enabling proof synthesis and training baseline models for tactic prediction and position evaluation tasks.