theorem-proving

#theorem-proving

Process-Verified Reinforcement Learning for Theorem Proving via Lean

arXiv cs.AI · 5d ago

This paper presents Process-Verified Reinforcement Learning, using the Lean proof assistant as a process oracle to provide fine-grained tactic-level feedback during training, improving theorem proving performance.


IsabeLLM: Automated Theorem Proving Applied to Formally Verifying Consensus

arXiv cs.AI · 2026-06-17

This paper presents improvements to IsabeLLM, an automated theorem proving tool built on Isabelle, by integrating a retrieval-augmented generation framework, error tracing, and counterexample generation. The improved tool is evaluated on the formal verification of Bitcoin's Proof of Work consensus protocol.


Formalizing Numerical Analysis: An Agent Pipeline and Quality Audit Beyond Kernel Acceptance

arXiv cs.AI · 2026-06-15

This paper presents an agent pipeline for formalizing a numerical analysis textbook in Lean 4 and introduces a quality audit framework that evaluates semantic correctness and library reuse beyond kernel acceptance, revealing common unfaithful formalization patterns.


Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

arXiv cs.AI · 2026-06-15

This paper presents a case study of using an LLM-based coding agent (Claude Code) to formalize Grothendieck's vanishing theorem in the Lean theorem prover. It finds that while agents can produce code that the prover accepts, they struggle with definitions and API design, which underlines the need for expert review beyond mere compilation.


MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

arXiv cs.AI · 2026-06-15

MA-ProofBench is a new formal benchmark for evaluating LLMs on theorem proving in mathematical analysis, containing 200 problems across two difficulty levels. The best model, GPT-5.5, achieves only 16% on Level I and 5% on Level II, highlighting a significant gap between informal and formal reasoning.


ATS Programming Language

Lobsters Hottest · 2026-06-09

ATS is a statically typed programming language that unifies implementation with formal specification, supporting functional, imperative, concurrent, and modular programming with dependent and linear types for high efficiency and safety.


@Gilad_Bracha: The future role of the software engineer is using AI to translate informal requirements into high level formal specs, a…

X AI KOLs Following · 2026-06-05

Gilad Bracha envisions a future where software engineers use AI to translate informal requirements into formal specs and review them, while AI implements and verifies code. The human ensures the formal spec is correct, writing only natural language.


LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

arXiv cs.AI · 2026-06-03

LEAP is an agentic framework that enables general-purpose LLMs to achieve state-of-the-art performance in formal theorem proving in Lean, solving all 12 problems from the 2025 Putnam Competition and boosting formal solve rates from below 10% to 70% on a new benchmark (Lean-IMO-Bench), surpassing specialized systems.


Distilling LLM Feedback for Lean Theorem Proving

arXiv cs.AI · 2026-06-01

This paper proposes Feedback Distillation, a training method that uses token-level supervision from an LLM to improve complex reasoning, and evaluates it on Lean 4 theorem proving. The method maintains diversity better than GRPO, and the two methods are complementary.


@rohanpaul_ai: “I do see more and more mass-produced mathematics at scale." ~ Terry Tao AI makes this scalable. Will turns proof-writi…

X AI KOLs Following · 2026-05-24

Terry Tao remarks that he sees more and more mass-produced mathematics at scale. The post adds that AI makes this scalable by turning proof-writing into a search problem: generate thousands of mini-lemmas and filter them with cheap checkers.


Announcing Isabelle support for SAW

Lobsters Hottest · 2026-05-22

Galois announces that SAW now supports generating Isabelle theories from Cryptol specifications, bridging the usability of Cryptol and SAW with the expressivity of interactive theorem provers like Isabelle, enabling semi-automated verification of cryptographic protocols.


@DayShuai: Tomorrow I'll volunteer to share my own AI loop at the Yang Zhang lab group meeting. The same OS pattern has run out 3,400+ 0-axiom Lean 4 theorems on automath and newmath in the past six months, with 5×/week automatic releases,...

X AI KOLs Timeline · 2026-05-19

The author plans to share experience with their AI loop at the Yang Zhang lab group meeting, covering automated theorem proving, multi-machine collaboration, and distillation of a private experience base, and mentions examples of Fields medalists using AI to solve mathematical problems.


Self-play helped AI achieve superhuman performance in Go, so why hasn’t it done the same for LLMs? Researchers have found a solution.

Reddit r/singularity · 2026-05-15

Researchers introduce Self-Guided Self-Play (SGS), a self-play algorithm for LLMs that prevents reward hacking by using a Guide role to score synthetic problems. Applied to theorem proving in Lean 4, SGS surpasses RL baselines and allows a 7B model to outperform a 671B model.


@logic_int: Aleph, our fully autonomous AI agent system for formal verification, aced all major theorem proving benchmarks includin…

X AI KOLs Following · 2026-05-14

Aleph, a fully autonomous AI agent system for formal verification, achieved top performance on major theorem proving benchmarks including PutnamBench, VeriSoftBench, and Verina.


Formalizing statistical learning theory in Lean 4 [R]

Reddit r/MachineLearning · 2026-05-08

FormalSLT is a Lean 4 library that formally proves finite-sample statistical learning theory results (ERM, VC bounds, Rademacher bounds, PAC-Bayes, etc.) with explicit assumptions and zero sorry statements, providing a machine-checked foundation for ML theory.


AI Co-Mathematician: Accelerating Mathematicians with Agentic AI

Hugging Face Daily Papers · 2026-05-07

This paper introduces the AI Co-Mathematician, a workbench that uses agentic AI to support mathematicians in open-ended research tasks like ideation and theorem proving. Early tests show the system achieving state-of-the-art results on hard problem-solving benchmarks, including a 48% score on FrontierMath Tier 4.


Bolzano: Case Studies in LLM-Assisted Mathematical Research

arXiv cs.CL · 2026-04-21

Researchers from Charles University introduce Bolzano, an open-source multi-agent LLM system that orchestrates prover and verifier agents to assist with mathematical research. They report new results on six problems, of which four reached publishable quality and three were produced essentially autonomously.


Verus is a tool for verifying the correctness of code written in Rust

Hacker News Top · 2026-04-20

Verus is a static verification tool for Rust that uses SMT solving to prove full functional correctness of low-level systems code without runtime checks.


Learning to Reason with Insight for Informal Theorem Proving

arXiv cs.CL · 2026-04-20

This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.


Advanced Gemini with Deep Think Achieves Gold Medal Standard at International Mathematical Olympiad

Google DeepMind Blog · 2025-10-24

Google DeepMind's advanced Gemini with Deep Think achieved gold-medal standard at the International Mathematical Olympiad 2025, solving 5 out of 6 problems for 35 points. This is a significant advance over last year's silver-medal performance, and the system operated end-to-end in natural language within competition time limits.
