mathematical-proofs

#mathematical-proofs

Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs

arXiv cs.AI · 2026-06-16

This paper introduces Mask-Proof, an LLM-based pipeline that converts mathematical proofs into masked-step tasks for automated evaluation. It also presents MaskProofBench, a benchmark of 292 curated problems that achieves 96.8% agreement with expert annotators.

Evaluating the Robustness of Proof Autoformalization in Lean 4

arXiv cs.CL · 2026-06-16

This paper evaluates the robustness of proof autoformalization models in Lean 4 under global and local perturbations, finding that current LLM-based models are sensitive to perturbations and often fail to faithfully reflect local changes.

Evaluating Research-Level Math Proofs via Strict Step-Level Verification

arXiv cs.AI · 2026-06-10

This paper introduces a strict step-level verification framework that uses LLMs to evaluate research-level mathematical proofs. The framework mitigates context poisoning and outperforms global evaluation. It shifts the focus to deductive constraints. Its analysis shows that the remaining errors often stem from pedantic hyper-rigor, which exposes implicit ambiguities in the benchmarks.

Our First Proof submissions

OpenAI Blog · 2026-02-20

OpenAI submitted proof attempts to the First Proof challenge, a research-level math competition that tests whether AI can produce correct, checkable proofs. The company reports that its internal model solved at least five of the ten problems. It presents this result as significant progress in sustained reasoning and rigorous mathematical thinking.
