Evaluating the Robustness of Proof Autoformalization in Lean 4

arXiv cs.CL 06/16/26, 04:00 AM Papers

autoformalization lean4 robustness llm mathematical-proofs formal-verification

Summary

This paper evaluates the robustness of proof autoformalization models in Lean 4 under global and local perturbations, finding that current LLM-based models are sensitive to perturbations and often fail to faithfully reflect local changes.

arXiv:2606.14867v1 Announce Type: new Abstract: Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.

Original Article

View Cached Full Text

Cached at: 06/16/26, 11:43 AM

# Evaluating the Robustness of Proof Autoformalization in Lean 4
Source: [https://arxiv.org/abs/2606.14867](https://arxiv.org/abs/2606.14867)
[View PDF](https://arxiv.org/pdf/2606.14867)

> Abstract:Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4\. Several works have developed LLM\-based models for proof autoformalization\. However, existing evaluations have typically focused on translating well\-formed informal proofs from curated datasets\. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models\. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own\. We build a benchmark with both perturbations on miniF2F and MATH\-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations\. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations\. Code and data are available via[this https URL](https://github.com/ucr-rai/robust-proof-autoformalization)\.

## Submission history

From: Zhouxing Shi \[[view email](https://arxiv.org/show-email/b3912353/2606.14867)\] **\[v1\]**Fri, 12 Jun 2026 18:10:21 UTC \(246 KB\)

Evaluating the Robustness of Proof Autoformalization in Lean 4

Similar Articles

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

Evaluating Research-Level Math Proofs via Strict Step-Level Verification

Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

Using algebra and LLMs to verify a flight-plan bug fix in Lean

OProver: A Unified Framework for Agentic Formal Theorem Proving

Submit Feedback

Similar Articles

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

Evaluating Research-Level Math Proofs via Strict Step-Level Verification

Sorries Are Not the Hard Part: An Expert-Review Case Study of a Semi-Autonomous Formalization

Using algebra and LLMs to verify a flight-plan bug fix in Lean

OProver: A Unified Framework for Agentic Formal Theorem Proving