ATLAS: Autoformalized Textbook Library At Scale
Summary
ATLAS is a large-scale Lean 4 library of textbook mathematics autoformalized by LLMs, covering 26 books with over 46,000 declarations. It provides reusable formal building blocks for human- and machine-driven formalization.
Cached at: 05/29/26, 07:21 PM
facebookresearch/atlas-lean
Source: https://github.com/facebookresearch/atlas-lean
ATLAS - Autoformalized Textbook Library At Scale
ATLAS is a Lean 4 library of textbook mathematics autoformalized by LLMs: informal statements and proofs translated into Lean code. It draws from undergraduate and graduate textbooks across analysis, algebra, geometry, topology, combinatorics, probability, statistics, PDEs, number theory, and theoretical computer science.
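The block below illustrates what "informal statements and proofs translated into Lean code" means. It is a hypothetical example written for this page, not a declaration taken from ATLAS. It states and proves in Lean 4 the informal fact that the sum of the first n odd numbers is n². The example assumes a recent Mathlib, imported wholesale via `import Mathlib`, that supports the `∑ i ∈ s, f i` big-operator notation. The theorem name `sum_first_odds` is made up.

```lean
import Mathlib

/-- Informal statement: the sum of the first `n` odd numbers equals `n ^ 2`.
    (Hypothetical illustration; not part of ATLAS.) -/
theorem sum_first_odds (n : ℕ) :
    ∑ i ∈ Finset.range n, (2 * i + 1) = n ^ 2 := by
  induction n with
  | zero => simp
  | succ k ih =>
    -- Split off the last term, then use the induction hypothesis.
    rw [Finset.sum_range_succ, ih]
    ring
```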
The goal of ATLAS is to provide reusable formal building blocks for future human- and machine-driven Lean formalization. This is an active effort: we are continuing to scale to more sources, curate the generated material, improve coverage and maintainability, and move the library closer to Mathlib conventions.
ATLAS was generated with AutoformBot, our autoformalization pipeline.
Links
- Companion paper: Formalizing Mathematics at Scale (arXiv:2605.29955)
- Formalization harness: https://github.com/facebookresearch/autoform-bot
- Earlier related work: Formalization of Algebraic Combinatorics
Library Data
Each book directory under Atlas/ contains:
- Lean source files for the generated definitions, statements, and proofs.
- targets.yaml, listing the textbook statements selected for formalization.
- report.json, containing automated evaluation results for matched Lean declarations, including faithfulness, proof-integrity, and code-quality scores.
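The following Python is a rough sketch of how the per-book `report.json` scores might be consumed. It keeps only the declarations whose scores all clear a threshold. The README does not specify the file's schema, so several details are assumptions for illustration: the record layout, the key names (`declaration`, `faithfulness`, `proof_integrity`, `code_quality`), the 0-to-1 score scale, and the declaration names.

```python
import json

# Hypothetical excerpt of a report.json file. The real schema is not described
# in the README; the keys, the 0-1 score scale, and the names are assumptions.
SAMPLE_REPORT = """
[
  {"declaration": "Atlas.Example.thm_a", "faithfulness": 0.95, "proof_integrity": 1.0, "code_quality": 0.8},
  {"declaration": "Atlas.Example.thm_b", "faithfulness": 0.60, "proof_integrity": 1.0, "code_quality": 0.9},
  {"declaration": "Atlas.Example.thm_c", "faithfulness": 0.90, "proof_integrity": 0.0, "code_quality": 0.7}
]
"""

# Score fields checked for every entry (assumed key names).
SCORE_KEYS = ("faithfulness", "proof_integrity", "code_quality")


def passing_declarations(report_text: str, threshold: float) -> list[str]:
    """Return the names of declarations whose every score is >= threshold."""
    entries = json.loads(report_text)
    return [
        entry["declaration"]
        for entry in entries
        if all(entry[key] >= threshold for key in SCORE_KEYS)
    ]


if __name__ == "__main__":
    print(passing_declarations(SAMPLE_REPORT, threshold=0.7))
```

With a threshold of 0.7, only `thm_a` survives. `thm_b` fails on faithfulness and `thm_c` fails on proof integrity.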
Visualizer
A visualizer is provided at https://rammalahmad.github.io/atlas/. It allows users to browse ATLAS, compare informal statements with their Lean formalizations, inspect logical dependency graphs between results, and extract the Lean code needed to state a selected theorem.
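Extracting "the Lean code needed to state a selected theorem" can be framed as two steps. First, take a transitive closure over the dependency graph. Second, order the result so that every declaration comes after the ones it uses, as Lean requires within a file. Below is a minimal Python sketch under that assumption. The graph, the declaration names, and the `statement_closure` helper are hypothetical, and this is not the visualizer's actual implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each declaration maps to the declarations its
# statement refers to. Names are illustrative, not taken from ATLAS.
DEPS: dict[str, set[str]] = {
    "main_theorem": {"lemma_b", "def_x"},
    "lemma_b": {"def_x", "def_y"},
    "def_x": {"def_y"},
    "def_y": set(),
    "unrelated_lemma": {"def_y"},
}


def statement_closure(target: str, deps: dict[str, set[str]]) -> list[str]:
    """Return every declaration `target` transitively depends on (including
    itself), ordered so that each one appears after its dependencies."""
    needed: set[str] = set()
    stack = [target]
    while stack:  # depth-first traversal collecting the closure
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(deps.get(name, set()))
    # Restrict the graph to the closure; TopologicalSorter treats the mapped
    # values as predecessors, so dependencies are emitted first.
    sub_graph = {name: deps.get(name, set()) & needed for name in needed}
    return list(TopologicalSorter(sub_graph).static_order())


if __name__ == "__main__":
    print(statement_closure("main_theorem", DEPS))
```

For `main_theorem`, the sketch returns `def_y`, `def_x`, `lemma_b`, `main_theorem` in that order and leaves out `unrelated_lemma`. It follows statement dependencies only. Collecting everything a proof uses would require proof-level dependency edges as well.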

Status and Contributions
ATLAS is an ongoing, machine-generated extension effort rather than a finished product. We are actively working on scaling the corpus, curating the generated code, formalizing remaining statements, and improving idiomatic Mathlib reuse. External contributions are welcome!
To build the full library with the pinned Lean and Mathlib versions, run:
```shell
lake build
```
Statistics (May 2026)
26 books · 630,999 lines of code · 483,917 lines of Lean code (excl. comments/blanks) · 46,203 declarations · 42,837 proved (92.7%) · 2,855 / 4,007 statements formalized (71.3%) · 183,157M tokens
| Book | Target Statements | Formalized | % Formalized | Lines of Code | Lines of Lean (excl. comments/blanks) | Declarations | Proved | % Proved | Tokens (M) |
|---|---|---|---|---|---|---|---|---|---|
| AlgebraNotes | 176 | 151 | 85.8% | 5,037 | 4,409 | 274 | 261 | 95.3% | 1,962.99 |
| AlgebraicCombinatorics | 39 | 37 | 94.9% | 10,695 | 9,343 | 737 | 734 | 99.6% | 1,440.73 |
| AlgebraicGeometryI | 186 | 112 | 60.2% | 40,678 | 27,393 | 4,499 | 4,210 | 93.6% | 7,629.26 |
| AlgebraicTopologyI | 171 | 110 | 64.3% | 29,154 | 20,142 | 2,416 | 2,063 | 85.4% | 10,323.27 |
| AnAlgorithmistsToolkit | 158 | 131 | 82.9% | 9,656 | 8,234 | 712 | 668 | 93.8% | 2,004.00 |
| ArithmeticGeometry | 335 | 266 | 79.4% | 39,257 | 29,573 | 3,047 | 2,861 | 93.9% | 11,100.62 |
| BooleanFunctions | 108 | 44 | 40.7% | 9,516 | 7,949 | 667 | 614 | 92.1% | 2,327.49 |
| Buildings | 74 | 44 | 59.5% | 64,383 | 48,809 | 4,345 | 4,247 | 97.7% | 20,442.93 |
| CombinatorialOptimization | 36 | 22 | 61.1% | 8,908 | 7,934 | 428 | 414 | 96.7% | 2,475.65 |
| ComplexVariables | 38 | 37 | 97.4% | 7,231 | 6,225 | 285 | 280 | 98.2% | 1,250.91 |
| DifferentialAnalysis | 113 | 88 | 77.9% | 31,302 | 23,713 | 1,634 | 1,506 | 92.2% | 11,743.27 |
| DifferentialGeometry | 147 | 112 | 76.2% | 10,592 | 8,942 | 888 | 781 | 88.0% | 1,933.97 |
| EllipticCurves | 360 | 212 | 58.9% | 32,819 | 22,316 | 3,483 | 2,981 | 85.6% | 11,058.00 |
| FourierAnalysis | 38 | 34 | 89.5% | 7,943 | 6,671 | 373 | 359 | 96.2% | 1,185.90 |
| GeometryOfManifolds | 72 | 40 | 55.6% | 22,686 | 16,408 | 3,251 | 3,098 | 95.3% | 6,864.93 |
| HighDimensionalStatistics | 73 | 65 | 89.0% | 39,656 | 31,715 | 1,564 | 1,518 | 97.1% | 975.36 |
| IntroductionToFunctionalAnalysis | 72 | 68 | 94.4% | 2,709 | 2,006 | 113 | 109 | 96.5% | 553.64 |
| IntroductionToPartialDifferentialEquations | 105 | 86 | 81.9% | 27,666 | 20,740 | 1,585 | 1,414 | 89.2% | 2,972.23 |
| LieGroups | 185 | 74 | 40.0% | 60,285 | 50,594 | 4,219 | 3,814 | 90.4% | 45,384.33 |
| NumberTheoryI | 576 | 460 | 79.9% | 64,958 | 54,760 | 3,764 | 3,591 | 95.4% | 15,424.36 |
| ProbabilisticMethodsInCombinatorics | 210 | 109 | 51.9% | 20,555 | 15,604 | 1,272 | 1,089 | 85.6% | 2,720.15 |
| ProjectionTheory | 111 | 73 | 65.8% | 13,357 | 9,672 | 979 | 871 | 89.0% | 2,678.00 |
| RealAnalysis | 177 | 175 | 98.9% | 2,886 | 2,224 | 149 | 147 | 98.7% | 585.64 |
| TensorCategories | 229 | 137 | 59.8% | 42,812 | 29,729 | 3,373 | 3,176 | 94.2% | 11,338.45 |
| TheoryOfComputation | 118 | 84 | 71.2% | 15,094 | 10,581 | 1,553 | 1,482 | 95.4% | 3,580.36 |
| TheoryOfProbability | 100 | 84 | 84.0% | 11,164 | 8,231 | 593 | 549 | 92.6% | 3,200.61 |
| Total | 4,007 | 2,855 | 71.3% | 630,999 | 483,917 | 46,203 | 42,837 | 92.7% | 183,157 |
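The percentage columns appear to be plain ratios, each rounded to one decimal place. % Formalized is Formalized / Target Statements, and % Proved is Proved / Declarations. The short Python check below recomputes three rows and the Total row under that reading. The row values are copied from the table. The helper names `pct` and `row_percentages` are illustrative.

```python
# Rows copied from the table: (book, target statements, formalized,
# declarations, proved).
ROWS = [
    ("AlgebraNotes", 176, 151, 274, 261),
    ("LieGroups", 185, 74, 4219, 3814),
    ("RealAnalysis", 177, 175, 149, 147),
]


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, matching the table's style."""
    return round(100 * part / whole, 1)


def row_percentages(targets: int, formalized: int, decls: int, proved: int) -> tuple[float, float]:
    """Return (% formalized, % proved) for one book."""
    return pct(formalized, targets), pct(proved, decls)


if __name__ == "__main__":
    for book, targets, formalized, decls, proved in ROWS:
        pf, pp = row_percentages(targets, formalized, decls, proved)
        print(f"{book}: {pf}% formalized, {pp}% proved")
    # Library-wide figures from the Total row: 2,855 / 4,007 and 42,837 / 46,203.
    print(pct(2855, 4007), pct(42837, 46203))
```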
Contributors
The initial ATLAS effort was led by Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat, Julia Kempe, Remi Munos, Charles Arnal, and Vivien Cabannes.
Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{rammal2026formalizingmathematicsscale,
  title={Formalizing Mathematics at Scale},
  author={Ahmad Rammal and Niket Patel and Fabian Gloeckle and Amaury Hayat and Julia Kempe and Remi Munos and Charles Arnal and Vivien Cabannes},
  year={2026},
  eprint={2605.29955},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.29955},
}
```