ATLAS: Autoformalized Textbook Library At Scale

Hacker News Top 05/28/26, 04:40 PM Tools

autoformalization lean4 mathematics open-source facebook-research llm formal-verification

Summary

ATLAS is a large-scale Lean 4 library of textbook mathematics autoformalized by LLMs, covering 26 books with over 46,000 declarations. It provides reusable formal building blocks for human and machine-driven formalization.

<a href="https://twitter.com/arnal_charles/status/2060009395107377282" rel="nofollow">https://twitter.com/arnal_charles/status/2060009395107377282</a>, <a href="https://xcancel.com/arnal_charles/status/2060009395107377282" rel="nofollow">https://xcancel.com/arnal_charles/status/2060009395107377282</a><p>Paper: <i>Formalizing Mathematics at Scale</i> - <a href="https://arxiv.org/abs/2605.29955" rel="nofollow">https://arxiv.org/abs/2605.29955</a>

Original Article

View Cached Full Text

Cached at: 05/29/26, 07:21 PM

facebookresearch/atlas-lean

Source: https://github.com/facebookresearch/atlas-lean

ATLAS - Autoformalized Textbook Library At Scale

ATLAS is a Lean 4 library of textbook mathematics autoformalized by LLMs: informal statements and proofs translated into Lean code. It draws from undergraduate and graduate textbooks across analysis, algebra, geometry, topology, combinatorics, probability, statistics, PDEs, number theory, and theoretical computer science.

The goal of ATLAS is to provide reusable formal building blocks for future human- and machine-driven Lean formalization. This is an active effort: we are continuing to scale to more sources, curate the generated material, improve coverage and maintainability, and move the library closer to Mathlib conventions.

ATLAS was generated with AutoformBot, our autoformalization pipeline.

Library Data

Each book directory under Atlas/ contains:

Lean source files for the generated definitions, statements, and proofs.
targets.yaml, listing the textbook statements selected for formalization.
report.json, containing automated evaluation results for matched Lean declarations, including faithfulness, proof-integrity, and code-quality scores.

Visualizer

A visualizer is provided at https://rammalahmad.github.io/atlas/. It allows users to browse ATLAS, compare informal statements with their Lean formalizations, inspect logical dependency graphs between results, and extract the Lean code needed to state a selected theorem.

ATLAS visualizer

Status and Contributions

ATLAS is an ongoing, machine-generated extension effort rather than a finished product. We are actively working on scaling the corpus, curating the generated code, formalizing remaining statements, and improving idiomatic Mathlib reuse. External contributions are welcome!

To build the full library with the pinned Lean and Mathlib versions, run:

lake build

Statistics (May 2026)

26 books · 630,999 lines of code · 483,917 lines of Lean code (excl. comments/blanks) · 46,203 declarations · 42,837 proved (92.7%) · 2,855 / 4,007 statements formalized (71.3%) · 183,157M tokens

Book	Target Statements	Formalized	% Formalized	Lines of Code	Lines of Lean	Declarations	Proved	% Proved	Tokens (M)
AlgebraNotes	176	151	85.8%	5,037	4,409	274	261	95.3%	1,962.99
AlgebraicCombinatorics	39	37	94.9%	10,695	9,343	737	734	99.6%	1,440.73
AlgebraicGeometryI	186	112	60.2%	40,678	27,393	4,499	4,210	93.6%	7,629.26
AlgebraicTopologyI	171	110	64.3%	29,154	20,142	2,416	2,063	85.4%	10,323.27
AnAlgorithmistsToolkit	158	131	82.9%	9,656	8,234	712	668	93.8%	2,004.00
ArithmeticGeometry	335	266	79.4%	39,257	29,573	3,047	2,861	93.9%	11,100.62
BooleanFunctions	108	44	40.7%	9,516	7,949	667	614	92.1%	2,327.49
Buildings	74	44	59.5%	64,383	48,809	4,345	4,247	97.7%	20,442.93
CombinatorialOptimization	36	22	61.1%	8,908	7,934	428	414	96.7%	2,475.65
ComplexVariables	38	37	97.4%	7,231	6,225	285	280	98.2%	1,250.91
DifferentialAnalysis	113	88	77.9%	31,302	23,713	1,634	1,506	92.2%	11,743.27
DifferentialGeometry	147	112	76.2%	10,592	8,942	888	781	88.0%	1,933.97
EllipticCurves	360	212	58.9%	32,819	22,316	3,483	2,981	85.6%	11,058.00
FourierAnalysis	38	34	89.5%	7,943	6,671	373	359	96.2%	1,185.90
GeometryOfManifolds	72	40	55.6%	22,686	16,408	3,251	3,098	95.3%	6,864.93
HighDimensionalStatistics	73	65	89.0%	39,656	31,715	1,564	1,518	97.1%	975.36
IntroductionToFunctionalAnalysis	72	68	94.4%	2,709	2,006	113	109	96.5%	553.64
IntroductionToPartialDifferentialEquations	105	86	81.9%	27,666	20,740	1,585	1,414	89.2%	2,972.23
LieGroups	185	74	40.0%	60,285	50,594	4,219	3,814	90.4%	45,384.33
NumberTheoryI	576	460	79.9%	64,958	54,760	3,764	3,591	95.4%	15,424.36
ProbabilisticMethodsInCombinatorics	210	109	51.9%	20,555	15,604	1,272	1,089	85.6%	2,720.15
ProjectionTheory	111	73	65.8%	13,357	9,672	979	871	89.0%	2,678.00
RealAnalysis	177	175	98.9%	2,886	2,224	149	147	98.7%	585.64
TensorCategories	229	137	59.8%	42,812	29,729	3,373	3,176	94.2%	11,338.45
TheoryOfComputation	118	84	71.2%	15,094	10,581	1,553	1,482	95.4%	3,580.36
TheoryOfProbability	100	84	84.0%	11,164	8,231	593	549	92.6%	3,200.61
Total	4,007	2,855	71.3%	630,999	483,917	46,203	42,837	92.7%	183,157

Contributors

The initial ATLAS effort was led by Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat, Julia Kempe, Remi Munos, Charles Arnal, and Vivien Cabannes.

Citation

If you find this work useful, please cite our paper:

@misc{rammal2026formalizingmathematicsscale,
      title={Formalizing Mathematics at Scale}, 
      author={Ahmad Rammal and Niket Patel and Fabian Gloeckle and Amaury Hayat and Julia Kempe and Remi Munos and Charles Arnal and Vivien Cabannes},
      year={2026},
      eprint={2605.29955},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.29955}, 
}

ATLAS: Autoformalized Textbook Library At Scale

facebookresearch/atlas-lean

ATLAS - Autoformalized Textbook Library At Scale

Links

Library Data

Visualizer

Status and Contributions

Statistics (May 2026)

Contributors

Citation

Similar Articles

MathAtlas: A Benchmark for Autoformalization in the Wild

A Formally Verified Library of Mathematical Finance in Lean 4

TabularMath: Understanding Math Reasoning over Tables with Large Language Models

DocAtlas: Multilingual Document Understanding Across 80+ Languages

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Submit Feedback

Similar Articles

MathAtlas: A Benchmark for Autoformalization in the Wild

A Formally Verified Library of Mathematical Finance in Lean 4

TabularMath: Understanding Math Reasoning over Tables with Large Language Models

DocAtlas: Multilingual Document Understanding Across 80+ Languages

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling