ATLAS: Autoformalized Textbook Library At Scale

Hacker News Top Tools

Summary

ATLAS is a large-scale Lean 4 library of textbook mathematics autoformalized by LLMs, covering 26 books with over 46,000 declarations. It provides reusable formal building blocks for human and machine-driven formalization.

<a href="https:&#x2F;&#x2F;twitter.com&#x2F;arnal_charles&#x2F;status&#x2F;2060009395107377282" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;arnal_charles&#x2F;status&#x2F;2060009395107377282</a>, <a href="https:&#x2F;&#x2F;xcancel.com&#x2F;arnal_charles&#x2F;status&#x2F;2060009395107377282" rel="nofollow">https:&#x2F;&#x2F;xcancel.com&#x2F;arnal_charles&#x2F;status&#x2F;2060009395107377282</a><p>Paper: <i>Formalizing Mathematics at Scale</i> - <a href="https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;2605.29955" rel="nofollow">https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;2605.29955</a>
Original Article
View Cached Full Text

Cached at: 05/29/26, 07:21 PM

facebookresearch/atlas-lean

Source: https://github.com/facebookresearch/atlas-lean

ATLAS Logo

ATLAS - Autoformalized Textbook Library At Scale

ATLAS is a Lean 4 library of textbook mathematics autoformalized by LLMs: informal statements and proofs translated into Lean code. It draws from undergraduate and graduate textbooks across analysis, algebra, geometry, topology, combinatorics, probability, statistics, PDEs, number theory, and theoretical computer science.

The goal of ATLAS is to provide reusable formal building blocks for future human- and machine-driven Lean formalization. This is an active effort: we are continuing to scale to more sources, curate the generated material, improve coverage and maintainability, and move the library closer to Mathlib conventions.

ATLAS was generated with AutoformBot, our autoformalization pipeline.

Links

Library Data

Each book directory under Atlas/ contains:

  • Lean source files for the generated definitions, statements, and proofs.
  • targets.yaml, listing the textbook statements selected for formalization.
  • report.json, containing automated evaluation results for matched Lean declarations, including faithfulness, proof-integrity, and code-quality scores.

Visualizer

A visualizer is provided at https://rammalahmad.github.io/atlas/. It allows users to browse ATLAS, compare informal statements with their Lean formalizations, inspect logical dependency graphs between results, and extract the Lean code needed to state a selected theorem.

ATLAS visualizer

Status and Contributions

ATLAS is an ongoing, machine-generated extension effort rather than a finished product. We are actively working on scaling the corpus, curating the generated code, formalizing remaining statements, and improving idiomatic Mathlib reuse. External contributions are welcome!

To build the full library with the pinned Lean and Mathlib versions, run:

lake build

Statistics (May 2026)

26 books · 630,999 lines of code · 483,917 lines of Lean code (excl. comments/blanks) · 46,203 declarations · 42,837 proved (92.7%) · 2,855 / 4,007 statements formalized (71.3%) · 183,157M tokens

BookTarget StatementsFormalized% FormalizedLines of CodeLines of LeanDeclarationsProved% ProvedTokens (M)
AlgebraNotes17615185.8%5,0374,40927426195.3%1,962.99
AlgebraicCombinatorics393794.9%10,6959,34373773499.6%1,440.73
AlgebraicGeometryI18611260.2%40,67827,3934,4994,21093.6%7,629.26
AlgebraicTopologyI17111064.3%29,15420,1422,4162,06385.4%10,323.27
AnAlgorithmistsToolkit15813182.9%9,6568,23471266893.8%2,004.00
ArithmeticGeometry33526679.4%39,25729,5733,0472,86193.9%11,100.62
BooleanFunctions1084440.7%9,5167,94966761492.1%2,327.49
Buildings744459.5%64,38348,8094,3454,24797.7%20,442.93
CombinatorialOptimization362261.1%8,9087,93442841496.7%2,475.65
ComplexVariables383797.4%7,2316,22528528098.2%1,250.91
DifferentialAnalysis1138877.9%31,30223,7131,6341,50692.2%11,743.27
DifferentialGeometry14711276.2%10,5928,94288878188.0%1,933.97
EllipticCurves36021258.9%32,81922,3163,4832,98185.6%11,058.00
FourierAnalysis383489.5%7,9436,67137335996.2%1,185.90
GeometryOfManifolds724055.6%22,68616,4083,2513,09895.3%6,864.93
HighDimensionalStatistics736589.0%39,65631,7151,5641,51897.1%975.36
IntroductionToFunctionalAnalysis726894.4%2,7092,00611310996.5%553.64
IntroductionToPartialDifferentialEquations1058681.9%27,66620,7401,5851,41489.2%2,972.23
LieGroups1857440.0%60,28550,5944,2193,81490.4%45,384.33
NumberTheoryI57646079.9%64,95854,7603,7643,59195.4%15,424.36
ProbabilisticMethodsInCombinatorics21010951.9%20,55515,6041,2721,08985.6%2,720.15
ProjectionTheory1117365.8%13,3579,67297987189.0%2,678.00
RealAnalysis17717598.9%2,8862,22414914798.7%585.64
TensorCategories22913759.8%42,81229,7293,3733,17694.2%11,338.45
TheoryOfComputation1188471.2%15,09410,5811,5531,48295.4%3,580.36
TheoryOfProbability1008484.0%11,1648,23159354992.6%3,200.61
Total4,0072,85571.3%630,999483,91746,20342,83792.7%183,157

Contributors

The initial ATLAS effort was led by Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat, Julia Kempe, Remi Munos, Charles Arnal, and Vivien Cabannes.

Citation

If you find this work useful, please cite our paper:

@misc{rammal2026formalizingmathematicsscale,
      title={Formalizing Mathematics at Scale}, 
      author={Ahmad Rammal and Niket Patel and Fabian Gloeckle and Amaury Hayat and Julia Kempe and Remi Munos and Charles Arnal and Vivien Cabannes},
      year={2026},
      eprint={2605.29955},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.29955}, 
}

Similar Articles

MathAtlas: A Benchmark for Autoformalization in the Wild

arXiv cs.AI

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics, containing ~52k theorems and definitions extracted from 103 textbooks, with a mathematical dependency graph of ~178k relations. Experiments show state-of-the-art models achieve at most 9.8% correctness, highlighting the difficulty.

A Formally Verified Library of Mathematical Finance in Lean 4

Hugging Face Daily Papers

This paper describes a formally verified library of mathematical finance in Lean 4, containing over 200 theorems covering measure-theoretic foundations through derivative pricing, and includes a faithfulness audit to classify results by how their Lean statement relates to the claimed mathematics.

TabularMath: Understanding Math Reasoning over Tables with Large Language Models

arXiv cs.CL

TabularMath introduces a benchmark and AutoT2T framework for evaluating LLMs' mathematical reasoning over tabular data, revealing that table complexity, data quality, and modality significantly impact model performance. The study addresses a gap in LLM evaluation by systematically assessing robustness to incomplete or inconsistent table information in real-world scenarios.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

Hugging Face Daily Papers

DocAtlas is a framework that creates high-fidelity OCR datasets and benchmarks across 82 languages, using differential rendering and synthetic generation. It demonstrates that Direct Preference Optimization improves multilingual model adaptation without degrading base-language performance.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Hugging Face Daily Papers

This paper introduces AutoTTS, an environment-driven framework that automates the discovery of test-time scaling strategies for LLMs by formulating it as controller synthesis. It demonstrates improved accuracy-cost tradeoffs on mathematical reasoning benchmarks with minimal computational overhead.