TheoremGraph: Bridging Formal and Informal Mathematics

Hugging Face Daily Papers 06/24/26, 12:00 AM Papers

formal-mathematics dependency-graph lean arxiv semantic-embedding retrieval math-ai

Summary

TheoremGraph is a unified statement-level dependency graph that spans both informal mathematics (arXiv papers) and formal mathematics (Lean projects), using semantic embeddings to bridge the gap between them. The authors provide datasets, extractors, and APIs to support mathematical search and retrieval.

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.
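The bridging step described above can be illustrated with a minimal sketch: score pairs of slogan embeddings by cosine similarity, keep pairs at or above the 0.8 floor, and bucket them into the two tiers the abstract mentions. The function names, the toy three-dimensional vectors, and the identifiers below are hypothetical and chosen only for illustration; the abstract does not specify the embedding model, the vector dimension, or how candidates are enumerated, and the LLM-judge step is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_matches(informal, formal, floor=0.8):
    """Return (informal_id, formal_id, score, tier) for pairs scoring >= floor.

    Brute-force all-pairs comparison: fine for a toy example, but an
    assumption here, not the paper's method. A corpus with millions of
    statements would need an approximate nearest-neighbor index.
    """
    matches = []
    for i_id, i_vec in informal.items():
        for f_id, f_vec in formal.items():
            score = cosine(i_vec, f_vec)
            if score >= floor:
                tier = ">=0.9" if score >= 0.9 else "0.8-0.9"
                matches.append((i_id, f_id, score, tier))
    # Highest-similarity pairs first.
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches


# Hypothetical slogan embeddings (real embeddings would be much larger).
informal = {"paper:thm1": [1.0, 0.0, 0.0], "paper:lem2": [0.0, 1.0, 0.0]}
formal = {"Lean.decl_a": [0.9, 0.1, 0.0], "Lean.decl_b": [0.5, 0.85, 0.0]}

matches = candidate_matches(informal, formal)
for informal_id, formal_id, score, tier in matches:
    print(f"{informal_id} <-> {formal_id}: {score:.3f} ({tier})")
```

Splitting accepted pairs by tier mirrors the reported trend that judge acceptance is higher in the >=0.9 tier than across the whole floor, so a downstream user could raise the threshold to favor precision over coverage.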

Original Article

Cached at: 06/30/26, 07:34 AM

Source: https://huggingface.co/papers/2606.25363

Abstract

A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects.

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.

Get this paper in your agent:

hf papers read 2606.25363

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

uw-math-ai/math-graph

Similar Articles

MathAtlas: A Benchmark for Autoformalization in the Wild

Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics

Beyond the Library: An Agentic Framework for Autoformalizing Research Mathematics

It's great to see how automated theorem proving is moving from a niche tool to solving real math problems

Learning to Reason with Insight for Informal Theorem Proving
