TheoremGraph: Bridging Formal and Informal Mathematics

Hugging Face Daily Papers Papers

Summary

TheoremGraph is a unified statement-level dependency graph that spans both informal mathematics (arXiv papers) and formal mathematics (Lean projects), using semantic embeddings to bridge the gap between them. The authors provide datasets, extractors, and APIs to support mathematical search and retrieval.

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.
Original Article
View Cached Full Text

Cached at: 06/30/26, 07:34 AM

Paper page - TheoremGraph: Bridging Formal and Informal Mathematics

Source: https://huggingface.co/papers/2606.25363

Abstract

A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects.

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly:informal paperscite mostly at the document level, whileformal librariesrecord fine-grained dependencies over a much smaller body of mathematics. We introduceTheoremGraph, a unifiedstatement-level dependency graphspanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we releaseLeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generatednatural-language slogansinto a sharedsemantic space, linking related statements across papers and across the informal/formal divide; anLLM judgeaffirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formalconcept retrieval, our name-and-signature representation withgraph expansioncomes within 0.5pp ofLeanSearchv2’s reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.

View arXiv pageView PDFProject pageAdd to collection

Get this paper in your agent:

hf papers read 2606\.25363

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.25363 in a model README.md to link it from this page.

Datasets citing this paper1

#### uw-math-ai/math-graph Viewer• Updated1 day ago • 16.1M • 75 • 2

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.25363 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

MathAtlas: A Benchmark for Autoformalization in the Wild

arXiv cs.AI

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics, containing ~52k theorems and definitions extracted from 103 textbooks, with a mathematical dependency graph of ~178k relations. Experiments show state-of-the-art models achieve at most 9.8% correctness, highlighting the difficulty.

Learning to Reason with Insight for Informal Theorem Proving

arXiv cs.CL

This paper proposes DeepInsightTheorem, a hierarchical dataset and Progressive Multi-Stage SFT training strategy to improve LLMs' informal theorem proving by teaching them to identify and apply core techniques through insight-aware reasoning.