proof-verification

Characterizing initial human-AI proof formalization workflows

arXiv cs.AI · 4d ago

Researchers from Oxford, Cambridge, MIT, CMU and other institutions conduct a mixed-methods study examining how people integrate AI tools into mathematical proof formalization workflows, finding that participants generally achieve higher formalization accuracy with AI assistance while preferring to maintain high-level human control over the proof discovery process.

RMA: an Agentic System for Research-Level Mathematical Problems

arXiv cs.AI · 2026-05-25

Research Math Agents (RMA) is an agentic framework for automated reasoning on research-level mathematical problems. It achieves state-of-the-art results on the First Proof benchmark, solving 8 out of 10 problems and outperforming strong baselines such as GPT-5.2R and Aletheia.

Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism

Hugging Face Daily Papers · 2026-04-07

ProofGrid is a benchmark suite that evaluates LLM reasoning through machine-checkable proofs using minimal formal notation, with tasks in proof writing, checking, and gap-filling, revealing progress and remaining limits including epistemic instability.
