MIT & the IMO released MathNet, the world’s largest dataset of International Math Olympiad problems & solutions. MathNet is 5x larger than previous datasets & is sourced from over 40 countries across 4 decades

Reddit r/LocalLLaMA Papers

Summary

MIT and the IMO release MathNet, a massive dataset of International Math Olympiad problems and solutions spanning 40 years and 40+ countries, 5x larger than prior datasets.

Hugging Face: [https://huggingface.co/datasets/ShadenA/MathNet](https://huggingface.co/datasets/ShadenA/MathNet)

Paper: [https://mathnet.csail.mit.edu/paper.pdf](https://mathnet.csail.mit.edu/paper.pdf)

Project page: [https://mathnet.csail.mit.edu/](https://mathnet.csail.mit.edu/)

From MIT CSAIL on 𝕏: [https://x.com/MIT\_CSAIL/status/2046620592980262964](https://x.com/MIT_CSAIL/status/2046620592980262964)

Similar Articles

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Hugging Face Daily Papers

MathNet is a large-scale multilingual multimodal benchmark of 30,676 Olympiad-level math problems spanning 47 countries and 17 languages, designed to evaluate mathematical reasoning and retrieval in generative and embedding-based models. Even state-of-the-art models like Gemini and GPT-5 struggle with the benchmark, highlighting significant room for improvement in mathematical AI.

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

arXiv cs.AI

Introduces CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES–AoPS CrowdMath program, capturing collaborative mathematical problem-solving. Benchmarks six frontier models, finding they achieve 83-88% accuracy on next-post prediction but only 0.42 macro-F1 on post-role classification, highlighting a gap in understanding collaborative progress.

MathAtlas: A Benchmark for Autoformalization in the Wild

arXiv cs.AI

MathAtlas is a large-scale benchmark for autoformalization of graduate-level mathematics, containing ~52k theorems and definitions extracted from 103 textbooks, with a mathematical dependency graph of ~178k relations. Experiments show state-of-the-art models achieve at most 9.8% correctness, highlighting the difficulty.

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

arXiv cs.AI

VAMPS is a new benchmark of 1,168 multimodal bilingual math problems designed to evaluate whether LLMs can benefit from constructing and reasoning over graphs/visualizations. Key finding: direct analytical solving surprisingly outperforms tool-enabled visual solving even on problems where plotting is a natural strategy.