research-level-math

#research-level-math

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Hugging Face Daily Papers ↗ · 5d ago Cached

Soohak is a new benchmark of 439 research-level math problems curated by mathematicians to evaluate the reasoning capabilities of frontier LLMs, highlighting significant gaps in solving advanced problems and recognizing ill-posed questions.

0 favorites 0 likes

research-level-math

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Submit Feedback