research-level-math

Tag

Cards List
#research-level-math

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Hugging Face Daily Papers · 5d ago Cached

Soohak is a new benchmark of 439 research-level math problems curated by mathematicians to evaluate the reasoning capabilities of frontier LLMs, highlighting significant gaps in solving advanced problems and recognizing ill-posed questions.

0 favorites 0 likes
← Back to home

Submit Feedback