ill-posed-problems

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Hugging Face Daily Papers · 5d ago

Soohak is a new benchmark of 439 research-level math problems, curated by mathematicians, for evaluating the reasoning capabilities of frontier LLMs. It highlights significant gaps both in solving advanced problems and in recognizing ill-posed questions.
