Soohak is a new benchmark of 439 research-level math problems, curated by mathematicians, for evaluating the reasoning capabilities of frontier LLMs. Results on it reveal significant gaps in these models' ability to solve advanced problems and to recognize ill-posed questions.