KWBench is a benchmark of 223 professional tasks that evaluates whether LLMs can recognize the underlying game-theoretic structure of a situation without being prompted to look for one; even the best model succeeds on only 27.9% of tasks. The benchmark targets unprompted problem recognition, a step prior to task execution, across domains such as acquisitions, clinical pharmacy, and fraud analysis.