Tag
Affordance20Q is a benchmark that uses a 20-Questions format to evaluate how well LLMs reason about object affordances from physical properties, without revealing object identity. Experiments show a gap of roughly 20 points between LLMs and humans, and the proposed pipeline, KARI, improves open-source LLMs by up to 15.2 points.