Tag
Affordance20Q is a benchmark that uses a 20-Questions format to evaluate LLMs' ability to reason about object affordances from physical properties without revealing object identity. Experiments show a gap of roughly 20 points between LLMs and humans, and the proposed KARI pipeline improves open-source LLMs by up to 15.2 points.
This paper introduces a deliberative curation protocol for multi-agent knowledge bases, addressing governance gaps such as agent statelessness and sycophancy. It evaluates the protocol via simulation, showing improved resilience under adversarial conditions.
DAIR Academy announces a free live session on building visual LLM artifacts to make LLM knowledge bases more actionable, with updates on new tools and releases for Pro members.
This research paper introduces DeepRefine, an LLM-based reasoning model that refines agent-compiled knowledge bases using reinforcement learning and multi-turn interactions to improve downstream task performance.
DAIR Academy is hosting a free live session on May 21, 2026, demonstrating a framework for building visual LLM artifacts to enhance knowledge bases.