Tag
AdaPlanBench is a dynamic benchmark that evaluates how well LLM agents adapt their plans as world and user constraints are progressively revealed over multi-turn interactions. Its results show that current models struggle, especially with user constraints.
Dr. Fei-Fei Li discusses the challenges robots face in understanding and executing everyday household tasks. She highlights how difficult it is to ground natural-language instructions such as 'open the drawer while avoiding the vase' in concrete robot actions.