Tag
Discusses a limitation of current agent benchmarks that assume the world changes only when the agent acts: many real-world tasks instead require the agent to wait for external events before acting.