Tag
Ego2World converts egocentric cooking videos (HD-EPIC) into executable symbolic worlds with graph-transition rules, enabling evaluation of belief-state planning under partial observation. Experiments show that belief memory improves task completion, suggesting it should be a first-class target in embodied agent evaluation.
VeGAS is a test-time framework for MLLM-based embodied agents that samples multiple candidate actions and uses a generative verifier to select the most reliable one, achieving up to a 36% relative improvement over CoT baselines on challenging tasks.
Continual Harness is a framework that lets embodied AI agents self-improve online without environment resets. Applied to Pokémon games, it reaches human-level performance through automated prompt and skill refinement.
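
The sample-then-verify idea in the VeGAS summary above can be illustrated with a generic best-of-N selection loop. This is a minimal sketch under stated assumptions, not the VeGAS implementation: `select_action`, `propose`, and `verify` are hypothetical names, the proposer stands in for an MLLM policy, and the verifier is assumed to return a single numeric reliability score per candidate.

```python
from typing import Callable


def select_action(
    propose: Callable[[str], str],        # hypothetical: samples one candidate action
    verify: Callable[[str, str], float],  # hypothetical: scores (observation, action)
    observation: str,
    num_candidates: int = 4,
) -> str:
    """Sample several candidate actions and return the highest-scoring one."""
    candidates = [propose(observation) for _ in range(num_candidates)]
    # Drop duplicates (keeping first-seen order) so each action is scored once.
    unique = list(dict.fromkeys(candidates))
    # max() keeps the earliest candidate when scores tie.
    return max(unique, key=lambda action: verify(observation, action))


if __name__ == "__main__":
    # Toy stand-ins for the policy and the verifier (assumed, for illustration only).
    sampled = iter(["open fridge", "pick up knife", "open fridge", "slice tomato"])
    scores = {"open fridge": 0.2, "pick up knife": 0.5, "slice tomato": 0.9}
    best = select_action(
        propose=lambda obs: next(sampled),
        verify=lambda obs, action: scores[action],
        observation="tomato on the counter",
    )
    print(best)
```

In this sketch the extra cost is paid only at test time: the policy is queried `num_candidates` times per step and the verifier once per distinct candidate, with no change to the policy's weights.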