This paper introduces Vigil, an evaluation framework for embodied agents that disentangles task execution success from the agent's ability to correctly recognize and report task completion.