Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring
Summary
Hide-and-Seek is a framework that detects robot execution failures in VLA models by localizing failure-indicative actions through contrastive learning without step-level annotations, achieving state-of-the-art multi-task failure detection.
Cached at: 06/01/26, 03:18 AM
Source: https://huggingface.co/papers/2605.30834
Abstract
Hide-and-Seek framework detects robot execution failures in vision-language-action models by localizing failure-indicative actions through contrastive learning from trajectory-level supervision without step-level annotations.
Vision-Language-Action (VLA) models enable robots to follow natural language instructions and generalize across diverse tasks, but they remain vulnerable to execution failures that compromise reliability in real-world deployment. Detecting such failures during execution is therefore critical for the robust deployment of embodied systems. Existing failure detection methods either rely on expensive action resampling or external models, while alternatives propagate trajectory-level labels uniformly across every timestep, obscuring localized failure signals. In this paper, we propose Hide-and-Seek, a framework that formulates VLA failure detection as a coarsely supervised learning problem. By combining inter-trajectory and intra-trajectory contrastive objectives, Hide-and-Seek localizes failure-indicative actions and induces temporally structured failure signals from trajectory-level supervision alone, without any step-level annotation. We evaluate Hide-and-Seek on LIBERO, VLABench, and a real-world robotic platform across three representative VLA policies: OpenVLA, π_0, and π_{0.5}. Our method achieves state-of-the-art multi-task failure detection performance with a practical accuracy-timeliness trade-off under conformal prediction, and generalizes well to both seen and unseen tasks.
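The abstract does not say how conformal prediction is applied, so the following is only a minimal sketch of one common recipe (split conformal calibration of an alarm threshold), not the authors' implementation. It assumes a monitor that emits one scalar failure score per timestep and a held-out set of successful rollouts; the function names, the scores, and the choice of alpha are all hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal alarm threshold (generic recipe, assumed here).

    calibration_scores: one score per successful calibration rollout,
        e.g. the maximum per-step failure score over that rollout.
    alpha: target false-alarm rate on successful rollouts.
    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for that alpha.
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def first_alarm(step_scores, threshold):
    """Index of the first timestep whose score exceeds the threshold, else None."""
    for t, score in enumerate(step_scores):
        if score > threshold:
            return t
    return None


# Hypothetical calibration data: max failure score of 7 successful rollouts.
calibration = [0.20, 0.30, 0.40, 0.35, 0.25, 0.45, 0.28]
tau = conformal_threshold(calibration, alpha=0.25)

# Hypothetical per-step scores from a new rollout being monitored.
alarm_step = first_alarm([0.10, 0.30, 0.41, 0.90], tau)
```

Lowering alpha raises the threshold, which gives fewer false alarms but later detection; raising it does the opposite. This is one way to read the accuracy-timeliness trade-off mentioned in the abstract.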