@_akhaliq: paper:

X AI KOLs Following Papers

Summary

This paper proposes Robust-TO, an agentic video understanding framework that integrates per-frame trustworthiness to address the Blind Trust Problem, achieving significant accuracy gains under realistic perturbations.

paper: https://t.co/eUoddauH25
Original Article
View Cached Full Text

Cached at: 06/26/26, 12:10 PM

paper: https://t.co/eUoddauH25


Paper page - Confidence-Aware Tool Orchestration for Robust Video Understanding

Source: https://huggingface.co/papers/2606.26904

Abstract

Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework that improves accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning.

Video reasoninglanguage models implicitly assume that every input frame is equally reliable. This leads to what we term theBlind Trust Problem: under realistic perturbations such as motion blur, glare, or occlusion, frontiervideo reasoningmodels can suffer 15-30%p accuracy drops on real-world embodied benchmarks, while remaining unaware that their visual evidence has been degraded. To address this challenge, we propose Robust-TO, anagentic video understandingframework that explicitly integrates per-frame trustworthiness into every stage of reasoning. Robust-TO organizes heterogeneous visual perception tools under a unifiedevidence interface. Each tool receives a sub-query derived from the original question and a set of trustworthy frames selected by thereliability-relevance score. It returns evidence in a shared format: a concrete prediction (e.g., a bounding box, motion trajectory, recognized text, or action label), temporal grounding, and acalibrated reliability score. During reasoning, these calibrated scores guide evidence weighting in athree-tier synthesis process(high/medium/low) and define aconfidence-cost GRPO rewardthat jointly optimizes correctness, evidence reliability, and efficiency. On twovideo reasoning benchmarksspanning eight tasks, Robust-TO achieves 56.4% average accuracy on clean inputs, surpassing the strongest open-source baseline by 10.6%p and outperforming Gemini-2.5-Pro (46.2%). Under five realistic corruption types, Robust-TO maintains 54.3% average accuracy, 5.8%p above the strongest open-source baseline, while exhibiting the smallest clean-to-corrupted accuracy drop among all compared methods.

View arXiv pageView PDFProject pageGitHub1Add to collection

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.26904 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.26904 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.26904 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

Confidence-Aware Tool Orchestration for Robust Video Understanding

Hugging Face Daily Papers

Robust-TO addresses the Blind Trust Problem in video reasoning by integrating per-frame trustworthiness into an agentic framework, improving accuracy under realistic perturbations through calibrated evidence weighting and reliability-aware reasoning.

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

arXiv cs.AI

This survey provides a comprehensive examination of trustworthy agentic AI, focusing on safety, robustness, privacy, and system security. It clarifies key concepts, identifies risks along the agent workflow, summarizes mitigation strategies, and consolidates evaluation metrics and benchmarks, aiming to serve as a practical reference for deploying agentic AI in high-stakes environments.