NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results
Summary
This paper presents the NTIRE 2026 Challenge on Video Saliency Prediction, introducing a novel dataset of 2,000 diverse videos with saliency maps collected via crowdsourced mouse tracking from over 5,000 assessors. Over 20 teams participated, with 7 passing the final phase, and all data is made publicly available.
Cached at: 04/21/26, 07:21 AM
Source: https://huggingface.co/papers/2604.14816
Abstract
This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided video sequences. The novel dataset of 2,000 diverse videos with an open license was prepared for this challenge. The fixations and corresponding saliency maps were collected using crowdsourced mouse tracking and contain viewing data from over 5,000 assessors. Evaluation was performed on a subset of 800 test videos using generally accepted quality metrics. The challenge attracted over 20 teams making submissions, and 7 teams passed the final phase with code review. All data used in this challenge is made publicly available: https://github.com/msu-video-group/NTIRE26_Saliency_Prediction.
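The abstract mentions evaluation with "generally accepted quality metrics" but does not list them here. As a point of reference, saliency prediction benchmarks commonly report Linear Correlation Coefficient (CC), Similarity (SIM), and Normalized Scanpath Saliency (NSS). The sketch below implements these standard definitions; it is an illustrative assumption, not the challenge's official evaluation code.

```python
# Sketch of common saliency metrics (CC, SIM, NSS), following the standard
# definitions used in saliency benchmarks. Assumed here for illustration;
# the challenge's exact metric set and implementation may differ.
import numpy as np

def cc(pred, gt):
    """Pearson linear correlation between predicted and ground-truth maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred, gt):
    """Histogram intersection of the two maps, each normalized to sum to 1."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())

def nss(pred, fixations):
    """Mean standardized prediction value at binary fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations > 0].mean())
```

CC and SIM compare a predicted map against a ground-truth saliency map (e.g., one built by blurring mouse-tracking fixations), while NSS scores the prediction directly at the raw fixation pixels, so the two families can disagree on the same prediction.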
Get this paper in your agent:
hf papers read 2604.14816
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
This paper introduces Sparkle, a new dataset and benchmark for instruction-guided video background replacement, addressing the lack of high-quality training data in this domain. It proposes a scalable pipeline with decoupled guidance to generate realistic foreground-background interactions.
Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction
Re2Pix is a hierarchical video prediction framework that improves future video generation by first predicting semantic representations using frozen vision foundation models, then conditioning a latent diffusion model on these predictions to generate photorealistic frames. The approach addresses train-test mismatches through nested dropout and mixed supervision strategies, achieving improved temporal semantic consistency and perceptual quality on autonomous driving benchmarks.
VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects
VEFX-Bench introduces a large-scale human-annotated video editing dataset (5,049 examples) with multi-dimensional quality labels and a specialized reward model for standardized evaluation of video editing systems. The paper addresses the lack of comprehensive benchmarks in AI-assisted video creation by providing VEFX-Dataset, VEFX-Reward, and a 300-video-prompt benchmark that reveals gaps in current editing models.
AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
AtManRL is a method that uses differentiable attention manipulation and reinforcement learning to train LLMs to generate more faithful chain-of-thought reasoning by ensuring reasoning tokens causally influence final predictions. Experiments on GSM8K and MMLU with Llama-3.2-3B demonstrate the approach can identify influential reasoning tokens and improve reasoning transparency.
Evaluating Explainability in Safety-Critical ATR Systems: Limitations of Post-Hoc Methods and Paths Toward Robust XAI
This paper evaluates explainability methods in safety-critical Automatic Target Recognition (ATR) systems, highlighting the limitations of post-hoc techniques like saliency and attention maps. It proposes a taxonomy and assessment framework to address issues such as spurious explanations and instability, advocating for more robust, causally grounded XAI approaches.