NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results
Summary
This paper presents the NTIRE 2026 Challenge on Video Saliency Prediction, introducing a novel dataset of 2,000 diverse videos with saliency maps collected via crowdsourced mouse tracking from over 5,000 assessors. Over 20 teams participated, with 7 passing the final phase, and all data is made publicly available.
Source: https://huggingface.co/papers/2604.14816
Abstract
This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided video sequences. The novel dataset of 2,000 diverse videos with an open license was prepared for this challenge. The fixations and corresponding saliency maps were collected using crowdsourced mouse tracking and contain viewing data from over 5,000 assessors. Evaluation was performed on a subset of 800 test videos using generally accepted quality metrics. The challenge attracted over 20 teams making submissions, and 7 teams passed the final phase with code review. All data used in this challenge is made publicly available at https://github.com/msu-video-group/NTIRE26_Saliency_Prediction.