NTIRE 2026 Challenge on Video Saliency Prediction: Methods and Results

Hugging Face Daily Papers

Summary

This paper presents the NTIRE 2026 Challenge on Video Saliency Prediction, introducing a novel dataset of 2,000 diverse videos with saliency maps collected via crowdsourced mouse tracking from over 5,000 assessors. Over 20 teams participated, with 7 passing the final phase, and all data is made publicly available.
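The page does not spell out how the crowdsourced mouse-tracking fixations become continuous saliency maps, but the standard recipe in saliency datasets is to rasterize the fixation points into a binary map and convolve it with a Gaussian. Below is a minimal sketch of that convention; the function name, the sigma value, and the normalization are illustrative assumptions, not the challenge's published pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(points, height, width, sigma=35.0):
    """Rasterize (x, y) fixation points into a binary map, then blur.

    sigma is an illustrative kernel width (saliency datasets often pick
    roughly one degree of visual angle); the challenge's exact value is
    not stated on this page.
    """
    fix_map = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            fix_map[int(y), int(x)] += 1.0
    sal = gaussian_filter(fix_map, sigma=sigma)
    return sal / (sal.max() + 1e-8)  # scale to [0, 1]
```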


Source: https://huggingface.co/papers/2604.14816

Abstract

This paper presents an overview of the NTIRE 2026 Challenge on Video Saliency Prediction. The goal of the challenge participants was to develop automatic saliency map prediction methods for the provided video sequences. The novel dataset of 2,000 diverse videos with an open license was prepared for this challenge. The fixations and corresponding saliency maps were collected using crowdsourced mouse tracking and contain viewing data from over 5,000 assessors. Evaluation was performed on a subset of 800 test videos using generally accepted quality metrics. The challenge attracted over 20 teams making submissions, and 7 teams passed the final phase with code review. All data used in this challenge is made publicly available: https://github.com/msu-video-group/NTIRE26_Saliency_Prediction.
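The abstract only says "generally accepted quality metrics"; in the video saliency literature these conventionally include CC, SIM, NSS, AUC, and KL divergence. Here is a minimal sketch of three of them under their usual definitions; the function names and the epsilon guard are illustrative choices, and this page does not confirm which metrics the challenge actually used.

```python
import numpy as np

def cc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Linear Correlation Coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def sim(pred: np.ndarray, gt: np.ndarray) -> float:
    """Similarity: histogram intersection of the maps as distributions."""
    p = pred / (pred.sum() + 1e-8)
    g = gt / (gt.sum() + 1e-8)
    return float(np.minimum(p, g).sum())

def nss(pred: np.ndarray, fixations: np.ndarray) -> float:
    """Normalized Scanpath Saliency: mean z-scored prediction at fixated pixels."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixations > 0].mean())
```

A quick smoke test with synthetic data:

```python
rng = np.random.default_rng(0)
pred = rng.random((360, 640))
gt = rng.random((360, 640))
fix = rng.random((360, 640)) > 0.999  # sparse binary fixation map
print(cc(pred, gt), sim(pred, gt), nss(pred, fix))
```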


Get this paper in your agent:

hf papers read 2604.14816

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

Hugging Face Daily Papers

Re2Pix is a hierarchical video prediction framework that improves future video generation by first predicting semantic representations using frozen vision foundation models, then conditioning a latent diffusion model on these predictions to generate photorealistic frames. The approach addresses train-test mismatches through nested dropout and mixed supervision strategies, achieving improved temporal semantic consistency and perceptual quality on autonomous driving benchmarks.

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Hugging Face Daily Papers

VEFX-Bench introduces a large-scale human-annotated video editing dataset (5,049 examples) with multi-dimensional quality labels and a specialized reward model for standardized evaluation of video editing systems. The paper addresses the lack of comprehensive benchmarks in AI-assisted video creation by providing VEFX-Dataset, VEFX-Reward, and a 300-video-prompt benchmark that reveals gaps in current editing models.

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

arXiv cs.CL

AtManRL is a method that uses differentiable attention manipulation and reinforcement learning to train LLMs to generate more faithful chain-of-thought reasoning by ensuring reasoning tokens causally influence final predictions. Experiments on GSM8K and MMLU with Llama-3.2-3B demonstrate the approach can identify influential reasoning tokens and improve reasoning transparency.