streaming-video

#streaming-video

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Papers with Code Trending · 2026-06-02

OVO-S-Bench introduces a comprehensive human-annotated benchmark of 1,680 questions across 348 videos to evaluate streaming spatial intelligence in multimodal LLMs, revealing that even the best model (Gemini-3.1-Pro) trails human experts by 27 points. The benchmark exposes key limitations including allocentric mapping as a major bottleneck and chain-of-thought reasoning amplifying spatial errors.

Task-Focused Memorization for Multimodal Agents

Hugging Face Daily Papers · 2026-05-29

TaskMem is a reinforcement-learning-based framework for dynamic memorization in multimodal agents. It reports accuracy improvements of 6.3%, 7.0%, and 5.3% on streaming video benchmarks.

AdaState: Self-Evolving Anchors for Streaming Video Generation

Hugging Face Daily Papers · 2026-05-28

This paper introduces AdaState, a method that replaces the static first-frame anchor in autoregressive video diffusion models with an adaptive state that evolves with the generated content, enabling richer motion and natural scene progression in streaming video generation.

OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Hugging Face Daily Papers · 2026-05-18

OmniPro is the first benchmark for evaluating proactive streaming video understanding in omni-modal large language models, featuring 2,700 samples covering diverse tasks and dual-mode evaluation protocols.

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

Hugging Face Daily Papers · 2026-05-05

Stream-R1 introduces a reliability-perplexity aware reward distillation framework for streaming video generation that adaptively weights supervision to improve visual and motion quality without additional computational overhead.
