Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

Hugging Face Daily Papers 05/29/26, 12:00 AM Papers

Summary

Light Interaction introduces a training-free inference acceleration framework for interactive video world models, using adaptive context management, denoising cache acceleration, and 3D block sparse attention to achieve up to 2.59x speedup while maintaining competitive visual quality.

Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.
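To make the idea of reusing early-step outputs on revisits concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the class `PoseKeyedStepCache`, the function `denoise_chunk`, the `radius` threshold, and the use of camera position as the cache key are all hypothetical choices made for illustration, and a scalar stands in for the latent. The sketch only shows the control flow: on a cache hit near a previously visited position, the first `reuse_steps` denoising steps are skipped.

```python
import math


class PoseKeyedStepCache:
    """Toy cache of early-step results keyed by camera position (hypothetical)."""

    def __init__(self, radius):
        self.radius = radius  # assumed revisit threshold in world units
        self.entries = []     # list of (position, early_step_result)

    def lookup(self, position):
        """Return the nearest cached result within `radius`, or None."""
        best, best_dist = None, self.radius
        for cached_pos, result in self.entries:
            dist = math.dist(position, cached_pos)
            if dist <= best_dist:
                best, best_dist = result, dist
        return best

    def store(self, position, result):
        self.entries.append((tuple(position), result))


def denoise_chunk(position, cache, model_step, total_steps=4, reuse_steps=2):
    """Run a toy denoising loop; skip the first `reuse_steps` steps on a hit.

    `model_step(latent, step)` is a stand-in for one denoiser call.
    Returns (final_latent, number_of_model_calls).
    """
    cached = cache.lookup(position)
    if cached is not None:
        latent, start = cached, reuse_steps  # familiar region: reuse early work
    else:
        latent, start = 0.0, 0               # novel region: full schedule
    calls = 0
    for step in range(start, total_steps):
        latent = model_step(latent, step)
        calls += 1
        # On a miss, remember the result once the early steps are done.
        if cached is None and step == reuse_steps - 1:
            cache.store(position, latent)
    return latent, calls


def toy_step(latent, step):
    """Placeholder denoiser: adds 1.0 per step so results are easy to check."""
    return latent + 1.0


if __name__ == "__main__":
    cache = PoseKeyedStepCache(radius=0.5)
    print(denoise_chunk((0.0, 0.0, 0.0), cache, toy_step))  # first visit
    print(denoise_chunk((0.1, 0.0, 0.0), cache, toy_step))  # nearby revisit
    print(denoise_chunk((5.0, 0.0, 0.0), cache, toy_step))  # novel region
```

In this toy setup the revisit uses half as many model calls as a first visit. A real system would need a richer notion of "familiar" than Euclidean distance between positions, and would have to decide what exactly is cached; both are left open here.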

Cached at: 06/01/26, 03:17 AM

Source: https://huggingface.co/papers/2605.31158

Abstract

Light Interaction accelerates interactive video world models through adaptive computation strategies and optimized attention mechanisms without requiring model retraining.

Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.
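As a rough illustration of what a 3D block sparse attention pattern can look like, the sketch below partitions a (time, height, width) token grid into blocks and lets each query block attend only to key blocks within a fixed distance along each axis. This is an assumed static local-window pattern chosen for simplicity; the abstract does not say how Light Interaction selects blocks, and the fused Triton kernels are not modeled. The names `block_sparse_mask`, `density`, and the `window` parameter are hypothetical.

```python
from itertools import product


def block_sparse_mask(grid, block, window):
    """Build a block-level attention mask over a 3D token grid.

    grid:   (T, H, W) number of tokens along each axis
    block:  (bt, bh, bw) tokens per block along each axis
    window: (wt, wh, ww) maximum block-index distance a query block may
            attend across, per axis (assumed local-window rule)
    Returns (coords, mask) where mask[i][j] is True when query block i
    attends to key block j.
    """
    # Ceil division so partial blocks at the edges are still counted.
    n_blocks = tuple(-(-g // b) for g, b in zip(grid, block))
    coords = list(product(*(range(n) for n in n_blocks)))
    mask = [
        [all(abs(q - k) <= w for q, k, w in zip(qc, kc, window)) for kc in coords]
        for qc in coords
    ]
    return coords, mask


def density(mask):
    """Fraction of block pairs that are actually computed."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


if __name__ == "__main__":
    # 4x4x4 tokens in 2x2x2 blocks gives 8 blocks; attend only within
    # the same temporal block, but to any spatial neighbour.
    coords, mask = block_sparse_mask((4, 4, 4), (2, 2, 2), (0, 1, 1))
    print(len(coords), density(mask))
```

Working at block granularity rather than per token is what makes such a pattern friendly to GPU kernels: a masked-out block pair can be skipped entirely instead of being computed and then discarded.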
