Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Hugging Face Daily Papers 05/18/26, 12:00 AM Papers

long-video-generation temporal-consistency diffusion-models training-inference-gap autoregressive-frameworks infinite-frame-generation

Summary

MIGA is a train-free method for generating consistent long videos by reducing the training-inference gap and enhancing temporal consistency through dual consistency mechanisms.

Without incurring significant computational overhead, train-free long video generation aims to enable foundation video generation models to produce longer videos. Frame-level autoregressive frameworks, e.g., FIFO-diffusion, offer the advantage of generating infinitely long videos with constant memory consumption. However, the mismatch between training and inference, coupled with the challenge of maintaining long-term consistency, limits the effective utilization of foundation models. To mitigate these concerns, we propose MIGA, a novel infinite-frame long video generation method. Firstly, we propose an effective two-stage alignment mechanism that mitigates the training-inference gap by reducing the excessive noise span fed to the model. We then introduce an innovative dual consistency enhancement mechanism, where the self-reflection approach corrects early high-noise frames and the long-range frame guidance approach leverages later low-noise frames with broad coverage to steer generation, jointly improving temporal consistency. Extensive experiments on VBench and NarrLV demonstrate the state-of-the-art performance of MIGA. Our project page is available at https://xiaokunfeng.github.io/miga_homepage/.
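To make the constant-memory claim concrete, the following is a minimal toy sketch of the frame-level FIFO queue that FIFO-diffusion-style frameworks are built on. It is not MIGA's method and not the FIFO-diffusion reference code. The names `toy_denoise_step` and `fifo_generate` are hypothetical, and the scalar "frames" and the scaling "denoiser" are stand-ins for latent frames and a real video diffusion model. The sketch shows only that the queue length stays fixed however many frames are emitted, and that every model call sees frames spanning a wide range of noise levels. That range is the noise span the abstract refers to; how MIGA's two-stage alignment reduces it is not described in this summary.

```python
from collections import deque
import random


def toy_denoise_step(value, level):
    # Hypothetical stand-in for one denoising step of a real model:
    # scales the value so that it reaches 0.0 exactly when level hits 0.
    return value * (level - 1) / level


def fifo_generate(num_frames, queue_len=4, seed=0):
    rng = random.Random(seed)
    # Each entry is [value, noise_level]; the head is cleanest, the tail noisiest.
    queue = deque([[rng.gauss(0.0, 1.0), lvl] for lvl in range(1, queue_len + 1)])
    frames, peak_len, spans = [], len(queue), []
    while len(frames) < num_frames:
        # Noise span seen by the model in this call (max level minus min level).
        levels = [lvl for _, lvl in queue]
        spans.append(max(levels) - min(levels))
        # Denoise every queued frame by one step.
        for entry in queue:
            entry[0] = toy_denoise_step(entry[0], entry[1])
            entry[1] -= 1
        # The head is now fully denoised: emit it and enqueue fresh noise at the tail.
        value, level = queue.popleft()
        assert level == 0
        frames.append(value)
        queue.append([rng.gauss(0.0, 1.0), queue_len])
        peak_len = max(peak_len, len(queue))
    return frames, peak_len, spans


if __name__ == "__main__":
    frames, peak_len, spans = fifo_generate(num_frames=10, queue_len=4)
    print(len(frames), peak_len, set(spans))
```

In this toy setup the queue never grows beyond `queue_len`, which is the constant-memory property, while each call mixes frames from noise level 1 up to `queue_len`. A foundation model trained with all frames at a single shared noise level never saw such inputs, which is one way to read the training-inference mismatch named above.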

Original Article

Cached at: 05/21/26, 10:10 AM

Paper page - Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Source: https://huggingface.co/papers/2605.18233

Abstract

MIGA addresses long video generation challenges by reducing training-inference gaps and enhancing temporal consistency through dual consistency mechanisms.

Without incurring significant computational overhead, train-free long video generation aims to enable foundation video generation models to produce longer videos. Frame-level autoregressive frameworks, e.g., FIFO-diffusion, offer the advantage of generating infinitely long videos with constant memory consumption. However, the mismatch between training and inference, coupled with the challenge of maintaining long-term consistency, limits the effective utilization of foundation models. To mitigate these concerns, we propose MIGA, a novel infinite-frame long video generation method. Firstly, we propose an effective two-stage alignment mechanism that mitigates the training-inference gap by reducing the excessive noise span fed to the model. We then introduce an innovative dual consistency enhancement mechanism, where the self-reflection approach corrects early high-noise frames and the long-range frame guidance approach leverages later low-noise frames with broad coverage to steer generation, jointly improving temporal consistency. Extensive experiments on VBench and NarrLV demonstrate the state-of-the-art performance of MIGA. Our project page is available at https://xiaokunfeng.github.io/miga_homepage/.

Similar Articles

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Memento: Reconstruct to Remember for Consistent Long Video Generation

Long Video Generation (4 minute read)

One-Forcing: Towards Stable One-Step Autoregressive Video Generation

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
