One-Forcing: Towards Stable One-Step Autoregressive Video Generation
Summary
One-Forcing improves one-step video generation by augmenting the DMD objective with an auxiliary GAN loss, achieving state-of-the-art performance with reduced training costs.
Cached at: 06/01/26, 07:21 PM
Source: https://huggingface.co/papers/2605.23458
Abstract
One-Forcing improves one-step video generation quality and efficiency by combining the DMD objective with a GAN loss, achieving state-of-the-art results with reduced training costs.
Recent advances have substantially improved real-time interactive video generation in the autoregressive regime. However, most existing few-step autoregressive video generation methods, often distilled from a corresponding many-step teacher, default to a 4-step sampling configuration, which still incurs considerable latency during deployment and suffers severe quality degradation when the number of sampling steps is reduced further, particularly in the one-step setting. Trajectory-style consistency distillation methods often produce videos with weak dynamics, while DMD-based approaches, such as Self-Forcing, tend to yield blurry frames. To address these issues, we propose One-Forcing, a simple yet effective approach that augments the DMD objective with an auxiliary GAN loss for high-quality and efficient one-step video generation. Experiments on VBench show that One-Forcing achieves a total score of 83.76, establishing state-of-the-art performance among one-step causal video generation methods while remaining competitive with strong many-step approaches. We further demonstrate that one-step framewise autoregressive generation can be achieved stably with merely one-third of the training cost of the chunkwise model, a setting in which prior methods have not succeeded.
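As a rough illustration of what "augmenting the DMD objective with an auxiliary GAN loss" can mean for one generator update, here is a minimal sketch. It assumes the commonly used DMD surrogate (the update direction is the difference between an online "fake" score model's denoised prediction and the frozen teacher's "real" prediction, applied through a stop-gradient target) and a standard non-saturating GAN generator loss. The weight `lambda_gan`, the normalization, and every function name are hypothetical; the abstract does not specify One-Forcing's exact loss terms or weighting. Plain Python lists stand in for tensors so the arithmetic stays visible.

```python
import math
from typing import List

Vector = List[float]


def dmd_surrogate_loss(x: Vector, real_pred: Vector, fake_pred: Vector) -> float:
    """Hypothetical DMD-style surrogate loss for a generator output x.

    real_pred: denoised prediction of the frozen teacher ("real" score model).
    fake_pred: denoised prediction of the online "fake" score model.
    The loss 0.5 * mean((x - stopgrad(x - grad))^2) has gradient grad / N w.r.t. x,
    where grad = (fake_pred - real_pred) / norm.
    """
    # Normalize by the mean absolute gap between x and the teacher prediction
    # (a common choice, assumed here); guard against division by zero.
    norm = max(sum(abs(xi - ri) for xi, ri in zip(x, real_pred)) / len(x), 1e-8)
    grad = [(fi - ri) / norm for fi, ri in zip(fake_pred, real_pred)]
    # In an autograd framework this target would be detached (stop-gradient).
    target = [xi - gi for xi, gi in zip(x, grad)]
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target)) / len(x)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gan_generator_loss(disc_logits: Vector) -> float:
    """Non-saturating generator loss: mean of softplus(-D(x)) over discriminator logits."""
    return sum(softplus(-z) for z in disc_logits) / len(disc_logits)


def one_step_generator_loss(
    x: Vector,
    real_pred: Vector,
    fake_pred: Vector,
    disc_logits: Vector,
    lambda_gan: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Total generator objective: DMD term plus a weighted auxiliary GAN term."""
    return dmd_surrogate_loss(x, real_pred, fake_pred) + lambda_gan * gan_generator_loss(disc_logits)


if __name__ == "__main__":
    x = [1.0, 2.0]     # toy one-step generator output
    real = [0.0, 1.0]  # toy teacher ("real") denoised prediction
    fake = [0.5, 1.5]  # toy "fake" score model denoised prediction
    print(one_step_generator_loss(x, real, fake, disc_logits=[0.0]))
```

With these toy values the DMD term is 0.125 and the GAN term is log 2, so the printed total is about 0.194. Keeping the adversarial term as a small additive weight leaves the distribution-matching gradient as the main training signal, while the GAN term gives an extra push toward realistic samples, presumably targeting the blurriness the abstract attributes to DMD-only training. The discriminator's own update and the fake score model's training are omitted from this sketch.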
Similar Articles
AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
AAD-1 introduces asymmetric adversarial distillation with phased training to achieve one-step autoregressive video generation, outperforming prior methods on VBench.
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
Causal Forcing++ presents a novel causal consistency distillation method for frame-wise autoregressive video generation, achieving state-of-the-art quality with reduced latency and training cost.
On-Policy Adversarial Flow Distillation for Autoregressive Video Generation
Proposes Adversarial Flow Distillation (AFD) for distilling heterogeneous black-box video generation models into autoregressive students, using on-policy feedback and forward-process flow-matching updates.
Streaming Video Generation with Streaming Force Control
StreamForce is a causal, unified video generation model that provides real-time, physically grounded responses to time-varying forces through a distillation pipeline and autoregressive architecture, achieving state-of-the-art performance in force adherence and motion realism.
Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
This paper introduces Forcing-KV, a hybrid KV cache compression strategy for autoregressive video diffusion models that separates attention heads into static and dynamic categories, achieving up to 2.82x speedup at 1080P resolution while maintaining output quality.