Speculative Decoding for Autoregressive Video Generation

Hugging Face Daily Papers 04/19/26, 12:00 AM Papers

Summary

SDVG adapts speculative decoding to autoregressive video diffusion, using an image-quality router to achieve up to 2.09× speed-up with 95.7% quality retention on MovieGenVideoBench.
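The retention figures can be checked with a line of arithmetic. The sketch below assumes that "quality retention" means the ratio of SDVG's VisionReward score to the target-only score, which matches the 0.0773 vs. 0.0788 pair reported in the abstract. The score printed for the 2.09x operating point is only implied by the reported 95.7% and is not a number taken from the paper. The function and variable names are illustrative.

```python
# Reported VisionReward scores on MovieGenVideoBench (from the abstract).
TARGET_ONLY = 0.0788      # 14B target alone
SDVG_AT_1_59X = 0.0773    # SDVG with tau = -0.7


def retention(score: float, baseline: float) -> float:
    """Fraction of the baseline score that the faster method preserves."""
    return score / baseline


r = retention(SDVG_AT_1_59X, TARGET_ONLY)
print(f"retention at 1.59x: {r:.1%}")

# Implied (not reported) score at the 2.09x point, assuming the same baseline.
implied_score = 0.957 * TARGET_ONLY
print(f"implied VisionReward at 2.09x: {implied_score:.4f}")
```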

Autoregressive video diffusion is emerging as a promising paradigm for streaming video synthesis, with step distillation serving as the primary means of accelerating inference. Whether speculative decoding, the dominant acceleration strategy for large language models, can be effectively adapted to autoregressive video generation remains an open question, because video blocks are continuous spatiotemporal tensors with no token-level distribution for exact rejection sampling. We introduce SDVG, which brings speculative decoding to block-based autoregressive video diffusion by replacing token verification with an image-quality router. A 1.3B drafter proposes candidate blocks via four denoising steps; each block is VAE-decoded and scored by ImageReward using worst-frame aggregation, that is, taking the minimum per-frame reward to catch single-frame artifacts that averaging would mask. Blocks scoring above a fixed threshold tau are accepted into the 14B target's KV cache; the rest are regenerated by the target. Two additional design choices prove critical: the first block is always force-rejected to anchor scene composition, and tau serves as a single knob that traces a smooth quality-speed Pareto frontier. On 1003 MovieGenVideoBench prompts (832x480), SDVG retains 98.1% of target-only VisionReward quality (0.0773 vs. 0.0788) at a 1.59x speedup with tau=-0.7, and reaches 2.09x at 95.7% quality retention, while consistently outperforming draft-only generation by more than 17%. The framework is training-free, requires no architectural changes, and can be integrated into existing autoregressive video generation pipelines.
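The routing rule described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `draft_block`, `target_block`, and `score_frames` are hypothetical callables standing in for the 1.3B drafter, the 14B target, and the VAE-decode plus per-frame ImageReward step, and blocks are represented by toy tuples rather than latent tensors. The abstract does not say whether a draft is still produced for the force-rejected first block; the sketch simply skips it.

```python
from typing import Callable, List, Sequence, Tuple

Block = Tuple[str, int]  # toy stand-in: (which model produced it, block index)


def worst_frame_score(frame_rewards: Sequence[float]) -> float:
    """Worst-frame aggregation: the block score is its minimum frame reward."""
    return min(frame_rewards)


def route_blocks(
    num_blocks: int,
    tau: float,
    draft_block: Callable[[List[Block]], Block],      # hypothetical drafter
    target_block: Callable[[List[Block]], Block],     # hypothetical target
    score_frames: Callable[[Block], Sequence[float]],  # hypothetical scorer
) -> Tuple[List[Block], List[str]]:
    """Generate blocks one at a time, accepting drafts that score above tau."""
    history: List[Block] = []   # plays the role of the target's KV cache
    decisions: List[str] = []
    for i in range(num_blocks):
        if i == 0:
            # First block is always force-rejected: the target generates it.
            block = target_block(history)
            decisions.append("target")
        else:
            candidate = draft_block(history)
            if worst_frame_score(score_frames(candidate)) > tau:
                block = candidate                 # accept the draft
                decisions.append("draft")
            else:
                block = target_block(history)     # regenerate with the target
                decisions.append("target")
        history.append(block)
    return history, decisions


# Toy per-frame rewards for the draft of each block (made-up numbers).
toy_rewards = [
    [0.5, 0.4, 0.6],     # block 0: ignored, the first block goes to the target
    [0.2, -0.1, 0.3],    # min -0.1 > -0.7, accepted
    [0.9, -1.2, 0.8],    # min -1.2, rejected although the mean is above -0.7
    [-0.5, -0.6, -0.4],  # min -0.6 > -0.7, accepted
]

blocks, decisions = route_blocks(
    num_blocks=4,
    tau=-0.7,
    draft_block=lambda hist: ("draft", len(hist)),
    target_block=lambda hist: ("target", len(hist)),
    score_frames=lambda blk: toy_rewards[blk[1]],
)
print(decisions)
```

In the toy data, block 2 has one bad frame among two good ones. Averaging would give a score above tau and accept it, whereas taking the minimum sends it back to the target, which is the behavior the abstract attributes to worst-frame aggregation.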

Original Article

Cached at: 04/22/26, 06:17 AM

Source: https://huggingface.co/papers/2604.17397

Abstract

Speculative decoding is adapted to autoregressive video diffusion through a quality-based routing mechanism that maintains high visual quality while achieving significant speedup.


Similar Articles

Long Video Generation (4 minute read)

What is Speculative Decoding? (trending on paperswithco.de) [R]

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

Speculative Refinement: A Hybrid Autoregressive Diffusion Decoding Strategy and Its Behavior Across Benchmarks
