Q-ARVD: Quantizing Autoregressive Video Diffusion Models

Hugging Face Daily Papers 05/20/26, 12:00 AM Papers

Summary

Q-ARVD is a novel quantization framework to reduce inference costs of autoregressive video diffusion models by addressing frame-wise sensitivity imbalance and weight outlier patterns.

Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
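The frame-weighting idea in (S1) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the abstract does not give the actual weighting rule, so the exponential form, the `decay` parameter, and the function names below are hypothetical and are not the authors' implementation. The sketch only shows how per-frame weights can enter a quantization objective, so that an error in a more sensitive frame costs more than the same error in a less sensitive one.

```python
import math

def exponential_frame_weights(num_frames, decay=0.5):
    """Hypothetical weights proportional to exp(-decay * t), normalized to sum to 1."""
    raw = [math.exp(-decay * t) for t in range(num_frames)]
    total = sum(raw)
    return [r / total for r in raw]

def frame_mse(reference, quantized):
    """Mean squared error between two equally sized, flattened frames."""
    return sum((r - q) ** 2 for r, q in zip(reference, quantized)) / len(reference)

def frame_weighted_loss(reference_frames, quantized_frames, weights):
    """Weighted sum of per-frame errors: sum over t of w_t * MSE_t."""
    return sum(
        w * frame_mse(ref, qnt)
        for w, ref, qnt in zip(weights, reference_frames, quantized_frames)
    )

# Toy data: three flattened "frames" of four values each.
reference = [[0.0, 1.0, 2.0, 3.0] for _ in range(3)]
weights = exponential_frame_weights(3, decay=1.0)

# Place the same perturbation in the first frame only, then in the last frame only.
error_first = [[v + 0.1 for v in reference[0]], reference[1], reference[2]]
error_last = [reference[0], reference[1], [v + 0.1 for v in reference[2]]]

loss_first = frame_weighted_loss(reference, error_first, weights)
loss_last = frame_weighted_loss(reference, error_last, weights)
```

With these assumed weights, `loss_first` is larger than `loss_last` even though the perturbation is identical, which is the effect a sensitivity-aware objective is meant to capture.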

Original Article

Cached at: 05/22/26, 06:23 AM

Source: https://huggingface.co/papers/2605.21072

Abstract

Autoregressive video diffusion models face high inference costs that limit practical deployment, prompting the development of Q-ARVD, a novel quantization framework addressing frame-wise sensitivity imbalance and weight outlier patterns specific to these models.
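To make the weight outlier issue concrete, here is a rough sketch of isolating outlier channels and giving them their own quantization scale. It is a sketch under stated assumptions, not the paper's algorithm: the median-based detection rule, the `ratio` threshold, the treatment of each row as a channel, and all function names are hypothetical choices made for this example.

```python
import statistics

def detect_outlier_channels(weight, ratio=4.0):
    """Return indices of channels whose peak magnitude exceeds ratio * median peak."""
    peaks = [max(abs(v) for v in channel) for channel in weight]
    threshold = ratio * statistics.median(peaks)
    return [i for i, p in enumerate(peaks) if p > threshold]

def group_scale(weight, indices, bits=4):
    """Symmetric scale covering the largest magnitude in the selected channels."""
    qmax = 2 ** (bits - 1) - 1
    peak = max((abs(v) for i in indices for v in weight[i]), default=0.0)
    return peak / qmax if peak > 0 else 1.0

def fake_quantize(values, scale, bits=4):
    """Round to the nearest integer level, clamp, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def dual_scale_quantize(weight, bits=4, ratio=4.0):
    """Quantize normal and outlier channels with two separate scales."""
    outliers = detect_outlier_channels(weight, ratio)
    outlier_set = set(outliers)
    normal = [i for i in range(len(weight)) if i not in outlier_set]
    normal_scale = group_scale(weight, normal, bits)
    outlier_scale = group_scale(weight, outliers, bits)
    dequantized = [
        fake_quantize(ch, outlier_scale if i in outlier_set else normal_scale, bits)
        for i, ch in enumerate(weight)
    ]
    return dequantized, outliers, normal_scale

# Toy weight matrix: each row is treated as one channel; row 2 is an outlier.
weight = [
    [0.1, -0.2, 0.3],
    [0.2, 0.1, -0.1],
    [8.0, -6.0, 7.0],
    [-0.3, 0.2, 0.1],
]
dequantized, outliers, normal_scale = dual_scale_quantize(weight, bits=4)

# Baseline for comparison: one shared scale for every channel.
single_scale = group_scale(weight, range(len(weight)), bits=4)
single = [fake_quantize(ch, single_scale, bits=4) for ch in weight]
```

On this toy matrix the detector flags channel 2 only. With a single shared scale, the small values in the other channels all round to zero, while the dual-scale version keeps them within half a quantization step of their original values.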

Get this paper in your agent:

hf papers read 2605.21072

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
