Q-ARVD: Quantizing Autoregressive Video Diffusion Models
Summary
Q-ARVD is a novel quantization framework to reduce inference costs of autoregressive video diffusion models by addressing frame-wise sensitivity imbalance and weight outlier patterns.
Cached at: 05/22/26, 06:23 AM
Source: https://huggingface.co/papers/2605.21072
Abstract
Autoregressive video diffusion models face high inference costs that limit practical deployment, prompting the development of Q-ARVD, a novel quantization framework addressing frame-wise sensitivity imbalance and weight outlier patterns specific to these models.
Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization, which automatically detects the presence and quantity of outlier channels for an arbitrary layer, and isolates them to protect normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
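The two ideas in the abstract, isolating outlier channels under their own scale (S2) and weighting per-frame error in the objective (S1), can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the median/MAD detection rule, the threshold `k`, the 4-bit symmetric quantizer, and the exponential frame weights (including their direction and decay rate) are all assumptions chosen for illustration only.

```python
import numpy as np

def quantize_sym(w, scale, bits=4):
    """Symmetric uniform fake-quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def detect_outlier_channels(w, k=3.0):
    """Flag columns whose max |w| is far above the typical column (assumed median/MAD rule)."""
    col_max = np.abs(w).max(axis=0)
    med = np.median(col_max)
    mad = np.median(np.abs(col_max - med)) + 1e-12
    return col_max > med + k * 1.4826 * mad

def dual_scale_quantize(w, bits=4, k=3.0):
    """Quantize outlier and normal channels separately, each group with its own scale."""
    mask = detect_outlier_channels(w, k)
    qmax = 2 ** (bits - 1) - 1
    w_hat = np.empty_like(w)
    for group in (mask, ~mask):
        if group.any():
            scale = np.abs(w[:, group]).max() / qmax
            w_hat[:, group] = quantize_sym(w[:, group], scale, bits)
    return w_hat, mask

def frame_weighted_error(y_fp, y_q, weights):
    """Weighted mean of per-frame MSE; y_* have shape (frames, features)."""
    per_frame = ((y_fp - y_q) ** 2).mean(axis=1)
    return float((weights * per_frame).sum() / weights.sum())

# Toy layer: small Gaussian weights with three artificially inflated outlier channels.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 32))
w[:, [3, 17, 29]] *= 20.0

w_dual, outlier_mask = dual_scale_quantize(w, bits=4)
w_single = quantize_sym(w, np.abs(w).max() / 7, bits=4)  # one scale for the whole tensor

# Toy "frames": 8 input vectors passed through the layer.
x = rng.normal(size=(8, 64))
frame_weights = np.exp(-0.5 * np.arange(8))  # assumed exponential-like weighting

print("flagged channels:", np.flatnonzero(outlier_mask))
print("single-scale error:", frame_weighted_error(x @ w, x @ w_single, frame_weights))
print("dual-scale error:  ", frame_weighted_error(x @ w, x @ w_dual, frame_weights))
```

With a single per-tensor scale, the outlier channels stretch the quantization grid so far that most normal weights round to zero. Giving the normal channels their own scale keeps their resolution, which is the intuition behind isolating outliers "to protect normal channels".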
Similar Articles
I'm still surprised on how good the kv quantization has become
The author expresses surprise at how effective key-value cache quantization (q4_0) remains even with large context windows, citing accurate retrieval from a 100k context.
moar QAT stuff and hairy ticks
The author releases improved GGUF quantized versions of Gemma 4 models (12B and 31B) using a more accurate quantization-aware training process that achieves lower KLD and higher same-top percentage than stock quantizations.
Where Black-box Drug-Target Interaction Prediction Models Look: Cross-Method Explainability
A study presenting a cross-method explainability audit of the BridgeDPI drug-target interaction model, combining gradient-based attributions and occlusion to reveal modality dominance and artifacts, providing testable hypotheses for drug discovery.
Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning
Zeta proposes a dual whitening optimizer that applies coordinate whitening before spectral whitening to resolve scale heterogeneity in momentum matrices, reducing orthogonalization error and improving convergence and generalization in large-scale neural network training.
Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems
Proposes a spectral learning method for stochastic nonlinear dynamical systems using deep feature spaces and an operator-based latent state-space model, demonstrating stable performance in forecasting and filtering tasks.