AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
Summary
AnyFlow introduces a novel any-step video diffusion distillation framework that optimizes full ODE sampling trajectories through flow-map transition learning and backward simulation, achieving performance that matches or surpasses consistency-based counterparts while scaling with sampling step budgets.
Cached at: 05/14/26, 04:17 AM
Source: https://huggingface.co/papers/2605.13724
Abstract
AnyFlow introduces a novel any-step video diffusion distillation framework that improves upon consistency distillation by optimizing full ODE sampling trajectories through flow-map transition learning and backward simulation techniques.
Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address this limitation, we introduce AnyFlow, the first any-step video diffusion distillation framework based on flow maps. Instead of distilling a model for only a few fixed sampling steps, AnyFlow optimizes the full ODE sampling trajectory. To this end, we shift the distillation target from endpoint consistency mapping ($z_{t} \rightarrow z_{0}$) to flow-map transition learning ($z_{t} \rightarrow z_{r}$) over arbitrary time intervals. We further propose Flow Map Backward Simulation, which decomposes a full Euler rollout into shortcut flow-map transitions, enabling efficient on-policy distillation that reduces test-time errors (i.e., discretization error in few-step sampling and exposure bias in causal generation). Extensive experiments across both bidirectional and causal architectures, at scales ranging from 1.3B to 14B parameters, demonstrate that AnyFlow achieves performance that matches or surpasses consistency-based counterparts in the few-step regime, while scaling with sampling step budgets.
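To make the distinction between an Euler rollout and a flow-map transition concrete, here is a minimal toy sketch in Python. It is not AnyFlow's implementation: the scalar linear velocity field, the coefficient `A`, and every function name are assumptions chosen because this ODE has a closed-form flow map. The closed-form `flow_map` stands in for a learned $z_{t} \rightarrow z_{r}$ model, and the endpoint mapping $z_{t} \rightarrow z_{0}$ is the special case `r = 0`.

```python
import math

A = 1.5  # hypothetical coefficient of the toy velocity field (assumption)


def velocity(z, t):
    """Toy velocity field dz/dt = A * z; stands in for a teacher network."""
    return A * z


def flow_map(z, t, r):
    """Exact transition z_t -> z_r of the toy ODE; stands in for a learned flow map."""
    return z * math.exp(A * (r - t))


def time_grid(num_steps):
    """Uniform grid from t = 1 (noise) down to t = 0 (data)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def euler_rollout(z, times):
    """Integrate the ODE along `times` with explicit Euler steps."""
    for t, r in zip(times[:-1], times[1:]):
        z = z + (r - t) * velocity(z, t)
    return z


def flow_map_rollout(z, times):
    """Chain flow-map transitions along the same grid."""
    for t, r in zip(times[:-1], times[1:]):
        z = flow_map(z, t, r)
    return z


z1 = 2.0                          # starting sample at t = 1
exact = flow_map(z1, 1.0, 0.0)    # true ODE endpoint at t = 0
step_budgets = (1, 4, 50)

euler_errors = [abs(euler_rollout(z1, time_grid(n)) - exact) for n in step_budgets]
flow_errors = [abs(flow_map_rollout(z1, time_grid(n)) - exact) for n in step_budgets]

for n, e_err, f_err in zip(step_budgets, euler_errors, flow_errors):
    print(f"steps={n:2d}  euler_error={e_err:.4f}  flow_map_error={f_err:.2e}")
```

In this toy setting the Euler error shrinks only as the step count grows, while chained flow-map transitions reach the same endpoint for any step budget, because an exact flow map composes along the trajectory. A learned flow map is only approximate, so the sketch illustrates the property the distillation targets, not the results reported in the paper.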
Similar Articles
Flow-OPD: On-Policy Distillation for Flow Matching Models
Flow-OPD is a research paper introducing a two-stage on-policy distillation framework for Flow Matching text-to-image models, significantly improving generation quality and alignment metrics using Stable Diffusion 3.5 Medium.
On-Policy Adversarial Flow Distillation for Autoregressive Video Generation
Proposes Adversarial Flow Distillation (AFD) for distilling heterogeneous black-box video generation models into autoregressive students, using on-policy feedback and forward-process flow-matching updates.
@HuggingPapers: NVIDIA just released AnyFlow on Hugging Face The first any-step video diffusion model that generates high-quality text-…
NVIDIA released AnyFlow, the first any-step video diffusion model for text-to-video generation, allowing smooth quality scaling across inference budgets (4 to 50 steps).
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
This paper introduces D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy self-distillation during supervised fine-tuning. It allows models to learn new concepts or styles without compromising their efficient few-step inference capabilities.
FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation
FlowLM introduces a flow matching language model derived from pre-trained diffusion models via efficient fine-tuning, enabling high-quality few-step text generation that rivals 2,000-step diffusion sampling with far fewer training epochs.