FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching
Summary
A novel inference-time method for long video generation using overlapping sliding windows with Tweedie matching and stochastic early-phase sampling to improve temporal consistency and visual quality without additional training.
Source: https://huggingface.co/papers/2605.20910
Abstract
Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic early-phase sampling then synchronizes per-window trajectories by injecting fresh noise after each Tweedie matching correction in the high-noise phase, before transitioning to deterministic ODE sampling to preserve fine-grained visual fidelity. Applied to various video generation models, our method generates videos several times longer than the native window length while outperforming both training-free and autoregressive baselines in temporal consistency and visual quality, and further extends to audio-video joint generation and text-to-3DGS without any fine-tuning.
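The pipeline the abstract describes (overlapping windows, blending of predicted clean samples, fresh noise in the high-noise phase, deterministic steps afterwards) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. It makes several assumptions: a rectified-flow parameterization x_t = (1 - t) * x0 + t * eps with velocity v = eps - x0, uniform averaging in the overlap regions as a stand-in for Tweedie matching, and hypothetical names (`velocity_fn`, `stochastic_until`, `window_starts`, `blend_clean_predictions`, `generate_long`) that do not come from the paper or its code.

```python
import numpy as np

def window_starts(num_frames, window, stride):
    """Start indices of overlapping windows covering all frames.
    Assumes num_frames >= window."""
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)  # last window sits flush with the end
    return starts

def blend_clean_predictions(preds, starts, num_frames):
    """Average per-window clean predictions frame by frame.
    Uniform weights are an assumption, not the paper's blending rule."""
    window = preds[0].shape[0]
    total = np.zeros((num_frames,) + preds[0].shape[1:])
    count = np.zeros((num_frames,) + (1,) * (preds[0].ndim - 1))
    for s, p in zip(starts, preds):
        total[s:s + window] += p
        count[s:s + window] += 1
    return total / count

def generate_long(velocity_fn, num_frames, frame_shape, window=16, stride=8,
                  steps=20, stochastic_until=0.5, seed=0):
    """Toy long-sequence sampler; velocity_fn(x_window, t) is a hypothetical model."""
    rng = np.random.default_rng(seed)
    starts = window_starts(num_frames, window, stride)
    x = rng.standard_normal((num_frames,) + frame_shape)  # t = 1: pure noise
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        # Clean-sample estimate per window: x0_hat = x_t - t * v
        preds = [x[s:s + window] - t * velocity_fn(x[s:s + window], t)
                 for s in starts]
        x0_hat = blend_clean_predictions(preds, starts, num_frames)
        if t > stochastic_until:
            # High-noise phase: re-noise the blended estimate with fresh noise
            eps = rng.standard_normal(x.shape)
        else:
            # Low-noise phase: reuse the noise implied by x and x0_hat,
            # which is equivalent to a deterministic Euler ODE step
            eps = (x - (1.0 - t) * x0_hat) / t
        x = (1.0 - t_next) * x0_hat + t_next * eps
    return x

# Usage with a toy "model" whose clean target is a constant video of 0.25
toy_velocity = lambda x, t: (x - 0.25) / t
video = generate_long(toy_velocity, num_frames=37, frame_shape=(4, 4))
print(video.shape)
```

Blending the clean estimates before re-noising means every window continues from the same shared estimate in its overlap regions, which is the synchronizing effect the abstract attributes to the stochastic early phase. Where the switch to deterministic steps happens (`stochastic_until` here) is a free parameter in this sketch.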
Similar Articles
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
Proposes ARIADNE, a training-free, adapter-agnostic routing framework that selects the optimal PEFT adapter at inference time by measuring input proximity to adapter-specific centroids in embedding space, recovering 97.44% of upper-bound performance on 23 tasks.
Memento: Reconstruct to Remember for Consistent Long Video Generation
Memento is a subject-reconstruction-guided framework that improves long-form video generation by preserving recurring subjects through memory-based reconstruction and dual-query mechanisms, achieving state-of-the-art performance in long-term subject consistency and cross-shot coherence.
From Consumption to Reflection: Designing Human-AI Relations for Stable Reasoning
This paper introduces Relational Reflective Intelligence (RRI), an inference-time governance layer that uses auditable reasoning loops to stabilize human-AI reasoning, addressing cognitive vulnerabilities shared by humans and LLMs.
Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs
The paper introduces Probe-Conditioned Head Intervention (PCHI), an inference-time method for LLMs that selectively reduces overconfidence on wrong answers without significantly reducing confidence on correct ones, by conditionally rescaling attention head outputs when the model is likely wrong but confident.
Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents
Evoflux uses evolutionary search at inference time to repair failed tool workflows for compact language models, boosting execution feasibility significantly over fine-tuning methods.