Tag
Memento is a subject-reconstruction-guided framework that improves long-form video generation by preserving recurring subjects through memory-based reconstruction and dual-query mechanisms, achieving state-of-the-art performance in long-term subject consistency and cross-shot coherence.
A novel inference-time method for long video generation using overlapping sliding windows with Tweedie matching and stochastic early-phase sampling to improve temporal consistency and visual quality without additional training.
MIGA is a train-free method for generating consistent long videos by reducing the training-inference gap and enhancing temporal consistency through dual consistency mechanisms.
LongLive-2.0 introduces an NVFP4-based parallel infrastructure for long video generation, achieving up to 2.15x training speedup and 1.84x inference speedup with a 5B model reaching 45.7 FPS.