long-range-consistency

#long-range-consistency

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Hugging Face Daily Papers ↗ · 2026-06-08 Cached

The paper introduces MilliVid, a method for improving long-range consistency in video generation by using a multi-scale autoencoder to compress frames into hierarchical tokens and then generating them with a coarse-to-fine diffusion model, outperforming baselines on Minecraft videos.

0 favorites 0 likes

long-range-consistency

MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation

Submit Feedback