Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Hugging Face Daily Papers

Summary

Echo-Infinity introduces a learnable evolving memory mechanism for autoregressive video generation, enabling real-time infinite video generation with constant memory cost and state-of-the-art performance.

We present Echo-Infinity, an autoregressive (AR) framework for real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress history of any length at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors because their cache window is limited and they ignore autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism whenever past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce the Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In both long and short video generation, Echo-Infinity achieves state-of-the-art performance and, to our knowledge, demonstrates promising 24-hour (>1.3 M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.
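The abstract describes memory queries that are updated by attention and a gate each time frames leave the local window, so the memory keeps a fixed size however long the video runs. The following is only a toy sketch of that idea in plain Python, not the authors' implementation: the function name `update_memory`, the scalar sigmoid gate, the `gate_bias` parameter, and the absence of learned projection weights are all assumptions made for illustration.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def update_memory(memory, evicted, gate_bias=0.0):
    """Hypothetical update applied when frames are evicted from the window.

    memory:  list of M query vectors (the evolving memory state)
    evicted: list of T token vectors from the evicted frames
    Returns M new vectors, so the state size never depends on history length.
    """
    dim = len(memory[0])
    scale = 1.0 / math.sqrt(dim)
    new_memory = []
    for q in memory:
        # Cross-attention: each memory query reads from the evicted tokens.
        weights = softmax([dot(q, k) * scale for k in evicted])
        read = [sum(w * v[d] for w, v in zip(weights, evicted)) for d in range(dim)]
        # Scalar sigmoid gate (an assumption): how much of the read is written.
        g = 1.0 / (1.0 + math.exp(-(dot(q, read) * scale + gate_bias)))
        new_memory.append([(1.0 - g) * qd + g * rd for qd, rd in zip(q, read)])
    return new_memory

random.seed(0)
DIM, NUM_QUERIES, TOKENS_PER_EVICTION = 8, 4, 6  # illustrative sizes only
memory = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]
for _ in range(100):  # 100 evictions; the memory stays NUM_QUERIES x DIM
    evicted = [[random.gauss(0, 1) for _ in range(DIM)]
               for _ in range(TOKENS_PER_EVICTION)]
    memory = update_memory(memory, evicted)
print(len(memory), len(memory[0]))  # prints: 4 8
```

In the method as described, the queries are trained end-to-end with the DiT; this sketch has no trainable parameters and only shows why the per-step cost is independent of how many frames have already been generated.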

Cached at: 06/04/26, 03:41 AM


Source: https://huggingface.co/papers/2606.04527

Published on Jun 3

#3 Paper of the day

Abstract

Echo-Infinity enables real-time infinite video generation using a learnable evolving memory and a unified relative RoPE recipe to overcome limitations in existing autoregressive methods.

We present Echo-Infinity, an autoregressive (AR) framework for real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress history of any length at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors because their cache window is limited and they ignore autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism whenever past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce the Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In both long and short video generation, Echo-Infinity achieves state-of-the-art performance and, to our knowledge, demonstrates promising 24-hour (>1.3 M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.
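One way to read the RoPE recipe in the abstract is that temporal ids are assigned relative to the current context rather than to the absolute frame index: sink frames always start at id 0, and the newest frame's id grows with the video until it reaches the pretrained maximum, then stays there. The sketch below illustrates that reading only. The function `relative_rope_ids`, its parameters, and the choice to give window frames consecutive ids ending at the newest id are assumptions, and the abstract does not say how the memory queries themselves are positioned.

```python
def relative_rope_ids(num_generated, num_sink, window, max_id):
    """Temporal RoPE ids for the frames currently in context (hypothetical).

    num_generated: total frames generated so far
    num_sink:      number of sink frames kept from the start of the video
    window:        size of the local window of recent frames
    max_id:        assumed pretrained maximum temporal RoPE id of the DiT
    Returns (sink_ids, window_ids).
    """
    if max_id < num_sink + window - 1:
        raise ValueError("max_id too small to hold sink frames plus window")
    # Sink frames are anchored so that they always start from id 0.
    sink_ids = list(range(min(num_sink, num_generated)))
    in_window = min(window, max(0, num_generated - num_sink))
    # The newest id follows the absolute index early on, then saturates.
    newest_id = min(num_generated - 1, max_id)
    window_ids = [newest_id - (in_window - 1 - i) for i in range(in_window)]
    return sink_ids, window_ids

print(relative_rope_ids(5, 2, 4, 20))          # prints: ([0, 1], [2, 3, 4])
print(relative_rope_ids(1_300_000, 2, 4, 20))  # prints: ([0, 1], [17, 18, 19, 20])
```

Under this reading, no id ever exceeds the range seen in pretraining, whatever the rollout length. The second call uses the 1.3 M frame count quoted for the 24-hour rollout, which works out to roughly 15 frames per second (1,300,000 / 86,400 s).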


Models citing this paper: 1

#### Echo-Team/Echo-Infinity (updated about 3 hours ago)

Similar Articles

Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation

Hugging Face Daily Papers

Echo-Forcing introduces a scene memory framework for interactive long video generation, using hierarchical temporal memory, scene recall frames, and difference-aware memory decay to handle prompt switching and long-term recall. The method is training-free and achieves strong performance on VBench-Long.

Long Video Generation (4 minute read)

TLDR AI

The article introduces A²RD, a novel architecture for generating consistent long videos using agentic autoregressive diffusion. It proposes a Retrieve–Synthesize–Refine–Update cycle and a new benchmark, LVBench-C, to address semantic drift in long-horizon video synthesis.

Scaling Self-Evolving Agents via Parametric Memory

arXiv cs.AI

Researchers from Alibaba/Qwen and Peking University introduce TMEM, a self-evolving parametric memory framework that uses online LoRA weight updates to let LLM agents genuinely learn from experience within a single episode, rather than relying solely on prompt-space memory. TMEM outperforms summary-based and retrieval-based baselines across multiple benchmarks including LoCoMo, LongMemEval-S, and CL-Bench.