DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
Summary
DecMem introduces a decoupled memory architecture with Sparse Global Memory and Anchored Local Memory to achieve consistent minute-long video generation, outperforming state-of-the-art methods.
Cached at: 06/01/26, 11:20 AM
Source: https://huggingface.co/papers/2605.31336
Abstract
A novel decoupled memory architecture called DecMem is introduced for consistent long-horizon video generation, addressing computational inefficiency and attention dispersion issues in learnable memory systems.
Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizon extrapolation, namely computational inefficiency and attention dispersion. Through a systematic analysis of attention dispersion, we propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. Extensive experiments demonstrate that DecMem significantly outperforms current state-of-the-art methods. By ensuring precise and efficient long-term memory and achieving superior extrapolation capabilities, DecMem enables minute-level controllable long video generation with high fidelity and consistency.
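The abstract does not spell out how attention dispersion arises or how Sparse Global Memory is implemented. As a rough illustration only, the sketch below assumes one plausible reading: with dense softmax attention over a long history, the weight on the single relevant memory entry shrinks as the number of irrelevant entries grows, while restricting attention to the top-k scoring entries keeps that weight high. The function names, the toy 2-D keys, and the top-k selection rule are all hypothetical and are not taken from the DecMem paper or its code.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_attention_weights(query, keys):
    # Attend to every memory entry (the "naive" setting, as assumed here).
    return softmax([dot(query, key) for key in keys])

def topk_attention_weights(query, keys, k):
    # Hypothetical sparse variant: keep only the k highest-scoring entries,
    # then normalize over that subset. Returns {memory_index: weight}.
    scores = [dot(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return dict(zip(top, weights))

# Toy memory: entry 0 matches the query, the other 1000 entries do not.
query = [2.0, 0.0]
keys = [[1.0, 0.0]] + [[0.0, 1.0]] * 1000

dense = dense_attention_weights(query, keys)
sparse = topk_attention_weights(query, keys, k=4)

print(f"dense weight on relevant entry:  {dense[0]:.4f}")
print(f"top-4 weight on relevant entry: {sparse[0]:.4f}")
```

In this toy setup the dense weight on the relevant entry falls below 1% because a thousand zero-score entries share the softmax mass, whereas the top-4 variant keeps most of the weight on it. The sketch also scores every entry before selecting, so it does not demonstrate any computational saving; an efficient sparse memory would need an index or routing step that the abstract does not describe.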
Get this paper in your agent:
hf papers read 2605.31336
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
KlingTeam/DecMem (Video-to-Video)
Similar Articles
DimMem: Dimensional Structuring for Efficient Long-Term Agent Memory
DimMem introduces a dimensional memory framework for LLM agents that represents memories as atomic, typed units with explicit fields, achieving state-of-the-art accuracy on LoCoMo-10 and LongMemEval-S while reducing token costs by 24%.
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0 introduces a scalable memory-centric architecture using graph-based representations to improve long-term conversational coherence in LLMs, significantly reducing latency and token costs while outperforming existing memory systems.
SimpleMem: Efficient Lifelong Memory for LLM Agents
Introduces SimpleMem, an efficient memory framework for LLM agents that uses semantic lossless compression to improve accuracy and reduce token consumption, achieving 26.4% F1 improvement and up to 30x reduction in inference-time token usage.
MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing
MemForest proposes a memory framework for long-context LLM agents that improves scalability and reduces latency through parallel chunk extraction and hierarchical temporal indexing, achieving 6x higher throughput on benchmarks.
Long Video Generation (4 minute read)
The article introduces A²RD, a novel architecture for generating consistent long videos using agentic autoregressive diffusion. It proposes a Retrieve–Synthesize–Refine–Update cycle and a new benchmark, LVBench-C, to address semantic drift in long-horizon video synthesis.