WorldKV: Efficient World Memory with World Retrieval and Compression
Summary
WorldKV is a training-free framework that retrieves and compresses key-value cache chunks to maintain long-term consistency in video diffusion world generation, achieving higher throughput while matching full-memory fidelity.
Cached at: 05/22/26, 02:24 AM
Source: https://huggingface.co/papers/2605.22718
Abstract
WorldKV enables persistent world generation in video diffusion models by retrieving and compressing key-value cache chunks to maintain consistency while improving throughput.
Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding-window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning.
Project Page: https://cvlab-kaist.github.io/WorldKV/
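The abstract describes World Retrieval and World Compression only at a high level. Below is a minimal, hypothetical Python sketch of how the two pieces could fit together around a sliding window. It is not the authors' implementation. Several details are assumptions that do not come from the paper: a chunk's anchor is taken to be its first frame; a camera pose is treated as a plain vector, with "correspondence" modeled as Euclidean distance between poses; when a token budget is exceeded, the bank drops its oldest entries; and plain Python lists stand in for GPU/CPU offload storage and attention tensors. The names `KVChunk`, `compress_chunk`, `WorldMemory`, and `toy_chunk` are illustrative only.

```python
import math
from dataclasses import dataclass

Vector = list[float]


@dataclass
class KVChunk:
    """A cached chunk of frames: one key/value vector per token plus a camera pose."""
    keys: list[Vector]
    values: list[Vector]
    pose: Vector  # hypothetical camera descriptor, e.g. (x, y)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two key vectors (0.0 if either is all zeros)."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def compress_chunk(chunk: KVChunk, tokens_per_frame: int, keep_ratio: float = 0.5) -> KVChunk:
    """Compression sketch. ASSUMPTION: the anchor is the chunk's first frame.
    Anchor tokens are always kept. Among the remaining tokens, those whose keys
    are least similar to any anchor key are kept; the most similar are dropped as redundant."""
    n = len(chunk.keys)
    anchor = list(range(tokens_per_frame))
    others = range(tokens_per_frame, n)
    # Redundancy of a token = its highest key-key cosine similarity to the anchor frame.
    redundancy = {
        i: max(cosine(chunk.keys[i], chunk.keys[a]) for a in anchor) for i in others
    }
    n_keep = max(tokens_per_frame, round(n * keep_ratio))
    extra = sorted(others, key=lambda i: redundancy[i])[: n_keep - tokens_per_frame]
    keep = sorted(anchor + extra)  # restore temporal order
    return KVChunk([chunk.keys[i] for i in keep], [chunk.values[i] for i in keep], chunk.pose)


class WorldMemory:
    """Sliding window of recent chunks plus a bank of evicted (optionally compressed) chunks."""

    def __init__(self, window_chunks: int, bank_budget_tokens: int,
                 tokens_per_frame: int, compress: bool = True):
        self.window_chunks = window_chunks
        self.bank_budget_tokens = bank_budget_tokens
        self.tokens_per_frame = tokens_per_frame
        self.compress = compress
        self.window: list[KVChunk] = []
        self.bank: list[KVChunk] = []  # stands in for GPU/CPU offload storage

    def append(self, chunk: KVChunk) -> None:
        self.window.append(chunk)
        if len(self.window) > self.window_chunks:
            evicted = self.window.pop(0)
            if self.compress:
                evicted = compress_chunk(evicted, self.tokens_per_frame)
            self.bank.append(evicted)  # stored instead of discarded
            # ASSUMED policy: drop the oldest bank entries once the token budget is exceeded.
            while sum(len(c.keys) for c in self.bank) > self.bank_budget_tokens:
                self.bank.pop(0)

    def attention_context(self, query_pose: Vector, top_k: int) -> tuple[list[Vector], list[Vector]]:
        """Retrieve the top_k bank chunks whose pose is closest to query_pose and
        prepend their cached K/V to the window as-is (no re-encoding)."""
        retrieved = sorted(self.bank, key=lambda c: math.dist(c.pose, query_pose))[:top_k]
        chunks = retrieved + self.window
        keys = [k for c in chunks for k in c.keys]
        values = [v for c in chunks for v in c.values]
        return keys, values


def toy_chunk(step: int, frames: int = 4, tokens_per_frame: int = 2) -> KVChunk:
    """Hypothetical toy data: the camera moves along x; key = (1, token index, step)."""
    keys = [[1.0, float(i), float(step)] for i in range(frames * tokens_per_frame)]
    return KVChunk(keys, [list(k) for k in keys], pose=[float(step), 0.0])


if __name__ == "__main__":
    for compress in (False, True):
        mem = WorldMemory(window_chunks=2, bank_budget_tokens=16,
                          tokens_per_frame=2, compress=compress)
        for step in range(8):
            mem.append(toy_chunk(step))
        print(f"compress={compress}: bank holds chunks {[int(c.pose[0]) for c in mem.bank]}")
    keys, _ = mem.attention_context(query_pose=[3.2, 0.0], top_k=1)
    print(len(keys), keys[0])
```

In this toy run, each chunk has 8 tokens and the bank budget is 16 tokens. Without compression, the bank keeps only chunks 4 and 5. With compression, each chunk shrinks to 4 tokens, so the bank keeps chunks 2 through 5. That is the "2x more history under a fixed budget" arithmetic in miniature. The query pose (3.2, 0) retrieves chunk 3, and that chunk's cached keys are placed ahead of the window unchanged.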
Similar Articles
@GitHub_Daily: For those in quantitative research who face massive volumes of financial reports and cutting-edge papers every day, manually filtering valuable content is like finding a needle in a haystack. I recently discovered an open-source project called QuantMind, focused on intelligent knowledge extraction and retrieval for quantitative finance. It can automatically fetch papers, news, and blogs, and turn unstructured documents into searchable...
QuantMind is an open-source framework for intelligent knowledge extraction and retrieval in quantitative finance. It can automatically fetch unstructured content like papers and news, build a queryable structured knowledge base, and support natural language retrieval.
Open sourcing InfiniteKV: a KV cache that files old tokens as 104-byte searchable records in RAM or on disk instead of deleting them. Mistral-7B answered from token 76,747, 2.3x past its trained window. Colab demo
InfiniteKV is an open-source KV cache technique that compresses old tokens into 104-byte searchable records stored in RAM or on disk, enabling models to handle million-token contexts beyond their trained window without discarding data. Verified working with Mistral-7B and SmolLM2.
Do agents need a "brain" separate from their knowledge base?
The author proposes a mental model where AI agents should maintain a separate memory layer (brain) that stores reusable understanding, distinct from their knowledge base (library), to avoid rediscovering the same information repeatedly.
Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory
Proposes a cognitively grounded multi-factor value function for agentic memory in LLM agents, learning interpretable weights to decide what to encode, forget, and retrieve under memory constraints. Improves gold-evidence retention significantly over similarity-only or recency-based baselines.
SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents
SkillCAT is a training-free framework for LLM agent skill self-evolution that addresses limitations of single-trace bias, unverified merging, and full corpus loading via three stages: Contrastive Causal Extraction, Assessment-Augmented Evolution, and Topology-Aware Task Execution, achieving up to 40.40% improvement on benchmarks.