WorldKV: Efficient World Memory with World Retrieval and Compression

Hugging Face Daily Papers 05/21/26, 12:00 AM Papers

world-generation video-diffusion kv-cache memory-compression retrieval training-free persistent-world

Summary

WorldKV is a training-free framework that retrieves and compresses key-value cache chunks to maintain long-term consistency in video diffusion world generation, achieving higher throughput while matching full-memory fidelity.

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
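
To make the World Compression idea concrete, here is a minimal NumPy sketch of pruning a chunk by key-key similarity to an anchor frame. It is an illustration under stated assumptions, not the authors' implementation: the function name `compress_chunk`, the cosine-similarity score, the rule "drop the tokens most similar to the anchor", and the `keep_ratio` parameter are all hypothetical choices made here. Only the general idea (key-key similarity to an anchor, roughly half the tokens kept) comes from the abstract.

```python
import numpy as np

def compress_chunk(keys, values, anchor_keys, keep_ratio=0.5):
    """Prune tokens of one KV chunk that look redundant w.r.t. an anchor frame.

    keys, values: arrays of shape (T, D) for the chunk being stored.
    anchor_keys:  array of shape (A, D) holding the anchor frame's keys.
    Assumption: a token is "redundant" if its key is highly similar to
    some anchor key, so the least similar tokens are the ones kept.
    """
    eps = 1e-8
    # Normalize rows so the dot product below is a cosine similarity.
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    a = anchor_keys / (np.linalg.norm(anchor_keys, axis=1, keepdims=True) + eps)

    # For each chunk token, the best match among the anchor keys.
    redundancy = (k @ a.T).max(axis=1)  # shape (T,)

    n_keep = max(1, int(np.ceil(keep_ratio * keys.shape[0])))
    # Lowest redundancy first; re-sort the kept indices to preserve token order.
    keep = np.sort(np.argsort(redundancy, kind="stable")[:n_keep])
    return keys[keep], values[keep], keep


# Toy chunk: tokens 0-3 duplicate the anchor keys, tokens 4-7 are new content.
chunk_keys = np.eye(8)
chunk_values = np.arange(16, dtype=float).reshape(8, 2)
anchor = np.eye(8)[:4]
small_k, small_v, kept = compress_chunk(chunk_keys, chunk_values, anchor)
```

With `keep_ratio=0.5`, each stored chunk holds half as many tokens, which is the sense in which twice as many chunks fit under the same memory budget.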

Original Article

Cached at: 05/22/26, 02:24 AM

Source: https://huggingface.co/papers/2605.22718

Abstract

WorldKV enables persistent world generation in video diffusion models by retrieving and compressing key-value cache chunks to maintain consistency while improving throughput.
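
The retrieval side can be sketched in the same spirit. The snippet below is a hypothetical illustration, not the paper's method: it assumes each evicted chunk is stored alongside a camera pose vector, uses plain Euclidean distance between poses as a stand-in for camera/action correspondence, and prepends the retrieved chunks to the sliding-window cache. The names `retrieve_chunks` and `build_attention_kv`, the dictionary layout, and `top_k` are assumptions introduced here.

```python
import numpy as np

def retrieve_chunks(store, query_pose, top_k=2):
    """Return the top_k stored chunks whose pose is closest to query_pose.

    store: list of dicts with keys "pose" (shape (P,)), "k" and "v" (shape (T, D)).
    Assumption: pose distance approximates scene relevance.
    """
    if not store:
        return []
    poses = np.stack([c["pose"] for c in store])
    dist = np.linalg.norm(poses - query_pose, axis=1)
    nearest = np.argsort(dist, kind="stable")[:top_k]
    # Keep the retrieved chunks in their original temporal order.
    return [store[i] for i in sorted(int(i) for i in nearest)]

def build_attention_kv(retrieved, window_k, window_v):
    """Place retrieved chunks in front of the local window; nothing is re-encoded."""
    ks = [c["k"] for c in retrieved] + [window_k]
    vs = [c["v"] for c in retrieved] + [window_v]
    return np.concatenate(ks, axis=0), np.concatenate(vs, axis=0)


# Four evicted chunks at different camera positions (toy 2-D poses).
store = [
    {"id": i, "pose": np.array(p, dtype=float), "k": np.full((3, 4), i, dtype=float),
     "v": np.full((3, 4), i, dtype=float)}
    for i, p in enumerate([[0.0, 0.0], [10.0, 0.0], [0.5, 0.0], [20.0, 0.0]])
]
window_k = np.zeros((5, 4))
window_v = np.zeros((5, 4))

hits = retrieve_chunks(store, query_pose=np.array([0.2, 0.0]), top_k=2)
attn_k, attn_v = build_attention_kv(hits, window_k, window_v)
```

Because the attention input stays at a fixed size (`top_k` chunks plus the window), the per-step attention cost no longer grows with rollout length, unlike full KV-cache attention.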


