WorldKV: Efficient World Memory with World Retrieval and Compression


Summary

WorldKV is a training-free framework that retrieves and compresses key-value cache chunks to maintain long-term consistency in video diffusion world generation, achieving higher throughput while matching full-memory fidelity.

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding window inference restores throughput but discards long-term consistency. We propose WorldKV, a training-free framework with two components: World Retrieval and World Compression. World Retrieval stores evicted KV-cache chunks in GPU/CPU memory and selectively retrieves scene-relevant chunks via camera/action correspondence, inserting them back into the native attention window without re-encoding. World Compression prunes redundant tokens within each chunk via key-key similarity to an anchor frame, halving per-chunk storage to fit 2x more history under a fixed budget. On Matrix-Game-2.0 and LingBot-World-Fast, WorldKV matches or exceeds full-KV memory fidelity at roughly 2x the throughput, and is competitive with memory-trained baselines without any fine-tuning. Project Page: https://cvlab-kaist.github.io/WorldKV/
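
To make the World Compression step more concrete, here is a minimal NumPy sketch of pruning a chunk by key-key similarity to an anchor frame. The abstract does not specify the similarity measure or the selection rule, so everything below is an assumption for illustration: cosine similarity, a per-token maximum over anchor keys as the redundancy score, and a fixed keep ratio of 0.5 to mirror the "halving" described above. The names `compress_chunk` and `keep_ratio` are hypothetical and are not taken from the WorldKV code.

```python
import numpy as np


def compress_chunk(keys, values, anchor_keys, keep_ratio=0.5):
    """Drop the chunk tokens whose keys are most similar to the anchor frame.

    keys, values : (num_tokens, dim) arrays for one evicted KV-cache chunk.
    anchor_keys  : (num_anchor_tokens, dim) keys of the anchor frame.
    keep_ratio   : fraction of tokens to keep (0.5 halves the storage).

    Assumption: cosine similarity and a max over anchor tokens define
    "redundancy"; the paper may use a different rule.
    """
    eps = 1e-8
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    a = anchor_keys / (np.linalg.norm(anchor_keys, axis=1, keepdims=True) + eps)

    # For each chunk token, similarity to its closest anchor token.
    # A high score means the anchor frame already represents this token.
    redundancy = (k @ a.T).max(axis=1)

    num_keep = max(1, int(round(keys.shape[0] * keep_ratio)))
    # Keep the least redundant tokens, restored to their original order.
    keep = np.sort(np.argsort(redundancy, kind="stable")[:num_keep])
    return keys[keep], values[keep], keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.standard_normal((8, 16))
    # First four chunk tokens are scaled copies of anchor keys (redundant),
    # the last four are unrelated random keys.
    chunk_keys = np.concatenate([2.0 * anchor[:4], rng.standard_normal((4, 16))])
    chunk_values = rng.standard_normal((8, 16))

    small_k, small_v, kept = compress_chunk(chunk_keys, chunk_values, anchor)
    print(small_k.shape, small_v.shape, kept)
```

In this toy input the scaled copies of anchor keys are pruned and the unrelated tokens survive, so the chunk shrinks from 8 tokens to 4. Under a fixed memory budget, storing half as many tokens per chunk is what allows roughly twice as many chunks of history to be retained.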

Cached at: 05/22/26, 02:24 AM


Source: https://huggingface.co/papers/2605.22718

Abstract

WorldKV enables persistent world generation in video diffusion models by retrieving and compressing key-value cache chunks to maintain consistency while improving throughput.
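
The retrieval half of this idea, storing evicted KV-cache chunks and bringing back the ones relevant to the current view, can be sketched as follows. This is only an illustration under stated assumptions: the page does not give the scoring function, so the sketch scores each stored chunk by view-direction alignment minus camera distance. The names `Chunk`, `ChunkStore`, `build_window`, `top_k`, and `dist_scale` are hypothetical and do not come from the WorldKV code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class Chunk:
    keys: np.ndarray      # (tokens, dim) cached attention keys
    values: np.ndarray    # (tokens, dim) cached attention values
    position: np.ndarray  # (3,) camera position when the chunk was generated
    forward: np.ndarray   # (3,) unit view direction when the chunk was generated


class ChunkStore:
    """Holds chunks evicted from the sliding attention window."""

    def __init__(self):
        self.chunks = []

    def evict(self, chunk):
        self.chunks.append(chunk)

    def retrieve(self, position, forward, top_k=2, dist_scale=1.0):
        """Return up to top_k stored chunks whose camera best matches the query.

        Assumed score: cosine of the angle between view directions minus
        the scaled distance between camera positions.
        """
        if not self.chunks:
            return []
        scores = [
            float(c.forward @ forward)
            - float(np.linalg.norm(c.position - position)) / dist_scale
            for c in self.chunks
        ]
        best = np.argsort(scores)[::-1][:top_k]
        # Return the selected chunks in their original temporal order.
        return [self.chunks[i] for i in sorted(best)]


def build_window(retrieved, local_keys, local_values):
    """Prepend retrieved K/V to the local window; nothing is re-encoded."""
    keys = np.concatenate([c.keys for c in retrieved] + [local_keys], axis=0)
    values = np.concatenate([c.values for c in retrieved] + [local_values], axis=0)
    return keys, values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    store = ChunkStore()
    poses = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
             ((10.0, 0.0, 0.0), (0.0, 0.0, -1.0)),
             ((0.5, 0.0, 0.0), (0.0, 0.0, 1.0))]
    for pos, fwd in poses:
        store.evict(Chunk(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)),
                          np.array(pos), np.array(fwd)))

    hits = store.retrieve(np.array([0.2, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
    k, v = build_window(hits, rng.standard_normal((6, 8)), rng.standard_normal((6, 8)))
    print(len(hits), k.shape, v.shape)
```

Because the stored keys and values are concatenated as-is, the attention layer sees the retrieved chunks as extra context tokens without another forward pass over old frames. In a real system the store could keep distant chunks in CPU memory and move only the retrieved ones to the GPU.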




