Key-Value Means

Hugging Face Daily Papers 05/11/26, 12:00 AM Papers

attention-mechanism transformer rnn key-value-means long-context subquadratic block-recurrence

Summary

Key-Value Means (KVM) is a novel attention mechanism that combines the strengths of transformers and RNNs with controllable computational complexity and memory usage. It supports fixed-size or growing state, offers subquadratic prefill time and sublinear state growth, and can be implemented without custom kernels.

We present Key-Value Means ("KVM"), a novel block-recurrence for attention that can accommodate either a fixed-size or a growing state. Equipping a strong transformer baseline with fixed-size KVM attention layers yields a strong O(N) chunked RNN while adding only a negligible number of new parameters. We train a transformer with a growable KVM cache and show that it performs competitively on long-context tests with only subquadratic prefill time and sublinear state growth. KVM is implementable with standard operations, without custom kernels, and supports chunk-wise parallelizable training and prefill. It provides many of the benefits of both traditional transformers (expandable context memory, chunk-wise parallelizable training and prefill) and linear RNNs (LRNNs) in a single unified package. It can be used on every layer, saving KV-cache memory and allowing a continuous range of prefill time complexities between O(N) and O(N^2). It can also replace traditional attention in a hybrid design alongside LRNN layers, supplementing the LRNN with sublinear memory growth, better use of long contexts, and improved long-context decoding. We release our code at https://github.com/recursal/KVM-paper and trained models at https://huggingface.co/collections/recursal/key-value-means under the Apache 2.0 license.
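The abstract does not spell out the recurrence itself, so the following is only a toy illustration of what a block-recurrence over attention can look like, not the authors' method. It assumes, purely for illustration, that each finished chunk is summarized by the mean of its keys and the mean of its values, and that a capped state merges its two oldest slots. The names `toy_block_recurrence`, `chunk_size`, and `max_slots` are hypothetical. A real implementation would batch the per-token loop into matrix operations so that each chunk is processed in parallel.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vec(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attend(query, keys, values):
    # Plain scaled dot-product attention for a single query.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def toy_block_recurrence(queries, keys, values, chunk_size, max_slots=None):
    """ASSUMPTION: illustrative only. Each token attends to summary slots of
    earlier chunks plus the causal prefix of its own chunk. max_slots=None
    lets the state grow by one slot per chunk; an integer caps its size."""
    state = []    # list of (mean_key, mean_value, tokens_summarized)
    outputs = []
    for start in range(0, len(queries), chunk_size):
        end = min(start + chunk_size, len(queries))
        state_keys = [slot[0] for slot in state]
        state_vals = [slot[1] for slot in state]
        for t in range(start, end):
            ctx_keys = state_keys + keys[start:t + 1]
            ctx_vals = state_vals + values[start:t + 1]
            outputs.append(attend(queries[t], ctx_keys, ctx_vals))
        # Summarize the finished chunk into one new slot.
        state.append((mean_vec(keys[start:end]),
                      mean_vec(values[start:end]),
                      end - start))
        if max_slots is not None and len(state) > max_slots:
            # Hypothetical eviction rule: count-weighted merge of the two oldest slots.
            (k0, v0, n0), (k1, v1, n1) = state[0], state[1]
            n = n0 + n1
            merged_k = [(n0 * a + n1 * b) / n for a, b in zip(k0, k1)]
            merged_v = [(n0 * a + n1 * b) / n for a, b in zip(v0, v1)]
            state[:2] = [(merged_k, merged_v, n)]
    return outputs, state

# Eight 2-dimensional tokens, reused as queries, keys, and values.
tokens = [[float(i), 1.0] for i in range(8)]
outs, grown = toy_block_recurrence(tokens, tokens, tokens, chunk_size=2)
_, fixed = toy_block_recurrence(tokens, tokens, tokens, chunk_size=2, max_slots=2)
print(len(grown), len(fixed))  # number of state slots: growing vs capped
```

With `max_slots=None` the state gains one slot per chunk, which mimics a growing cache. With `max_slots` set, the state stays a constant size, so the per-token cost no longer depends on sequence length, which is the chunked-RNN regime the abstract describes.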

Original Article

Cached at: 05/12/26, 02:52 PM

Paper page - Key-Value Means

Source: https://huggingface.co/papers/2605.09877

Abstract

Key-Value Means introduces a novel attention mechanism that combines transformer and RNN advantages with controllable computational complexity and memory usage.
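One way to see how state size could control prefill cost is a back-of-envelope count of attention scores. The sketch below assumes a hypothetical power-law schedule in which the state holds about `start ** alpha` slots once `start` tokens have been processed. The source does not say how the growable cache actually grows, so `alpha`, `prefill_score_count`, and `empirical_exponent` are illustrative names only. Under that assumption, `alpha = 0` gives a constant-size state and roughly linear cost, while `alpha = 1` recovers roughly quadratic cost.

```python
import math

def prefill_score_count(num_tokens, chunk_size, alpha):
    """Count query-key scores when every token in a chunk attends to its own
    chunk plus the state slots. ASSUMPTION: slots follow ceil(start ** alpha)."""
    total = 0
    for start in range(0, num_tokens, chunk_size):
        size = min(chunk_size, num_tokens - start)
        slots = math.ceil(start ** alpha) if start > 0 else 0
        total += size * (size + slots)
    return total

def empirical_exponent(chunk_size, alpha, n=1 << 16):
    # Doubling the sequence length multiplies an O(N^p) cost by 2^p.
    small = prefill_score_count(n, chunk_size, alpha)
    large = prefill_score_count(2 * n, chunk_size, alpha)
    return math.log2(large / small)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(empirical_exponent(64, alpha), 2))
```

The measured exponent sits near 1 for `alpha = 0`, near 2 for `alpha = 1`, and strictly between them for intermediate values, which matches the range between O(N) and O(N^2) stated in the summary. State memory in this model is simply the slot count, so any `alpha` below 1 also gives sublinear state growth.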

Similar Articles

@jiqizhixin: What if your AI’s memory didn’t have to balloon with every extra sentence? University of Oxford, Technion, AITHYRA, and…

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

@TheTuringPost: Why KV cache is one of the main reasons LLMs are fast? KV cache is what connects attention mechanism with generation st…

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression
