recurrent-attention

Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity

Hugging Face Daily Papers · 2026-05-27

This paper examines how an exponentially decaying memory module from RAT+ can improve query-aware sparse inference methods for long-context language models, and reports consistent accuracy gains across a range of sparsity budgets on needle-in-a-haystack tasks.
