Dynamic Linear Attention

Hugging Face Daily Papers 06/09/26, 12:00 AM Papers

Summary

DLA introduces adaptive state merging and capacity-bounded memory modeling for multi-state linear attention, improving long-context LLM performance.

The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt to dynamically varying token importance, irreversibly obscuring critical tokens and causing severe error accumulation over long sequences. To address this limitation, we propose DLA, a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states to control memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art.
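As background for the sub-quadratic cost mentioned above, here is a minimal sketch of generic causal linear attention written as a recurrence. It is not DLA itself. The feature map and normalization used by real linear attention models are left out as a simplifying assumption, and the function name `linear_attention` is illustrative only. It shows that each token updates one fixed-size state matrix, so the work per token does not grow with sequence length.

```python
from typing import List

Vector = List[float]


def linear_attention(queries: List[Vector], keys: List[Vector],
                     values: List[Vector]) -> List[Vector]:
    """Un-normalized causal linear attention (illustrative sketch).

    Recurrence: S_t = S_{t-1} + k_t v_t^T, then o_t = q_t^T S_t.
    The state S has a fixed size of d_k x d_v regardless of sequence length.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v in zip(queries, keys, values):
        # Fold the current token into the single running state.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Read out with the query: a d_k x d_v product, independent of t.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs
```

Because every past token is folded into the same state, older tokens can be blurred by newer ones. Multi-state variants, as described in the abstract, keep several such states instead of one.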

Original Article


Cached at: 06/10/26, 05:45 AM

Paper page - Dynamic Linear Attention

Source: https://huggingface.co/papers/2606.10650

Abstract

DLA addresses limitations in long-context LLMs by introducing adaptive state merging and capacity-bounded memory modeling for improved multi-state linear attention.

The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt to dynamically varying token importance, irreversibly obscuring critical tokens and causing severe error accumulation over long sequences. To address this limitation, we propose DLA, a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states to control memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art.
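The two mechanisms named in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption, since the abstract gives no formulas: the information measure (cosine distance between a token vector and the newest state), the `boundary_threshold` parameter, the additive merge rule, and the names `State` and `update_cache` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    total: List[float]  # sum of the token vectors summarized by this state
    count: int          # number of tokens folded into this state
    info: float         # accumulated information score (hypothetical measure)


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; treated as maximal when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0.0 or nb == 0.0 else 1.0 - dot / (na * nb)


def merge(a: State, b: State) -> State:
    """Assumed merge rule: add the summaries, counts, and scores."""
    return State([x + y for x, y in zip(a.total, b.total)],
                 a.count + b.count, a.info + b.info)


def update_cache(cache: List[State], token: List[float],
                 boundary_threshold: float, capacity: int) -> List[State]:
    """Fold one token into a chronologically ordered, size-bounded cache."""
    # (i) Dynamic state merging: open a new state when the token differs
    # enough from the newest state, otherwise summarize it into that state.
    variation = cosine_distance(token, cache[-1].total) if cache else 1.0
    new_state = State(list(token), 1, variation)
    if not cache or variation > boundary_threshold:
        cache.append(new_state)
    else:
        cache[-1] = merge(cache[-1], new_state)
    # (ii) Capacity bound: while over budget, merge the adjacent pair with
    # the lowest combined score, which keeps the remaining states in order.
    while len(cache) > capacity:
        i = min(range(len(cache) - 1),
                key=lambda j: cache[j].info + cache[j + 1].info)
        cache[i:i + 2] = [merge(cache[i], cache[i + 1])]
    return cache
```

Calling `update_cache` once per token keeps at most `capacity` states. Stable runs of similar tokens collapse into one state, while abrupt changes open a new one. Only adjacent states are merged, so the cache never reorders history.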


Get this paper in your agent:

hf papers read 2606.10650

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Dynamic Linear Attention

Variational Linear Attention: Stable Associative Memory for Long-Context Transformers

Memory

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention [P]

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference
