This paper proposes DLA, a dynamic memory modeling framework for multi-state linear attention. DLA adaptively merges states based on variation in token information and maintains a fixed-size state cache, which enables better long-context representation without the quadratic complexity of standard attention.
DLA introduces adaptive state merging and capacity-bounded memory modeling for multi-state linear attention, improving long-context LLM performance.