δ-mem: Efficient Online Memory for Large Language Models
Summary
The paper introduces δ-mem, a lightweight memory mechanism that enhances large language models by augmenting a frozen attention backbone with a compact associative memory state. It demonstrates improved performance on memory-heavy benchmarks with minimal computational overhead.
Source: https://huggingface.co/papers/2605.12357
Abstract
A lightweight memory mechanism called δ-mem enhances large language models by augmenting a frozen attention backbone with a compact associative memory state that provides low-rank corrections to attention computations.
Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone’s attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
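The abstract does not give the exact update equations, so the following is only a minimal sketch of a generic delta-rule associative memory, not the paper's implementation. It assumes the classic form S ← S + β (v − S k) kᵀ with unit-norm keys and a readout o = S q. The names `delta_update`, `read`, and `normalize`, the step size `beta`, and the example keys and values are all hypothetical. How the readout becomes a low-rank correction to attention is not shown here.

```python
import math

D = 8  # state is D x D, matching the 8x8 size quoted in the abstract


def normalize(x):
    """Scale a vector to unit length (assumed here so that beta=1 overwrites exactly)."""
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]


def read(state, query):
    """Readout o = S q: what the memory currently associates with `query`."""
    return [sum(row[j] * query[j] for j in range(len(query))) for row in state]


def delta_update(state, key, value, beta=1.0):
    """Delta rule: S <- S + beta * (v - S k) k^T.

    The memory is corrected only by the error between the target value and
    what it already predicts for this key, so the state size never grows.
    """
    pred = read(state, key)
    err = [v - p for v, p in zip(value, pred)]
    return [
        [state[i][j] + beta * err[i] * key[j] for j in range(len(key))]
        for i in range(len(value))
    ]


# Two orthogonal unit keys (illustrative values only).
k1 = normalize([1.0, 1.0] + [0.0] * (D - 2))
k2 = normalize([1.0, -1.0] + [0.0] * (D - 2))
v1 = [1.0] * D
v2 = [float(i) for i in range(D)]
v3 = [-2.0] * D

state = [[0.0] * D for _ in range(D)]
state = delta_update(state, k1, v1)  # store k1 -> v1
state = delta_update(state, k2, v2)  # store k2 -> v2
state = delta_update(state, k1, v3)  # overwrite k1 -> v3; k2 is left intact

print([round(x, 6) for x in read(state, k1)])
print([round(x, 6) for x in read(state, k2)])
```

In this sketch, writing a new value for `k1` replaces the old association while the one stored under the orthogonal key `k2` is unaffected, and the state stays 8×8 regardless of how many writes occur. With non-orthogonal keys or `beta` below 1, recall becomes approximate rather than exact.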
Similar Articles
Δ-Mem: Efficient Online Memory for Large Language Models
Proposes δ-mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.
@dair_ai: // δ-mem: Efficient Online Memory for LLMs // One of the more elegant memory mechanisms I've seen this month. Most long…
The paper introduces δ-mem, a lightweight online memory mechanism that augments frozen LLMs with a compact associative memory state updated by delta-rule learning, achieving significant improvements on memory-heavy benchmarks without fine-tuning or context extension.
@dair_ai: // Memory as a Model // The paper augments any LLM with a separate trained memory model that stores, retrieves, and int…
MeMo introduces a modular memory model that augments any LLM to store, retrieve, and integrate new knowledge without retraining or catastrophic forgetting. It outperforms RAG-based methods on benchmarks like BrowseComp-Plus, NarrativeQA, and MuSiQue.
StageMem: Lifecycle-Managed Memory for Language Models
StageMem proposes a lifecycle-managed memory framework for language models that organizes memory into transient, working, and durable stages with explicit confidence and strength metrics, treating memory as a stateful process rather than a static store to better manage retention and forgetting under bounded capacity.
SimpleMem: Efficient Lifelong Memory for LLM Agents
Introduces SimpleMem, an efficient memory framework for LLM agents that uses semantic lossless compression to improve accuracy and reduce token consumption, achieving 26.4% F1 improvement and up to 30x reduction in inference-time token usage.