δ-mem: Efficient Online Memory for Large Language Models

Hugging Face Daily Papers

Summary

The paper introduces δ-mem, a lightweight memory mechanism that enhances large language models by augmenting a frozen attention backbone with a compact associative memory state. It demonstrates improved performance on memory-heavy benchmarks with minimal computational overhead.

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone's attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.
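
To make the "fixed-size state matrix updated by delta-rule learning" concrete, here is a minimal sketch of the classic delta rule for an associative memory, written in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`read`, `delta_update`), the step size `beta`, and the use of one-hot keys are all hypothetical, and the step that turns the readout into a low-rank correction to attention is not covered because the abstract does not specify it.

```python
# Hypothetical sketch of a delta-rule associative memory with a fixed 8x8 state.
# All names here are illustrative assumptions, not taken from the paper.

D = 8  # the state is D x D, matching the 8x8 size quoted in the abstract


def zeros(n):
    """Return an n x n matrix of zeros as a list of rows."""
    return [[0.0] * n for _ in range(n)]


def read(state, key):
    """Readout: the matrix-vector product state @ key."""
    return [sum(row[j] * key[j] for j in range(len(key))) for row in state]


def delta_update(state, key, value, beta=1.0):
    """Delta rule: state <- state + beta * (value - state @ key) key^T.

    The update is driven by the prediction error, so writing a new value
    for an existing key replaces the old association instead of adding to it.
    """
    pred = read(state, key)
    err = [v - p for v, p in zip(value, pred)]
    return [
        [state[i][j] + beta * err[i] * key[j] for j in range(len(key))]
        for i in range(len(value))
    ]


def unit(i, n=D):
    """One-hot key, used so that the example keys are orthonormal."""
    return [1.0 if j == i else 0.0 for j in range(n)]


state = zeros(D)
k1, v1 = unit(0), [float(i) for i in range(D)]
k2, v2 = unit(1), [1.0] * D

state = delta_update(state, k1, v1)  # store k1 -> v1
state = delta_update(state, k2, v2)  # store k2 -> v2

v1_new = [2.0] * D
state = delta_update(state, k1, v1_new)  # overwrite k1 -> v1_new

recalled_1 = read(state, k1)
recalled_2 = read(state, k2)
```

With orthonormal keys and `beta=1.0`, each write is recalled exactly and the overwrite of `k1` leaves the association for `k2` untouched, while the state stays 8×8 however many writes occur. Real keys are rarely orthogonal, so in practice recall is approximate and writes interfere with each other.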

Cached at: 05/13/26, 04:11 AM

Source: https://huggingface.co/papers/2605.12357

Abstract

A lightweight memory mechanism called δ-mem enhances large language models by augmenting a frozen attention backbone with a compact associative memory state that provides low-rank corrections to attention computations.

Large language models increasingly need to accumulate and reuse historical information in long-term assistants and agent systems. Simply expanding the context window is costly and often fails to ensure effective context utilization. We propose δ-mem, a lightweight memory mechanism that augments a frozen full-attention backbone with a compact online state of associative memory. δ-mem compresses past information into a fixed-size state matrix updated by delta-rule learning, and uses its readout to generate low-rank corrections to the backbone’s attention computation during generation. With only an 8×8 online memory state, δ-mem improves the average score to 1.10× that of the frozen backbone and 1.15× that of the strongest non-δ-mem memory baseline. It achieves larger gains on memory-heavy benchmarks, reaching 1.31× on MemoryAgentBench and 1.20× on LoCoMo, while largely preserving general capabilities. These results show that effective memory can be realized through a compact online state directly coupled with attention computation, without full fine-tuning, backbone replacement, or explicit context extension.


Similar Articles

Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top

Proposes delta-Mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.

StageMem: Lifecycle-Managed Memory for Language Models

arXiv cs.CL

StageMem proposes a lifecycle-managed memory framework for language models that organizes memory into transient, working, and durable stages with explicit confidence and strength metrics, treating memory as a stateful process rather than a static store to better manage retention and forgetting under bounded capacity.

SimpleMem: Efficient Lifelong Memory for LLM Agents

Papers with Code Trending

Introduces SimpleMem, an efficient memory framework for LLM agents that uses semantic lossless compression to improve accuracy and reduce token consumption, achieving 26.4% F1 improvement and up to 30x reduction in inference-time token usage.