@jiqizhixin: New from NVIDIA! You can edit a model’s compressed memory without scrambling what it already knows! Enter Gated DeltaNe…


Summary

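NVIDIA introduces Gated DeltaNet-2, a linear-attention method that edits a model's compressed memory without scrambling what is already stored, using two independent gates: one for erasing old information and one for writing new information. It outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 on language modeling, commonsense reasoning, and retrieval, especially on long-context needle-in-a-haystack benchmarks.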

New from NVIDIA! You can edit a model’s compressed memory without scrambling what it already knows! Enter Gated DeltaNet-2. It separates the erase and write operations in linear attention using two independent gates – one for forgetting old info, another for adding new info. Outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across language modeling, commonsense reasoning, and retrieval – especially on long-context needle-in-a-haystack benchmarks.
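To make the erase/write split concrete, here is a minimal sketch of a delta-rule memory step with two separate gates. It is an illustration under stated assumptions, not the paper's formulation: the update form `S_new = S - erase * (S k) k^T + write * v k^T`, the scalar gates, the unit-norm keys, and the names `gated_delta_step` and `read` are all hypothetical and are not taken from the post.

```python
# Illustrative sketch only: the update rule and every name here are
# assumptions for explanation, not the Gated DeltaNet-2 equations.

def matvec(S, k):
    """Multiply matrix S (list of rows) by vector k."""
    return [sum(s * x for s, x in zip(row, k)) for row in S]

def gated_delta_step(S, k, v, erase, write):
    """One memory update with independent erase and write gates.

    S     : d_v x d_k memory matrix (list of rows)
    k     : key vector, assumed to have unit norm
    v     : value vector to store under k
    erase : gate in [0, 1], how much of the old value under k to remove
    write : gate in [0, 1], how much of the new value v to add
    """
    old = matvec(S, k)  # value currently stored under key k
    return [
        [S[i][j] - erase * old[i] * k[j] + write * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]

def read(S, q):
    """Retrieve the value associated with query q."""
    return matvec(S, q)

# Store two values under two orthogonal keys.
k1, k2 = [1.0, 0.0], [0.0, 1.0]
S = [[0.0, 0.0], [0.0, 0.0]]
S = gated_delta_step(S, k1, [1.0, 2.0], erase=1.0, write=1.0)
S = gated_delta_step(S, k2, [3.0, 4.0], erase=1.0, write=1.0)

# Overwrite the slot for k1: erase the old value, write the new one.
S_overwrite = gated_delta_step(S, k1, [5.0, 6.0], erase=1.0, write=1.0)

# Erase only: clear the slot for k1 without writing anything.
S_erased = gated_delta_step(S, k1, [0.0, 0.0], erase=1.0, write=0.0)

# Write only: add to the slot for k1 without removing the old value.
S_added = gated_delta_step(S, k1, [5.0, 6.0], erase=0.0, write=1.0)
```

In this toy setting the value stored under `k2` is the same in all three variants, because each edit only touches the direction of `k1`. With a single shared gate, the erase-only and write-only cases could not be expressed separately.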

Cached at: 05/22/26, 09:48 AM


Similar Articles

Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top

Proposes Δ-Mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve long-context performance of frozen LLMs without full fine-tuning or context extension.

δ-mem: Efficient Online Memory for Large Language Models

Hugging Face Daily Papers

The paper introduces δ-mem, a lightweight memory mechanism that enhances large language models by augmenting a frozen attention backbone with a compact associative memory state. It demonstrates improved performance on memory-heavy benchmarks with minimal computational overhead.