@jiqizhixin: New from NVIDIA! You can edit a model’s compressed memory without scrambling what it already knows! Enter Gated DeltaNe…

X AI KOLs Timeline 05/22/26, 06:09 AM Papers

linear-attention model-editing memory gated-delta-net nvidia language-modeling

Summary

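NVIDIA introduces Gated DeltaNet-2, a linear-attention model that edits its compressed memory without scrambling what is already stored, using two independent gates: one to erase old information and one to write new information. According to the post, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 on language modeling, commonsense reasoning, and retrieval, especially on long-context needle-in-a-haystack benchmarks.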

New from NVIDIA! You can edit a model’s compressed memory without scrambling what it already knows! Enter Gated DeltaNet-2. It separates the erase and write operations in linear attention using two independent gates – one for forgetting old info, another for adding new info. Outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across language modeling, commonsense reasoning, and retrieval – especially on long-context needle-in-a-haystack benchmarks.
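
To make the erase/write split concrete, here is a minimal sketch of a delta-rule memory update with two separate gates. The post does not give the Gated DeltaNet-2 equations, so everything here is an assumption for illustration: the function names `read` and `step`, the scalar gates `erase` and `write`, and the update form itself are hypothetical, and the actual model may use a different parameterization.

```python
# Illustrative sketch only: a delta-rule memory update with separate
# erase and write gates. The scalar gates and the exact update form are
# assumptions for illustration, not the Gated DeltaNet-2 equations.

def read(S, q):
    """Return S @ q: the value the memory associates with query q."""
    return [sum(s * x for s, x in zip(row, q)) for row in S]

def step(S, k, v, erase, write):
    """One recurrent update: S_new = S - erase * (S k) k^T + write * v k^T.

    S is a d_v x d_k matrix stored as a list of rows, k is a unit-norm key,
    v is the value to store, and erase/write are gates in [0, 1].
    """
    old = read(S, k)  # what the memory currently returns for key k
    return [
        [S[i][j] - erase * old[i] * k[j] + write * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]

S = [[0.0, 0.0], [0.0, 0.0]]
S = step(S, [1.0, 0.0], [3.0, 4.0], erase=1.0, write=1.0)  # store v1 under k1
S = step(S, [0.0, 1.0], [5.0, 6.0], erase=1.0, write=1.0)  # store v2 under k2
S = step(S, [1.0, 0.0], [7.0, 8.0], erase=1.0, write=1.0)  # overwrite k1 only
print(read(S, [1.0, 0.0]), read(S, [0.0, 1.0]))
```

In this sketch, overwriting the entry for one key leaves the entry for an orthogonal key untouched. Setting `erase` and `write` to the same value ties the two operations together, as in the classic delta rule; keeping them independent allows erasing without writing (`write=0.0`) or adding without erasing (`erase=0.0`).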

Cached at: 05/22/26, 09:48 AM

Similar Articles

Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top

Proposes δ-mem, a lightweight online memory mechanism that uses a compact state matrix updated by delta-rule learning to improve the long-context performance of frozen LLMs without full fine-tuning or context extension.

δ-mem: Efficient Online Memory for Large Language Models

Hugging Face Daily Papers

The paper introduces δ-mem, a lightweight memory mechanism that enhances large language models by augmenting a frozen attention backbone with a compact associative memory state. It demonstrates improved performance on memory-heavy benchmarks with minimal computational overhead.

@tom_doerr: Compresses deep learning models for faster inference https://github.com/NVIDIA/Model-Optimizer…

X AI KOLs Timeline

NVIDIA Model Optimizer is a library that compresses deep learning models using techniques like quantization, distillation, pruning, and speculative decoding to accelerate inference. It supports Hugging Face, PyTorch, and ONNX models and integrates with NVIDIA inference frameworks.

@BlinkDL_AI: Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room

X AI KOLs Following

Ali Hatamizadeh announces Gated DeltaNet-2, a new linear attention model that outperforms KDA and Mamba-3 at 1.3B scale; @BlinkDL_AI notes its recurrence is nearly identical to RWKV-7's DPLR.

@dair_ai: // δ-mem: Efficient Online Memory for LLMs // One of the more elegant memory mechanisms I've seen this month. Most long…