Tag
NVIDIA introduces Gated DeltaNet-2, a method for editing compressed model memory without catastrophic forgetting, using independent gates for erase and write operations. It outperforms existing models like Mamba-2 and Mamba-3 on language modeling and long-context tasks.
Ali Hatamizadeh announces Gated DeltaNet-2, a new linear attention model that outperforms KDA and Mamba-3 at 1.3B scale; @BlinkDL_AI notes its recurrence is nearly identical to RWKV-7's DPLR.