@BlinkDL_AI: Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room


Summary

Ali Hatamizadeh announces Gated DeltaNet-2, a new linear attention model that outperforms KDA and Mamba-3 at 1.3B scale; @BlinkDL_AI notes its recurrence is nearly identical to RWKV-7's DPLR.

Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room 🙂

Cached at: 05/23/26, 12:05 PM

Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room 🙂

Ali Hatamizadeh (@ahatamiz1): Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it:

Linear attention

Similar Articles

Delta Attention Residuals [R]

Reddit r/MachineLearning

Delta Attention Residuals is a drop-in upgrade to residual connections that routes over deltas instead of cumulative hidden states, achieving sharper cross-layer routing and 1.7-8.2% lower perplexity at scales up to 7.6B parameters, and enabling fine-tuning of pretrained models like Qwen3-0.6B with negligible overhead.