delta-rule

#delta-rule

Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

arXiv cs.CL · yesterday

Proposes Erase-then-Delta Attention (EDA), a memory update rule for linear attention that decouples erase and write addresses to selectively suppress stale information before writing new content. Experiments on 2.5B dense and 25B MoE models demonstrate consistent gains in standard and long-context evaluations.
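
The summary does not give EDA's exact update equation. As a hedged illustration only, the sketch below assumes one simple way to decouple the two addresses: erase along a separate key e, then write value v at key k, i.e. S_new = S (I - beta e e^T) + beta v k^T. With e = k this reduces to the standard delta rule S_new = S (I - beta k k^T) + beta v k^T. The functions erase_then_write and matvec are hypothetical names, and this is not the paper's formulation.

```python
# Hypothetical sketch: a delta-rule memory update with a separate erase
# address e and write address k. Setting e == k recovers the standard
# delta rule S <- S (I - beta k k^T) + beta v k^T. This is NOT the EDA
# paper's exact formulation; names and the update form are assumptions.

Matrix = list[list[float]]
Vector = list[float]


def matvec(S: Matrix, x: Vector) -> Vector:
    """Return S @ x for a (d_v x d_k) matrix S and a length-d_k vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def erase_then_write(S: Matrix, e: Vector, k: Vector, v: Vector, beta: float) -> Matrix:
    """Erase content stored along e, then write v at address k.

    S_new = S (I - beta e e^T) + beta v k^T
    """
    Se = matvec(S, e)  # what the memory currently returns for address e
    return [
        [S[i][j] - beta * Se[i] * e[j] + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


S = [[0.0, 0.0], [0.0, 0.0]]
S = erase_then_write(S, e=[1.0, 0.0], k=[1.0, 0.0], v=[3.0, 4.0], beta=1.0)
# Overwrite: erase address [1, 0] while writing new content at [0, 1].
S = erase_then_write(S, e=[1.0, 0.0], k=[0.0, 1.0], v=[5.0, 6.0], beta=1.0)
print(matvec(S, [1.0, 0.0]))  # stale content erased: [0.0, 0.0]
print(matvec(S, [0.0, 1.0]))  # new content: [5.0, 6.0]
```

With tied addresses (e = k = [0, 1]) the second step would leave the stale value [3.0, 4.0] readable at [1, 0]; decoupling lets the update suppress it while writing elsewhere.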

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Hugging Face Daily Papers · 2026-05-21

Gated DeltaNet-2 introduces separate erase and write gates for linear attention and reports improved performance on long-context language modeling and retrieval tasks.
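
For reference, the original Gated DeltaNet update uses a single strength beta_t for both erasing and writing, together with a decay gate alpha_t: S_t = S_{t-1} alpha_t (I - beta_t k_t k_t^T) + beta_t v_t k_t^T. The summary says only that Gated DeltaNet-2 separates the erase and write gates, so the sketch below assumes the simplest version of that idea: scalar gates erase_gate and write_gate replacing the shared beta. The gate names, their scalar form, and the function gated_delta_step are hypothetical.

```python
# Hypothetical sketch of a gated delta-rule step with separate erase and
# write gates. With erase_gate == write_gate this reduces to the Gated
# DeltaNet update S <- alpha * S (I - beta k k^T) + beta v k^T.
# The scalar gates and their names are assumptions, not the paper's API.

Matrix = list[list[float]]
Vector = list[float]


def matvec(S: Matrix, x: Vector) -> Vector:
    """Return S @ x for a (d_v x d_k) matrix S and a length-d_k vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in S]


def gated_delta_step(
    S: Matrix, k: Vector, v: Vector, alpha: float, erase_gate: float, write_gate: float
) -> Matrix:
    """S_new = alpha * S (I - erase_gate k k^T) + write_gate v k^T."""
    Sk = matvec(S, k)  # current read-out at key k
    return [
        [alpha * (S[i][j] - erase_gate * Sk[i] * k[j]) + write_gate * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


S = [[2.0, 0.0], [4.0, 0.0]]  # value [2, 4] stored at key [1, 0]
k = [1.0, 0.0]
v = [6.0, 8.0]
decoupled = gated_delta_step(S, k, v, alpha=1.0, erase_gate=1.0, write_gate=0.5)
tied = gated_delta_step(S, k, v, alpha=1.0, erase_gate=0.5, write_gate=0.5)
print(matvec(decoupled, k))  # [3.0, 4.0]
print(matvec(tied, k))  # [4.0, 6.0]
```

With a tied gate (beta = 0.5) the step returns a blend of old and new content; with decoupled gates the old association is cleared fully while the new one is written at half strength.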

Δ-Mem: Efficient Online Memory for Large Language Models

Hacker News Top · 2026-05-16

Proposes Δ-Mem, a lightweight online memory mechanism that uses a compact state matrix, updated by delta-rule learning, to improve the long-context performance of frozen LLMs without full fine-tuning or context extension.
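
A compact associative memory updated by the delta rule can be sketched as a state matrix M with a read M q and a write M <- M + beta (v - M k) k^T, which moves the memory's current prediction for key k toward the target v. This is a generic delta-rule memory under stated assumptions: how Δ-Mem derives keys and values from the frozen LLM and feeds the read-out back is not modeled, and the class DeltaMemory and its methods are hypothetical.

```python
# Hypothetical sketch of a compact associative memory updated online by the
# delta rule:  M <- M + beta * (v - M k) k^T,  read(q) = M q.
# How delta-mem derives keys/values from a frozen LLM and injects the read
# result is not modeled here; class and method names are assumptions.


class DeltaMemory:
    def __init__(self, d_key: int, d_value: int) -> None:
        # State matrix of shape (d_value, d_key), initialised to zero.
        self.M = [[0.0] * d_key for _ in range(d_value)]

    def read(self, q: list[float]) -> list[float]:
        """Retrieve M @ q."""
        return [sum(m * x for m, x in zip(row, q)) for row in self.M]

    def write(self, k: list[float], v: list[float], beta: float = 1.0) -> None:
        """Delta-rule update: move the prediction M k toward the target v."""
        error = [vi - pi for vi, pi in zip(v, self.read(k))]
        for i, row in enumerate(self.M):
            for j in range(len(row)):
                row[j] += beta * error[i] * k[j]


mem = DeltaMemory(d_key=2, d_value=2)
mem.write([1.0, 0.0], [1.0, 2.0])
mem.write([0.0, 1.0], [3.0, 4.0])
mem.write([1.0, 0.0], [5.0, 6.0])  # rewrite the first key
print(mem.read([1.0, 0.0]))  # [5.0, 6.0]
print(mem.read([0.0, 1.0]))  # [3.0, 4.0]
```

Because the update corrects the error v - M k instead of simply adding v k^T, rewriting the first key replaces its value; a purely additive (Hebbian) memory would return [6.0, 8.0] for that key.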

@dair_ai: // δ-mem: Efficient Online Memory for LLMs // One of the more elegant memory mechanisms I've seen this month. Most long…

X AI KOLs Following · 2026-05-13

The paper introduces δ-mem, a lightweight online memory mechanism that augments frozen LLMs with a compact associative memory state updated by delta-rule learning, achieving significant improvements on memory-heavy benchmarks without fine-tuning or context extension.
