erase-then-delta


Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

arXiv cs.CL · yesterday

Proposes Erase-then-Delta Attention (EDA), a memory update rule for linear attention that decouples erase and write addresses to selectively suppress stale information before writing new content. Experiments on 2.5B dense and 25B MoE models demonstrate consistent gains in standard and long-context evaluations.
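
The summary above does not give the EDA update equations, so the following is only a minimal sketch of the general idea, not the paper's method. `delta_step` is the standard delta rule used in delta-rule linear attention, S <- S - beta * (S k - v) k^T, where the key k serves as both the erase and the write address. `erase_step`, the separate erase address `e`, and the erase strength `alpha` are hypothetical names introduced here to illustrate what decoupling the two addresses could look like. Keys are assumed to be unit-norm.

```python
def matvec(S, x):
    """Read from memory S (d_v x d_k) at address x (length d_k)."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


def delta_step(S, k, v, beta):
    """Standard delta rule: S <- S - beta * (S k - v) k^T.

    The key k is both the address that is erased and the address written to.
    """
    err = [p - t for p, t in zip(matvec(S, k), v)]
    return [[S[i][j] - beta * err[i] * k[j] for j in range(len(k))]
            for i in range(len(S))]


def erase_step(S, e, alpha):
    """Hypothetical erase (an assumption, not from the paper):
    S <- S - alpha * (S e) e^T, removing content stored at address e."""
    se = matvec(S, e)
    return [[S[i][j] - alpha * se[i] * e[j] for j in range(len(e))]
            for i in range(len(S))]


def erase_then_delta_step(S, e, k, v, alpha, beta):
    """Illustrative composition: erase at e first, then delta-write v at k."""
    return delta_step(erase_step(S, e, alpha), k, v, beta)


# Store a value at address [1, 0].
S0 = [[0.0, 0.0], [0.0, 0.0]]
S1 = delta_step(S0, [1.0, 0.0], [3.0, 4.0], 1.0)

# Plain delta write at [0, 1]: the entry at [1, 0] is left untouched.
plain = delta_step(S1, [0.0, 1.0], [5.0, 6.0], 1.0)

# Erase at [1, 0], then write at [0, 1]: the stale entry is suppressed.
decoupled = erase_then_delta_step(
    S1, [1.0, 0.0], [0.0, 1.0], [5.0, 6.0], 1.0, 1.0
)

print(matvec(plain, [1.0, 0.0]))      # stale value still readable
print(matvec(decoupled, [1.0, 0.0]))  # stale value removed
print(matvec(decoupled, [0.0, 1.0]))  # new value readable
```

With orthogonal keys, the plain delta write cannot touch what is stored at the other address, while the decoupled variant can clear it in the same step. How EDA actually computes its erase address and gates is not stated in the summary.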
