attention-alternative

An Update on Matrix Recurrent Units, an Attention Alternative [R]

Reddit r/MachineLearning · 2026-06-21

An update on Matrix Recurrent Units (MRU), a linear-time alternative to attention. The author tests several ways to stabilize training and finds that orthogonal matrices perform poorly, while an LDU factorization works best. The update also reports that MRU underperforms transformers on larger datasets such as TinyStories.
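The summary does not say how MRU builds or combines its matrices. This is only a rough illustration. It assumes each token is mapped to an n×n matrix and that the recurrent state is the running product of those matrices. The pure-Python sketch below shows what an LDU-factorized parameterization of such a matrix looks like. The helper names `matmul`, `ldu_matrix` and `matrix_recurrence`, the exp-parameterized diagonal, and the left-to-right multiplication order are all hypothetical choices, not taken from the post.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def ldu_matrix(lower: Matrix, log_diag: list[float], upper: Matrix) -> Matrix:
    """Hypothetical helper: build M = L @ D @ U from unconstrained parameters.

    L is unit lower triangular (only entries below the diagonal of `lower`
    are used). U is unit upper triangular (only entries above the diagonal
    of `upper` are used). D = diag(exp(log_diag)) is positive (assumed
    parameterization). Since det(L) = det(U) = 1, det(M) = exp(sum(log_diag)),
    so M is always invertible.
    """
    n = len(log_diag)
    L = [[1.0 if i == j else (lower[i][j] if i > j else 0.0) for j in range(n)]
         for i in range(n)]
    D = [[math.exp(log_diag[i]) if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    U = [[1.0 if i == j else (upper[i][j] if i < j else 0.0) for j in range(n)]
         for i in range(n)]
    return matmul(matmul(L, D), U)


def matrix_recurrence(token_matrices: list[Matrix]) -> list[Matrix]:
    """Sequential reference: state_t = M_1 @ M_2 @ ... @ M_t (assumed order)."""
    states: list[Matrix] = []
    state = None
    for m in token_matrices:
        # The first token's matrix is the initial state; later ones multiply on the right.
        state = m if state is None else matmul(state, m)
        states.append(state)
    return states


if __name__ == "__main__":
    m1 = ldu_matrix([[0.0, 0.0], [0.5, 0.0]], [0.0, 0.0], [[0.0, 2.0], [0.0, 0.0]])
    m2 = ldu_matrix([[0.0, 0.0], [0.3, 0.0]], [0.1, -0.2], [[0.0, -1.0], [0.0, 0.0]])
    for t, s in enumerate(matrix_recurrence([m1, m2]), start=1):
        print(t, s)
```

With this parameterization, the determinant of every prefix product is the exponential of the summed log-diagonal entries, so it is explicit through D. The loop runs in time linear in sequence length. Because matrix multiplication is associative, the same prefix products could also be computed with a parallel scan. Whether the author's LDU variant works this way is not stated in the summary.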
