local-linear-attention

#local-linear-attention

@maximelabonne: Parallax is a parametrized form of Local Linear Attention that drops the numerical solvers and matches FA 2/3 on decode…

X AI KOLs Following ↗ · 4d ago Cached

Parallax is a new parametrized form of Local Linear Attention that eliminates numerical solvers and matches FlashAttention 2/3 in decoding. Its effectiveness depends on the optimizer, working with Muon but not AdamW, highlighting the role of optimizer geometry.

0 favorites 0 likes

#local-linear-attention

Parallax: Parameterized Local Linear Attention for Language Modeling

Hugging Face Daily Papers ↗ · 2026-05-27 Cached

Introduces Parallax, a parameterized local linear attention mechanism with hardware-aware optimization that improves LLM pretraining efficiency and performance, achieving Pareto improvements at 0.6B and 1.7B scales.

0 favorites 0 likes

local-linear-attention

@maximelabonne: Parallax is a parametrized form of Local Linear Attention that drops the numerical solvers and matches FA 2/3 on decode…

Parallax: Parameterized Local Linear Attention for Language Modeling

Submit Feedback