activation-sparsity

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Hugging Face Daily Papers · 2026-05-26

RT-Lynx proposes exploiting activation sparsity rather than weight sparsity to accelerate diffusion models, achieving up to a 1.55× speedup on linear layers while maintaining generation quality. The paper was accepted at ICML 2026.
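
To make the distinction in this summary concrete, here is a minimal sketch of what "activation sparsity" means for a single linear layer. It is not RT-Lynx's kernel or method, which the summary does not describe. The function names and the toy input `x` and weight matrix `W` are hypothetical. The sketch only shows that input features whose activation is zero can be skipped without changing the result, whereas weight sparsity would instead skip zero entries of `W`.

```python
def dense_matvec(x, W):
    """Compute y[j] = sum_i x[i] * W[i][j] over every input feature i."""
    n_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]


def activation_sparse_matvec(x, W):
    """Same product, but skip rows of W whose matching activation is zero."""
    n_out = len(W[0])
    y = [0.0] * n_out
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # a zero activation contributes nothing to any output
        for j in range(n_out):
            y[j] += xi * W[i][j]
    return y


# Hypothetical toy data: half of the activations are exactly zero.
x = [0.0, 2.0, 0.0, 0.5]
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

print(dense_matvec(x, W))
print(activation_sparse_matvec(x, W))
```

Both calls return the same vector; the second touches only two of the four rows of `W`. Turning that skipped work into a real speedup on GPU hardware is a separate problem that this sketch does not address.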

Bug or Feature^2: Weight Drift, Activation Sparsity, and Spikes

Hugging Face Daily Papers · 2026-05-17

This paper formally proves that training neural networks with asymmetric activation functions such as ReLU, GELU, or SiLU causes weights to drift negative, which produces up to 90% activation sparsity. It also shows that squared activations such as ReLU² improve performance but cause activation spikes; clipping fixes the spikes, and GELU² achieves the best validation loss.
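
The terms in this summary can be illustrated with a small sketch under stated assumptions. It is not the paper's implementation. The sample pre-activations, the clip threshold of 6.0, and the choice to clip the output of the squared activation are all assumptions made for illustration. The sketch shows how negative pre-activations become exact zeros (sparsity), how squaring enlarges large values (spikes), and how a clip bounds them.

```python
def relu_squared(x, clip=None):
    """ReLU squared: max(x, 0)^2, optionally capped at `clip` (assumed placement)."""
    r = max(x, 0.0)
    y = r * r
    if clip is not None:
        y = min(y, clip)
    return y


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


# Hypothetical pre-activations, mostly negative as weight drift would produce.
pre = [-3.0, -1.5, -0.2, 0.0, 0.5, 4.0, -2.0, -0.7, 10.0, -5.0]

unclipped = [relu_squared(v) for v in pre]
clipped = [relu_squared(v, clip=6.0) for v in pre]

print(sparsity(unclipped))  # fraction of zero activations
print(max(unclipped))       # largest activation without clipping
print(max(clipped))         # largest activation with clipping
```

Clipping leaves the zero entries untouched, so the sparsity is the same in both cases; only the largest values are bounded.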
