DECO is a sparse MoE architecture that matches dense Transformer performance while activating only 20% of its experts, aided by a 3x-faster kernel; it combines ReLU-based routing, learnable scaling, and the NormSiLU activation function.
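The routing idea can be sketched as follows: instead of a softmax top-k gate, a ReLU over the (scaled) router logits zeroes out most experts, and the surviving positive scores become the gate weights. This is a minimal illustrative sketch, not DECO's actual implementation; the `scale` parameter stands in for the learnable scaling mentioned above, and all names here are hypothetical.

```python
import numpy as np

def relu_route(logits, scale):
    """ReLU-based MoE routing sketch (hypothetical, not DECO's code).

    Experts with positive scaled logits are active; their ReLU scores
    are normalized into per-token gate weights. Inactive experts get
    exactly zero weight, which is what makes the routing sparse.
    """
    gates = np.maximum(logits * scale, 0.0)          # ReLU keeps positive scores only
    total = gates.sum(axis=-1, keepdims=True)
    return gates / np.maximum(total, 1e-9)           # normalize; guard against all-zero rows

# 2 tokens, 4 experts; fixed logits so the sparsity pattern is easy to see
logits = np.array([[ 0.5, -1.0,  2.0, -0.3],
                   [-0.2,  0.7, -1.5,  0.1]])
scale = np.ones(4)                                   # placeholder for learnable scaling
weights = relu_route(logits, scale)                  # half the expert slots end up at 0
```

With these inputs each token routes to exactly two experts, and the gate weights on the active experts sum to 1.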
A blog post surveying fast hyperbolic-tangent approximations for neural networks and real-time audio: Taylor series, Padé approximants, splines, and bit-level tricks.
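As one concrete example of the Padé approach, the [3/2] Padé approximant of tanh matches its Taylor series through the x^5 term and is cheap to evaluate (one division, a few multiplies). The clamp is a common practical addition since the rational form diverges from tanh for large |x|; this sketch is illustrative, not taken from the post.

```python
import math

def tanh_pade(x):
    """Approximate tanh via the [3/2] Pade approximant:
        tanh(x) ~= x * (15 + x^2) / (15 + 6 * x^2)
    Very accurate near 0; clamped to [-1, 1] because the rational
    form overshoots for large |x|.
    """
    x2 = x * x
    y = x * (15.0 + x2) / (15.0 + 6.0 * x2)
    return max(-1.0, min(1.0, y))
```

For |x| below about 1 the error is on the order of 1e-5 or better, which is often acceptable for activation functions and audio waveshaping.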