DECO is a sparse MoE architecture that matches dense Transformer performance while activating only 20% of its experts, aided by a 3x-faster kernel; it combines ReLU-based routing, learnable scaling, and the NormSiLU activation function.
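The routing idea can be sketched as follows: instead of a softmax top-k gate, a ReLU over the (scaled) router logits zeroes out most experts, and the surviving positive scores become the gate weights. This is a minimal illustrative sketch, not DECO's actual implementation; the `scale` parameter stands in for the learnable scaling mentioned above, and all names here are hypothetical.

```python
import numpy as np

def relu_route(logits, scale):
    """ReLU-based MoE routing sketch (hypothetical, not DECO's code).

    Experts with positive scaled logits are active; their ReLU scores
    are normalized into per-token gate weights. Inactive experts get
    exactly zero weight, which is what makes the routing sparse.
    """
    gates = np.maximum(logits * scale, 0.0)          # ReLU keeps positive scores only
    total = gates.sum(axis=-1, keepdims=True)
    return gates / np.maximum(total, 1e-9)           # normalize; guard against all-zero rows

# 2 tokens, 4 experts; fixed logits so the sparsity pattern is easy to see
logits = np.array([[ 0.5, -1.0,  2.0, -0.3],
                   [-0.2,  0.7, -1.5,  0.1]])
scale = np.ones(4)                                   # placeholder for learnable scaling
weights = relu_route(logits, scale)                  # half the expert slots end up at 0
```

With these inputs each token routes to exactly two experts, and the gate weights on the active experts sum to 1.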
A blog post surveying fast hyperbolic-tangent approximations for neural networks and real-time audio: Taylor series, Padé approximants, splines, and bit-level tricks.
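As one concrete example of the Padé approach, the [3/2] Padé approximant of tanh matches its Taylor series through the x^5 term and is cheap to evaluate (one division, a few multiplies). The clamp is a common practical addition since the rational form diverges from tanh for large |x|; this sketch is illustrative, not taken from the post.

```python
import math

def tanh_pade(x):
    """Approximate tanh via the [3/2] Pade approximant:
        tanh(x) ~= x * (15 + x^2) / (15 + 6 * x^2)
    Very accurate near 0; clamped to [-1, 1] because the rational
    form overshoots for large |x|.
    """
    x2 = x * x
    y = x * (15.0 + x2) / (15.0 + 6.0 * x2)
    return max(-1.0, min(1.0, y))
```

For |x| below about 1 the error is on the order of 1e-5 or better, which is often acceptable for activation functions and audio waveshaping.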