Tag
Sigma-Branch restructures pretrained dense networks into a hierarchical binary tree with a shared backbone, routers, and specialized leaves, reducing per-inference active parameters by 58–60% while staying within 1.72 percentage points of baseline accuracy on CIFAR-100, ImageNet-1K, and ModelNet40.
A discussion of whether Mixture-of-Experts (MoE) models have a cap on active parameter count beyond which quality stops improving.