zero-expert


Post-Trained MoE Can Skip Half Experts via Self-Distillation

Hugging Face Daily Papers · 2026-05-18

ZEDA is a low-cost framework that converts post-trained, static Mixture-of-Experts (MoE) models into dynamic ones by injecting zero-output experts and applying self-distillation. It reduces expert FLOPs by more than 50% with only marginal accuracy loss on benchmarks.
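To make the idea of a zero-output expert concrete, here is a minimal sketch of top-k routing in which some router slots point at experts that always return zero, so selecting one means skipping the computation entirely. This is an illustration under stated assumptions, not ZEDA's actual implementation: the function names (`softmax`, `route_token`, `moe_forward`), the use of un-renormalized softmax weights, and the toy experts are all hypothetical, and the self-distillation step is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(logits, num_real, top_k):
    """Select top_k router slots and keep only those that map to real experts.

    Assumption: slots 0..num_real-1 are real experts, and every slot with
    index >= num_real is a zero-output expert that contributes nothing.
    """
    probs = softmax(logits)
    chosen = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:top_k]
    return [(i, probs[i]) for i in chosen if i < num_real]


def moe_forward(x, logits, experts, top_k):
    """Combine the outputs of the selected real experts for one token.

    Returns the output vector and the number of real experts actually run.
    Assumption: weights are the raw softmax probabilities (no renormalization).
    """
    out = [0.0] * len(x)
    active = route_token(logits, len(experts), top_k)
    for idx, weight in active:
        y = experts[idx](x)  # only real experts cost any compute
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, len(active)


if __name__ == "__main__":
    # Two toy real experts plus two zero-output slots in the router (hypothetical).
    toy_experts = [
        lambda x: [2.0 * v for v in x],
        lambda x: [v + 1.0 for v in x],
    ]
    token = [1.0, 2.0]
    router_logits = [5.0, 0.0, 4.0, -1.0]  # slots 2 and 3 are zero-output experts
    output, experts_run = moe_forward(token, router_logits, toy_experts, top_k=2)
    print("real experts run:", experts_run, "of top_k = 2")
    print("output:", output)
```

In this sketch the number of real experts run per token varies with the router's choice, which is what makes the layer dynamic: every selected zero-output slot is one expert forward pass that never happens.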


