zero-expert


Post-Trained MoE Can Skip Half Experts via Self-Distillation

Hugging Face Daily Papers · 2026-05-18

ZEDA is a low-cost framework that converts post-trained static MoE models, in which every token activates a fixed number of experts, into dynamic ones by injecting zero-output experts and applying self-distillation. It reduces expert FLOPs by over 50% with only a marginal accuracy loss on benchmarks.
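The summary does not spell out the routing mechanics. What follows is a minimal sketch, assuming a standard top-k softmax router whose candidate slots include some zero-output experts. When a token's top-k selection lands on a zero-output slot, that slot contributes nothing and runs no expert FFN, so fewer real experts are evaluated than in a static top-k layer. The names `moe_forward` and `num_zero`, the slot layout, and the choice to renormalize gate weights over the selected k slots are illustrative assumptions, not ZEDA's published implementation. The self-distillation step is not shown.

```python
import math
from typing import Callable, List, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over router logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: List[float],
    router_logits: List[float],
    experts: List[Expert],
    num_zero: int,
    k: int,
) -> Tuple[List[float], int]:
    """Top-k MoE forward pass for one token with extra zero-output slots.

    Hypothetical layout: slots 0..len(experts)-1 are real experts and the
    remaining `num_zero` slots are zero-output experts. Returns the output
    vector and the number of real experts actually evaluated.
    """
    if len(router_logits) != len(experts) + num_zero:
        raise ValueError("need one router logit per real or zero-output expert")
    probs = softmax(router_logits)
    # Select the k highest-probability slots, real or zero-output.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: gate weights are renormalized over the k selected slots,
    # so weight routed to a zero-output slot is simply dropped from the output.
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(token)
    real_calls = 0
    for i in top:
        if i >= len(experts):
            continue  # zero-output expert: adds nothing, runs no FFN
        y = experts[i](token)
        real_calls += 1
        gate = probs[i] / norm
        out = [o + gate * v for o, v in zip(out, y)]
    return out, real_calls


# Usage: two real experts plus two zero-output slots, static budget k=2.
experts = [lambda x: [2 * v for v in x], lambda x: [v + 1 for v in x]]
out, calls = moe_forward([1.0, 2.0], [3.0, 0.0, 2.0, -1.0], experts, num_zero=2, k=2)
print(calls)  # 1: the second selected slot (index 2) is a zero-output expert
```

In this toy call only one real expert runs instead of two, which is the kind of per-token saving that, averaged over many tokens, would show up as a reduction in expert FLOPs.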
