nested-subnetworks

#nested-subnetworks

FlexMoE: One-for-All Nested Intra-Expert Pruning for MoE Language Models

arXiv cs.LG · 5d ago

FlexMoE proposes a one-for-all nested intra-expert pruning method for MoE language models, enabling multiple deployable subnetworks from a single training run with minimal performance loss.
