sparse-activation

Generic Expert Coverage for Pruning Sparse Mixture-of-Experts Language Models

arXiv cs.AI · 2d ago

Proposes Generic TB-Coverage, a coverage-aware expert pruning method for sparse Mixture-of-Experts language models that uses only generic text corpora for calibration and preserves cross-corpus expert coverage, improving accuracy and reducing perplexity degradation.
