Tag
ConMoE proposes a training-free prototype-remapping framework for compressing Mixture-of-Experts (MoE) models. It selects a subset of experts as reusable prototypes and deterministically remaps calls to the original experts onto those prototypes, reducing memory usage without weight updates or fine-tuning.
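
The remapping idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from ConMoE itself: the toy expert weights, the hand-picked `prototype_ids`, the cosine-similarity criterion, and the helper names `build_remap` and `remap_routing` are all hypothetical. The sketch only shows the general pattern of keeping a few experts and redirecting router decisions to them through a fixed lookup table, with no weights modified.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flat weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_remap(expert_weights, prototype_ids):
    """Map every original expert index to a prototype index.

    Assumption: a non-prototype expert is assigned to the prototype whose
    weights are most similar under cosine similarity. The actual selection
    and assignment criteria used by ConMoE are not specified here.
    """
    remap = {}
    for i, weights in enumerate(expert_weights):
        if i in prototype_ids:
            remap[i] = i  # a prototype maps to itself
        else:
            remap[i] = max(
                prototype_ids,
                key=lambda p: cosine(weights, expert_weights[p]),
            )
    return remap


def remap_routing(routed_expert_ids, remap):
    """Redirect the router's original expert choices through the fixed table."""
    return [remap[e] for e in routed_expert_ids]


# Toy setup: four experts with flattened 2-D "weights" (hypothetical values).
experts = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.8],
]
prototype_ids = [0, 2]  # hypothetical choice of experts to keep

remap = build_remap(experts, prototype_ids)

# Only prototype weights need to stay in memory after remapping.
kept = {p: experts[p] for p in prototype_ids}

print(remap)
print(remap_routing([1, 3, 0], remap))
```

Because the table is built once and then only read, the same router output always resolves to the same prototype, which is what makes the remapping deterministic. In this toy case, two of the four experts are kept, so expert storage is halved while the router itself is left unchanged.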