Tag
MACS is a training-free inference framework that mitigates the straggler effect in expert parallelism for multimodal MoE MLLMs by introducing entropy-weighted load and dynamic modality-adaptive capacity mechanisms.
Federation of Experts (FoE) restructures mixture-of-experts blocks into clusters that process KV heads independently, eliminating inter-node communication bottlenecks and improving inference throughput and latency by up to 5.2x while maintaining generation quality.