expert-parallelism

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv cs.LG · 2026-05-08

MACS is a training-free inference framework that mitigates the straggler effect in expert parallelism for MoE-based multimodal LLMs (MLLMs) by introducing entropy-weighted load and dynamic modality-adaptive capacity mechanisms.

Federation of Experts: Communication Efficient Distributed Inference for Large Language Models

Hugging Face Daily Papers · 2026-05-07

Federation of Experts (FoE) restructures mixture-of-experts blocks into clusters that process KV heads independently, eliminating inter-node communication bottlenecks and improving inference throughput and latency by up to 5.2x while maintaining generation quality.
