expert-parallelism


MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv cs.LG · 2026-05-08

MACS is a training-free inference framework that mitigates the straggler effect in expert parallelism for mixture-of-experts (MoE) multimodal large language models (MLLMs). It introduces two mechanisms: an entropy-weighted load measure and dynamic, modality-adaptive expert capacity.


Federation of Experts: Communication Efficient Distributed Inference for Large Language Models

Hugging Face Daily Papers · 2026-05-07

Federation of Experts (FoE) restructures mixture-of-experts blocks into clusters that process KV heads independently. This eliminates inter-node communication bottlenecks and improves inference throughput and latency by up to 5.2x while maintaining generation quality.
