load-balancing

#load-balancing

$\phi$-Balancing for Mixture-of-Experts Training

arXiv cs.LG · 2026-05-18

This paper proposes φ-balancing, a principled framework for load balancing in Mixture-of-Experts models that directly targets population-level expert balance using convex duality and mirror descent, achieving more stable expert utilization and outperforming prior methods on reasoning and code generation benchmarks.

@Akintola_steve: https://x.com/Akintola_steve/status/2055620856802357587

X AI KOLs Timeline · 2026-05-16

A practical blueprint for designing a backend system capable of handling 1 million concurrent users, covering architecture decisions like language selection, load balancing, database sharding, multi-layer caching, and resilience patterns.

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

arXiv cs.LG · 2026-05-08

MACS is a training-free inference framework that mitigates the straggler effect in expert parallelism for multimodal MoE MLLMs by introducing entropy-weighted load and dynamic modality-adaptive capacity mechanisms.
