routing-sparsity

Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap

arXiv cs.LG · 2026-05-19

This paper introduces a Jacobian-PCA-Grassmann framework to analyze the geometric structure of expert specialization in Mixture-of-Experts (MoE) Transformers. It finds that experts exhibit strong functional decorrelation while their representations overlap, and that routing sparsity significantly influences this geometry.
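The summary does not describe how the framework is built. The "PCA-Grassmann" part suggests one common approach: compare experts by the subspaces their representations span. The sketch below is a generic illustration under stated assumptions, not the paper's method. It assumes each expert is summarized by a sample matrix (for example, output activations or flattened Jacobian rows). It takes each expert's top-k PCA subspace and measures overlap through the principal angles between those subspaces. The function names, k, and the synthetic "experts" are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a generic PCA + principal-angle comparison,
# not the construction used in the paper.

def pca_basis(samples: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d x k) of the top-k principal directions of samples (n x d)."""
    centered = samples - samples.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered data are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def principal_angles(basis_a: np.ndarray, basis_b: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between two subspaces with orthonormal bases."""
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.arccos(np.clip(cosines, 0.0, 1.0))

def grassmann_distance(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Geodesic distance on the Grassmann manifold: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(basis_a, basis_b)))

def subspace_overlap(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles: 1 = same subspace, 0 = orthogonal."""
    cos = np.cos(principal_angles(basis_a, basis_b))
    return float(np.mean(cos ** 2))

# Synthetic, hypothetical data: experts A and B share a k-dim subspace,
# expert C is isotropic noise with no preferred subspace.
rng = np.random.default_rng(0)
d, n, k = 64, 500, 8
shared = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal d x k basis
expert_a = rng.standard_normal((n, k)) @ shared.T + 0.01 * rng.standard_normal((n, d))
expert_b = rng.standard_normal((n, k)) @ shared.T + 0.01 * rng.standard_normal((n, d))
expert_c = rng.standard_normal((n, d))

basis_a = pca_basis(expert_a, k)
basis_b = pca_basis(expert_b, k)
basis_c = pca_basis(expert_c, k)

print("overlap A-B:", subspace_overlap(basis_a, basis_b))
print("overlap A-C:", subspace_overlap(basis_a, basis_c))
print("Grassmann distance A-B:", grassmann_distance(basis_a, basis_b))
print("Grassmann distance A-C:", grassmann_distance(basis_a, basis_c))
```

Principal angles near zero indicate that two experts' representations occupy the same subspace, which is one way to quantify representational overlap. Angles near π/2 indicate orthogonal subspaces. The mean squared cosine is only one scalar summary, and the paper may use a different one.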
