Tag
Proposes Confidence-Aware SwiGLU (κ-SwiGLU), which adjusts the sharpness of expert gates in Mixture-of-Experts models according to token-level routing confidence, improving performance with minimal computational overhead.
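
The summary gives no formula, so the following is only an illustrative sketch of what a confidence-dependent gate sharpness could look like, not the authors' method. It assumes that confidence is the largest softmax routing probability for a token, and that sharpness is the β in Swish, x * sigmoid(βx), set by the hypothetical rule β = 1 + κ * confidence. The function names and the `kappa` parameter are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_confidence(router_logits):
    """Assumed confidence measure: the largest routing probability."""
    return max(softmax(router_logits))


def swish(x, beta):
    """Swish with sharpness beta: x * sigmoid(beta * x)."""
    return x / (1.0 + math.exp(-beta * x))


def confidence_aware_swiglu(gate_pre, value_pre, confidence, kappa=1.0):
    """Gated unit whose sharpness grows with routing confidence.

    gate_pre and value_pre are the two pre-activations of a SwiGLU
    expert (x @ W and x @ V) for one token, given as lists of floats.
    The rule beta = 1 + kappa * confidence is an assumption made for
    this sketch; kappa = 0 gives ordinary SwiGLU with beta = 1.
    """
    beta = 1.0 + kappa * confidence
    return [swish(g, beta) * v for g, v in zip(gate_pre, value_pre)]


# A token routed with near-uniform probabilities gets a softer gate
# than a token routed decisively to one expert.
unsure = routing_confidence([0.1, 0.0, -0.1, 0.0])
decisive = routing_confidence([6.0, 0.0, -1.0, 0.5])
soft = confidence_aware_swiglu([1.0, -1.0], [1.0, 1.0], unsure)
sharp = confidence_aware_swiglu([1.0, -1.0], [1.0, 1.0], decisive)
```

Under this rule the extra cost per token is one maximum over the routing probabilities and one scalar multiply per gate element, which is consistent with the low-overhead claim, although the real mechanism may differ.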