Tag
This paper analyzes the expert-routing behavior of Mixtral 8x7B-Instruct on benign and harmful prompts, using activation-based and gradient-based signals. It finds that safety-relevant routing is subtle, depth-dependent, and distributed across experts rather than dominated by a fixed set of them.