mixtral

Safety-Oriented Routing Analysis of Mixtral MoE Under Benign and Harmful Prompts

arXiv cs.AI · 2026-05-26

This paper analyzes the routing behavior of Mixtral 8x7B-Instruct under benign and harmful prompts using activation-based and gradient-based signals. It finds that safety-relevant routing is subtle, depth-dependent, and distributed rather than dominated by a fixed set of experts.
