Tag
This paper extends refusal steering (activation-based jailbreaking) to Mixture-of-Experts LLMs, finding that MoE routing patterns do not inhibit steering, and proposes expert-aware methods that can suppress refusal behavior based on a single expert's output.