#dialect

Could refusal layers be masking dialect-conditioned safety failures in MoE models? [D]

Reddit r/MachineLearning · 2026-05-18

Tests on Qwen3.5-35B-A3B suggest that the MoE model responds differently to AAVE-coded (African American Vernacular English) prompts, and that its refusal layers may be masking dialect-conditioned safety failures that become visible only when refusal is weakened.
