This paper introduces a red-teaming framework that measures the 'Overton Window' of political opinions that open-source LLMs are willing to express, and evaluates how far simple jailbreaks expand that range. Across more than 30 models, it finds systematic left-leaning biases and consistent vulnerability to these jailbreaks.