overton-window

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

arXiv cs.CL · 2026-05-25

This paper introduces a red-teaming framework that measures the 'Overton Window' of political opinions open-source LLMs can express and evaluates how simple jailbreaks expand that range, finding systematic left-leaning biases and vulnerabilities across 30+ models.
