overton-window

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

arXiv cs.CL · 2026-05-25

This paper introduces a red-teaming framework that measures the "Overton window" of political opinions that open-source LLMs can express, and evaluates how simple jailbreaks expand that range. Across more than 30 models, it finds systematic left-leaning biases and vulnerabilities.
