This paper investigates whether verbalized evaluation awareness (VEA) in large reasoning models causally affects their behavior on safety, alignment, moral reasoning, and political opinion benchmarks. The authors find that VEA has limited behavioral impact: injecting VEA produces near-zero effects, and removing it produces only small shifts. This suggests that high VEA rates should not be taken as strong evidence of strategic behavior or alignment tampering.