persona-vectors


Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

arXiv cs.AI · 2026-05-22

This paper investigates whether off-the-shelf persona steering vectors can reduce sycophancy in large language models. It finds that they achieve 68-98% of the effect of targeted Contrastive Activation Addition (CAA) without requiring sycophancy-specific training data, and argues that sycophancy is better understood as a persona-level property.
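
For readers unfamiliar with CAA, the general idea can be sketched as follows: a steering vector is the difference between the mean activations on two contrasting sets of examples, and it is added (scaled by a coefficient) to a hidden state at inference time. This is a minimal illustration only. The function names, the toy 3-dimensional activations, and the coefficient are assumptions made for the example; the summary above does not describe the paper's models, layers, or datasets.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """CAA-style vector: mean(positive) minus mean(negative), per dimension."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos, neg)]


def steer(hidden, vector, alpha):
    """Add the scaled vector to one hidden state.

    A negative alpha pushes the state away from the 'positive' behavior.
    """
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (made-up numbers, not taken from the paper).
sycophantic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neutral = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = steering_vector(sycophantic, neutral)
steered = steer([5.0, 1.0, 1.0], v, alpha=-1.0)

print(v)
print(steered)
```

An off-the-shelf persona vector would presumably be applied through the same addition step; the difference is that it is derived from data describing a persona rather than from sycophancy-specific contrast pairs.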
