linear-features


Detecting and Controlling Sycophancy with Cascading Linear Features

arXiv cs.AI · yesterday

Presents an iterative data-generation pipeline that isolates cascading linear features responsible for sycophancy in language models, enabling detection, scoring, and steering at lower computational cost than baselines.
