Tag
This paper identifies KV-cache contamination as a failure mode for activation steering in dialogue. It proposes GCAD, a method that extracts steering signals from prompt contributions and applies token-level gating to improve long-horizon coherence, and it reports substantial gains on multi-turn benchmarks.
This paper introduces IRiS, a training-free framework for situational personality steering in LLMs that moves beyond static persona modeling by identifying and leveraging situation-dependent persona neurons. The work shows that LLM behavior varies with context and proposes neuron-based identification, retrieval, and weighted steering methods, validated on PersonalityBench and a new benchmark, SPBench.