Tag: persona-steering

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Hugging Face Daily Papers · 4d ago

This paper identifies KV-cache contamination as a failure mode for activation steering in dialogue. It proposes GCAD, a method that extracts steering signals from prompt contributions and applies token-level gating to improve long-horizon coherence, and it reports substantial gains on multi-turn benchmarks.

Beyond Static Personas: Situational Personality Steering for Large Language Models

arXiv cs.CL · 2026-04-20

This paper introduces IRiS, a training-free framework for situational personality steering in LLMs that moves beyond static persona modeling by identifying and leveraging situation-dependent persona neurons. The approach demonstrates that LLM behavior varies contextually and proposes neuron-based identification, retrieval, and weighted steering methods validated on PersonalityBench and a new SPBench benchmark.
