Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Hugging Face Daily Papers 05/11/26, 12:00 AM Papers

Summary

This paper identifies KV-cache contamination as a failure mode for activation steering in dialogue and proposes GCAD, a method that extracts steering signals from prompt contributions and applies token-level gating to improve long-horizon coherence, achieving substantial gains on multi-turn benchmarks.

Activation steering controls language model behavior by adding directions to internal representations at inference time, but standard residual-stream steering can fail in stateful dialogue. We identify KV-cache contamination as a key failure mode: steered token states are stored and repeatedly reused, turning a local perturbation into cumulative coherence degradation. To address this challenge, we propose Gated Cropped Attention-Delta steering (GCAD), which extracts steering signals from system-prompt contributions to self-attention and applies them with token-level gating. Across persona-steering experiments, GCAD preserves trait control while substantially improving long-horizon coherence. On the main multi-turn benchmark, GCAD improves average coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1. These results suggest that activation steering becomes more reliable when interventions follow the prompt-mediated pathways that models already use for behavioral control.
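The contamination mechanism described above can be illustrated with a toy scalar model. This sketch is not the paper's experiment. It rests on three simplifying assumptions:

- each hidden state is reduced to its offset along one steering direction v;
- reading the cache is a uniform average, standing in for attention;
- residual-stream steering adds a fixed `alpha` to every generated token.

The function name `simulate` and its parameters are hypothetical. The sketch compares a run that caches the steered states with one that caches the clean states, which isolates the effect of reusing steered states.

```python
# Toy scalar model of KV-cache contamination (illustrative assumption, not the paper's experiment).
# Every state is reduced to its offset along a single steering direction v.


def simulate(num_prompt_tokens: int, num_steps: int, alpha: float,
             cache_steered: bool) -> list[float]:
    """Return the steering offset of each generated token's state.

    cache_steered=True mimics residual-stream steering whose steered states are
    written into the cache; False caches the unsteered state instead.
    """
    if num_prompt_tokens < 1:
        raise ValueError("need at least one cached prompt token")
    cache = [0.0] * num_prompt_tokens  # prompt tokens start unsteered
    offsets = []
    for _ in range(num_steps):
        # Hypothetical "attention read": a uniform average over the cache.
        context = sum(cache) / len(cache)
        clean_state = context            # state the token would have without steering
        steered_state = context + alpha  # residual-stream steering adds alpha * v
        offsets.append(steered_state)
        # Standard steering stores the steered state; the alternative stores the clean one.
        cache.append(steered_state if cache_steered else clean_state)
    return offsets


if __name__ == "__main__":
    contaminated = simulate(4, 10, 1.0, cache_steered=True)
    isolated = simulate(4, 10, 1.0, cache_steered=False)
    print("cached steered states:", [round(x, 3) for x in contaminated])
    print("cached clean states:  ", [round(x, 3) for x in isolated])
```

When steered states are cached, each step raises the cache mean by `alpha / (n + 1)`, where `n` is the cache length before the new state is appended. The offset therefore keeps growing, roughly logarithmically in the number of generated tokens. When only clean states are cached, the offset stays at `alpha`. Real models are far more complex. The toy only shows how a per-token perturbation becomes cumulative once it is stored and reused.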

Original Article

Cached at: 05/12/26, 02:52 PM

Paper page - Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Source: https://huggingface.co/papers/2605.10664

Abstract

Activation steering in language models suffers from KV-cache contamination in dialogue settings, which GCAD addresses by extracting steering signals from prompt contributions and applying token-level gating to improve long-horizon coherence.
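The abstract does not give GCAD's equations, so what follows is only a minimal sketch of one plausible reading. It makes two assumptions:

- the "cropped attention delta" is the difference between a query's attention output with the system-prompt keys and values present and with them removed;
- the token-level gate is a sigmoid score.

Everything here is illustrative: single-head attention over plain Python lists. The helper names `attention_delta`, `steering_direction`, and `gated_steer`, and the gate parameters `threshold` and `sharpness`, are hypothetical rather than the authors' code or API.

```python
import math

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: list[Vector], values: list[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def attention_delta(query: Vector, sys_keys: list[Vector], sys_values: list[Vector],
                    ctx_keys: list[Vector], ctx_values: list[Vector]) -> Vector:
    """Assumed system-prompt contribution: attention output with the prompt
    minus the output with the prompt's keys/values cropped away."""
    full = attend(query, sys_keys + ctx_keys, sys_values + ctx_values)
    cropped = attend(query, ctx_keys, ctx_values)
    return [f - c for f, c in zip(full, cropped)]


def steering_direction(queries: list[Vector], sys_keys: list[Vector],
                       sys_values: list[Vector], ctx_keys: list[Vector],
                       ctx_values: list[Vector]) -> Vector:
    """Hypothetical: average per-query deltas into a single steering vector."""
    deltas = [attention_delta(q, sys_keys, sys_values, ctx_keys, ctx_values)
              for q in queries]
    return [sum(d[i] for d in deltas) / len(deltas) for i in range(len(deltas[0]))]


def gated_steer(attn_output: Vector, query: Vector, direction: Vector,
                gate_keys: list[Vector], threshold: float = 0.0,
                sharpness: float = 4.0) -> tuple[Vector, float]:
    """Add the direction scaled by a per-token gate in (0, 1).

    Hypothetical gate: a sigmoid of the query's best scaled score against
    stored system-prompt keys, so tokens that would attend to the prompt
    receive more steering.
    """
    scale = math.sqrt(len(query))
    score = max(dot(query, k) / scale for k in gate_keys)
    gate = 1.0 / (1.0 + math.exp(-sharpness * (score - threshold)))
    steered = [o + gate * v for o, v in zip(attn_output, direction)]
    return steered, gate
```

Softmax weights split into a system-prompt share and a context share. As a result, `attention_delta` equals `w * (attend(q, sys) - attend(q, ctx))`, where `w` is the attention mass the query places on system-prompt tokens. In this sketch the delta therefore shrinks toward zero for tokens that ignore the prompt. That property makes a per-token gate a natural companion to a prompt-derived direction.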

Similar Articles

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

Don't Lose Focus: Activation Steering via Key-Orthogonal Projections

DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts

Steered LLM Activations are Non-Surjective
