As X, Do Y: How Persona and Task Combine in Instruction-Tuned LLMs

arXiv cs.CL 05/25/26, 04:00 AM Papers
instruction-tuning persona task residual-stream mechanistic-interpretability llm additive-composition
Summary
This paper investigates how instruction-tuned LLMs combine persona and task specifications in the residual stream, finding that near answer formation the combination is approximately additive, enabling substitution with minimal KL divergence, but this additive regime does not account for the full multi-token generation mechanism.
arXiv:2605.23147v1 Announce Type: new Abstract: Role prompts of the form *As X, do Y* admit a clean linear decomposition at one specific site in the residual stream: the prompt-to-answer transition (the last prompt token together with the first two generated tokens) in an early/mid layer band. There, persona and task contribute through partially orthogonal additive directions. Forming a pure persona effect $\Delta_X$ and a pure task effect $\Delta_Y$, and substituting $h_{BB} + \Delta_X + \Delta_Y$ for the clean residual, yields downstream output within a small KL of clean on Gemma-2-2B-IT and Qwen-2.5-{1.5B, 3B}-Instruct, across a 12-cell short grid and a 48-cell long-persona grid, with persona-specific behavioral markers preserved. The natural inference from this additive structure is that the role prompt can be compressed into a single cached residual vector. *We show it cannot.* Injecting the cached additive prediction, or even the oracle clean residual $h_{XY}$, into a baseline host prompt with the persona text removed does not approach the clean long-persona target, at one site or at many layers. Persona-conditioned multi-token generation flows through attention back to the persona-text positions throughout the prompt, which no residual at one site reproduces. Local additivity in the residual stream does not imply prompt compressibility. The additive structure at the prompt-to-answer transition supports interpretability and fine-grained steering of persona or task contributions; persona-conditioned behavior across the full continuation depends on a distributed prompt/KV mechanism that local activation arithmetic does not displace.
Cached at: 05/25/26, 08:59 AM
# As X, Do Y: How Persona and Task Combine in Instruction-Tuned LLMs
Source: [https://arxiv.org/html/2605.23147](https://arxiv.org/html/2605.23147)
###### Abstract

Role prompts of the form *“As X, do Y”* combine a persona specification $X$ with a task specification $Y$ at inference time. We characterize where, in the residual stream of an instruction-tuned language model, this combination is causally simple enough to be approximated additively. For each persona-task pair we compare four prompts (baseline, persona-only, task-only, and persona-plus-task) and form a pure persona effect $\Delta_X$, a pure task effect $\Delta_Y$, and an interaction term $\mathrm{Inter} = \Delta_{XY} - \Delta_X - \Delta_Y$.

Near answer formation (the last prompt position together with the first two generated tokens), $\Delta_{XY}$ is well approximated by $\Delta_X + \Delta_Y$, and substituting the additive prediction for the clean residual yields downstream output within a small KL of clean in an early/mid layer band. The pattern holds on Gemma-2-2B-IT and on Qwen-2.5-{1.5B, 3B}-Instruct, on a 12-cell short grid and on a 48-cell long-persona grid. A behavioral check, scoring persona-specific markers in the 80-token continuation, confirms that this local distributional fidelity carries persona-conditioned output content.

The same experiments mark a boundary. Substituting the cached additive prediction into a baseline host prompt without the persona text recovers only a small fraction of the host-to-target KL gap, and neither multi-layer nor oracle substitution changes this materially. The persona’s contribution to multi-token generation is distributed across the prompt and KV cache; the local additive regime accounts for the next-step composition, not the full mechanism.

## 1 Introduction

Prompts of the form *“As Warren Buffett, give advice”* or *“As a senior software engineer, review this design”* are routine, and instruction-tuned models handle them robustly. The online computation that combines a persona specification $X$ with a task specification $Y$ in the residual stream is less well understood. This paper asks where, along the layer and position axes, that combination is causally simple enough to be approximated additively.

For each persona-task pair we compare four prompts, baseline-baseline ($BB$), persona-only ($XB$), task-only ($BY$), and persona-plus-task ($XY$), and at each layer and position form

$$\Delta_X = h_{XB} - h_{BB}, \qquad \Delta_Y = h_{BY} - h_{BB}, \tag{1}$$

$$\Delta_{XY} = h_{XY} - h_{BB}, \qquad \mathrm{Inter} = \Delta_{XY} - \Delta_X - \Delta_Y. \tag{2}$$

The empirical questions follow directly. Is $\Delta_{XY}$ aligned with $\Delta_X + \Delta_Y$? Are $\Delta_X$ and $\Delta_Y$ separated? When we substitute $h_{BB} + \Delta_X + \Delta_Y$ for the clean $XY$ residual, does downstream output stay close to clean? The answers turn out to depend sharply on layer and position. They are affirmative near answer formation (the last prompt position $p_{\text{last}}$ and the first two generated tokens $g_1, g_2$, in an early/mid layer band), and they degrade outside that region. The pattern holds across Gemma-2 and Qwen-2.5 instruction-tuned models and across a 48-cell grid of long-form personas and diverse tasks.

The same intervention machinery exposes a sharp limit. Substituting the cached additive prediction at $p_{\text{last}}$ into a baseline host prompt, with the persona text removed, does not approach the clean long-persona target, and clamping the substitution at multiple layers does not help. The persona’s role in multi-token generation is distributed across the prompt and KV cache; the local additive regime captures the next-step composition, not the full mechanism.

## 2 Related Work

This paper draws on four lines of prior work.

#### Task and function vectors from contrastive prompts.

The closest methodological precedent is the line of work that constructs task representations from differences between contrastive prompts. In-context learning creates task vectors that can be transplanted across prompts [[2](https://arxiv.org/html/2605.23147#bib.bib6)], and function vectors formalize this as averaged contrastive activation differences that causally drive task-conditioned behavior [[8](https://arxiv.org/html/2605.23147#bib.bib5)]. Our $\Delta_X$, $\Delta_Y$, $\Delta_{XY}$ are residual-stream analogs of this construction, but the focus of this paper is compositional structure (how persona and task combine) rather than single-axis transplantation.

#### Activation-space steering and representation engineering.

Activation Addition [[9](https://arxiv.org/html/2605.23147#bib.bib2)] and subsequent representation-engineering work [[10](https://arxiv.org/html/2605.23147#bib.bib3)] show that activation differences can steer model behavior without weight optimization. Task arithmetic [[4](https://arxiv.org/html/2605.23147#bib.bib9)] demonstrates that fine-tuning deltas behave additively in weight space. Our paper is adjacent in method but different in aim: we are not proposing a steering tool; we use activation substitution to characterize the compositional structure induced by role prompts.

#### Soft prompts and prefix tuning.

Prefix-Tuning [[6](https://arxiv.org/html/2605.23147#bib.bib7)] and Prompt-Tuning [[5](https://arxiv.org/html/2605.23147#bib.bib8)] show that a small number of learned continuous vectors prepended to a prompt can drive task-conditioned behavior. Our negative result in [Section 7](https://arxiv.org/html/2605.23147#S7) is informative against the most aggressive reading of our positive result: full prompt-to-vector replacement is closer to the soft-prompt regime, which uses many positions and is learned rather than constructed by single-shot residual arithmetic.

#### Mechanistic editing and feature interpretation.

ROME-style work [[7](https://arxiv.org/html/2605.23147#bib.bib1)] ties factual behavior to localized weight-space mechanisms and demonstrates surgical edits; subsequent work extends the linearity picture to relation decoding [[3](https://arxiv.org/html/2605.23147#bib.bib10)]. Our paper does not reach that level of closure. Instead, it provides a residual-stream-level account of where persona and task composition is causally simple enough to be approximated additively, and where that story stops. Work on monosemanticity and feature extraction [[1](https://arxiv.org/html/2605.23147#bib.bib4)] suggests that high-level behaviors sometimes decompose into interpretable latent directions; this paper does not attempt a component-level ontology and argues that the first object of study here is a localized residual computation, not a catalog of components.

## 3 Setup

### 3.1 Models

The primary model is `google/gemma-2-2b-it`, run in `float32`. We use two Qwen instruction-tuned models for cross-model confirmation of the localized position result, `Qwen/Qwen2.5-1.5B-Instruct` and `Qwen/Qwen2.5-3B-Instruct`, both run in `bfloat16`. The choice of `bfloat16` for the Qwen models is load-bearing: in `float16`, residual magnitudes overflow at the layers relevant here and decomposition statistics become NaN, whereas `bfloat16` preserves the dynamic range. All measurements are causal-intervention experiments on hidden states, not weight edits. Generation is fully greedy throughout (`do_sample=False`, `num_beams=1`), so conditions can be compared directly without averaging over sampling noise.

### 3.2 Prompt grids

We use two prompt families.

#### Short grid.

The first grid contains 4 personas and 3 tasks:

- Personas: Warren Buffett, Karl Marx, Yoda, Maya Angelou
- Tasks: policy commentary on UBI, haiku about Monday mornings, book recommendation

With baseline persona *“a thoughtful person”* and baseline task *“Give advice to someone facing a difficult decision”*, this yields 12 non-baseline $(X, Y)$ cells.

#### Long-persona grid.

To test broader prompt diversity, we use 8 multi-sentence personas and 6 tasks:

- Personas: engineer, counselor, founder, teacher, journalist, doctor, lawyer, chef
- Tasks: architecture review, startup launch plan review, scheduling proposal review, UBI policy commentary, haiku, book recommendation

This gives 48 non-baseline $(X, Y)$ cells.

### 3.3 Residual decomposition

For each $(X, Y)$ pair, we build the usual $2 \times 2$ prompt set:

$$BB,\; XB,\; BY,\; XY.$$

At a chosen layer and sequence position, the four hidden states induce $\Delta_X$, $\Delta_Y$, $\Delta_{XY}$, and $\mathrm{Inter}$ as defined above.

We report three families of statistics:

- directional alignment $\cos(\Delta_{XY}, \Delta_X + \Delta_Y)$,
- subspace overlap $\cos(\Delta_X, \Delta_Y)$,
- interaction magnitude $\lVert \mathrm{Inter} \rVert / \lVert \Delta_{XY} \rVert$.
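Once the four hidden states are in hand, the three statistics are simple vector operations. The sketch below is a minimal NumPy illustration, not the authors' `v15_*` script: it assumes each hidden state has already been extracted as a 1-D float array of width `d_model` from the same layer and the same position role ($p_{\text{last}}$, $g_1$, or $g_2$) in the four prompts, and the helper names `cosine` and `decompose` are hypothetical.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def decompose(h_bb, h_xb, h_by, h_xy):
    """2x2 residual decomposition at one (layer, position) site.

    Hypothetical helper: each argument is a 1-D residual-stream vector read
    from the same layer and position role in the BB, XB, BY, XY prompts.
    """
    delta_x = h_xb - h_bb                  # pure persona effect, Eq. (1)
    delta_y = h_by - h_bb                  # pure task effect, Eq. (1)
    delta_xy = h_xy - h_bb                 # joint effect, Eq. (2)
    inter = delta_xy - delta_x - delta_y   # interaction term, Eq. (2)
    return {
        "alignment": cosine(delta_xy, delta_x + delta_y),
        "overlap": cosine(delta_x, delta_y),
        "inter_ratio": float(np.linalg.norm(inter) / np.linalg.norm(delta_xy)),
        # Vector that the causal metric substitutes for h_xy.
        "additive_prediction": h_bb + delta_x + delta_y,
    }


# Toy example: the joint prompt adds one extra direction (index 2)
# that neither single-factor prompt produces.
h_bb = np.zeros(4)
h_xb = np.array([1.0, 0.0, 0.0, 0.0])
h_by = np.array([0.0, 1.0, 0.0, 0.0])
h_xy = np.array([1.0, 1.0, 1.0, 0.0])
stats = decompose(h_bb, h_xb, h_by, h_xy)
print(round(stats["alignment"], 4), round(stats["overlap"], 4), round(stats["inter_ratio"], 4))
# prints: 0.8165 0.0 0.5774
```

On an exactly additive input ($h_{XY} = h_{XB} + h_{BY} - h_{BB}$) the alignment is 1 and the interaction ratio is 0; the extra joint-only direction in the toy example lowers the alignment and raises the ratio, which is the kind of deviation the three statistics are meant to expose.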

### 3.4 Positions

We probe three positions: $p_{\text{last}}$ (the last prompt token), $g_1$ (the first generated token), and $g_2$ (the second generated token). At $g_1$ and $g_2$ we teacher-force on the clean $XY$ continuation, so comparisons across conditions remain position-aligned.
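Because the four prompts differ in length (the persona and task text change), "position-aligned" here means aligned by role rather than by absolute token index. A minimal sketch of how the probe indices could be located, under two assumptions of ours: each condition's input is its own prompt token ids followed by the same clean $XY$ continuation ids, and the residual "at $g_1$" is read at the position where $g_1$ is the input token. The helper names are hypothetical.

```python
def teacher_forced_input(prompt_ids, clean_xy_continuation):
    """Prompt tokens followed by the clean XY continuation (teacher forcing).

    Every condition (BB, XB, BY, XY) receives the same continuation tokens,
    so g1 and g2 carry identical tokens across conditions.
    """
    return list(prompt_ids) + list(clean_xy_continuation)


def probe_indices(prompt_len):
    """Absolute token indices of p_last, g1, g2 for a prompt of this length."""
    p_last = prompt_len - 1
    return {"p_last": p_last, "g1": p_last + 1, "g2": p_last + 2}


# Two prompts of different lengths share the same continuation [7, 8].
for prompt in ([1, 2, 3], [1, 4, 5, 6, 3]):
    ids = teacher_forced_input(prompt, [7, 8])
    idx = probe_indices(len(prompt))
    # Absolute indices differ, but both print [3, 7, 8] for (p_last, g1, g2).
    print(idx, [ids[i] for i in idx.values()])
```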

### 3.5 Causal metric

At each probe position and layer, we substitute the additive prediction

$$h_{BB} + \Delta_X + \Delta_Y$$

for the clean $XY$ hidden state and measure the downstream effect by 10-token teacher-forced KL against the clean $XY$ continuation. Low KL means the additive approximation is causally sufficient for near-term downstream behavior at that site.
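One way to compute the 10-token teacher-forced KL once the logits are available is sketched below in NumPy. Two details are our assumptions rather than specifications from the text: the direction $\mathrm{KL}(P_{\text{clean}} \,\|\, P_{\text{patched}})$, and averaging (rather than summing) over the 10 positions. Row $t$ of each array holds the next-token logits at step $t$ of the same clean $XY$ reference continuation, from one run without and one run with the substitution.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def teacher_forced_kl(clean_logits: np.ndarray, patched_logits: np.ndarray) -> float:
    """Mean per-step KL(P_clean || P_patched) over a teacher-forced window.

    Both inputs have shape [T, vocab] (T = 10 for the metric in the text)
    and are aligned on the same clean XY reference tokens.
    """
    log_p = log_softmax(clean_logits)
    log_q = log_softmax(patched_logits)
    kl_per_step = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_step.mean())


# Two-token vocabulary, one step: p = (0.5, 0.5), q = (0.75, 0.25).
clean = np.array([[0.0, 0.0]])
patched = np.array([[np.log(3.0), 0.0]])
print(teacher_forced_kl(clean, patched))  # 0.5 * ln(4/3), about 0.1438
```

Because both runs are scored on the same reference tokens, the KL isolates how much the substitution shifts the model's predictive distribution, independent of what a free-running greedy decode would have produced afterwards.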

## 4 Localized Additive Composition Near Answer Formation

We measure the $2 \times 2$ decomposition at three probe positions, $p_{\text{last}}$, $g_1$, and $g_2$, on the 12-cell short grid. [Table 1](https://arxiv.org/html/2605.23147#S4.T1) and [Figure 1](https://arxiv.org/html/2605.23147#S4.F1) summarize the result.

The pattern is the same across positions and models: a low-KL early/mid layer band, smoothly degrading later. On Gemma-2-2B-IT at $p_{\text{last}}$, median causal KL is 0.0004 at layer 6, 0.033 at layer 10, 0.371 at layer 14, and 1.371 at layer 18. At $g_1$ the corresponding numbers are 0.002, 0.005, 0.049; at $g_2$, 0.001, 0.013, 0.060. The cross-model picture matches: at the best early layer, Qwen-2.5-1.5B has medians 0.030/0.018/0.014 at $p_{\text{last}}/g_1/g_2$, and Qwen-2.5-3B has 0.031/0.021/0.021. The first generated tokens stay low-KL further into the network than $p_{\text{last}}$ does, which is consistent with the additive composition being most legible exactly at the prompt-to-answer transition.

The geometry tells the same story. At Gemma-2-2B-IT layer 14, median $\cos(\Delta_{XY}, \Delta_X + \Delta_Y)$ is 0.874 at $p_{\text{last}}$, 0.935 at $g_1$, and 0.898 at $g_2$, while $\cos(\Delta_X, \Delta_Y)$ remains substantially lower. $\Delta_X$ and $\Delta_Y$ are partially separable; the interaction term is non-negligible in norm but does not rotate $\Delta_{XY}$ away from the additive sum.

Table 1: Median causal KL under additive substitution on the 12-cell short grid, with 25th–75th percentile range in brackets. The additive regime is localized to a small region near answer formation rather than confined to a single prompt position.

![Figure 1](https://arxiv.org/html/2605.23147v1/x1.png)

Figure 1: Median causal KL under additive substitution at $p_{\text{last}}$ (blue), $g_1$ (orange), and $g_2$ (green), with 25th–75th percentile bands. Top row: 12-cell short grid on Gemma-2-2B-IT and Qwen-2.5-1.5B-Instruct. Bottom row: 48-cell long-persona grid on the same two models. The dotted gray line marks KL = 0.1, a useful reference for “essentially clean” substitution. The localized additive regime is visible as an early/mid-layer band with low KL across all three positions, present on both models and on both grids.

The additive approximation thus identifies a region, not a point: an early/mid layer band, at $p_{\text{last}}$ and the first two generated tokens. The boundaries of this region, in particular how they vary with model scale and depth, become the natural object for the rest of the paper.

## 5 Robustness to Persona and Task Diversity

The 12-cell grid is convenient but small. We rerun the same localized analysis on an 8-persona $\times$ 6-task grid (48 cells) that includes multi-sentence personas (engineer, counselor, founder, teacher, journalist, doctor, lawyer, chef) and a mix of review, planning, policy, creative, and recommendation tasks ([Appendix A](https://arxiv.org/html/2605.23147#A1)).

The localized additive regime persists ([Table 2](https://arxiv.org/html/2605.23147#S5.T2); bottom row of [Figure 1](https://arxiv.org/html/2605.23147#S4.F1)). On Gemma-2-2B-IT, median causal KL at layer 14 is 0.229 at $p_{\text{last}}$, 0.056 at $g_1$, and 0.056 at $g_2$; the corresponding cosines $\cos(\Delta_{XY}, \Delta_X + \Delta_Y)$ are 0.818 and 0.875 at $p_{\text{last}}$ and $g_1$, while $\cos(\Delta_X, \Delta_Y)$ stays substantially lower (0.171, 0.234), ruling out trivial collinearity. On Qwen-2.5-1.5B at early layer 7, the medians are 0.030/0.018/0.014, with cosines 0.862 and 0.936. Quantitatively the long-persona numbers are larger than the short-grid numbers (multi-sentence personas push more into the residual), but the qualitative shape is unchanged.

Table 2: Median causal KL on the 48-cell long-persona grid, with 25th–75th percentile range in brackets. The localized additive regime remains visible under substantially broader persona and task diversity.

#### Per-cell breakdown.

The effect is not uniform across cells, but it is not driven by a few easy ones either. Per-persona median KLs on Gemma-2-2B-IT at $p_{\text{last}}$, layer 14, range from 0.14 to 0.39 for most personas, with a few outlier cells in the tail; on Qwen-2.5-1.5B-Instruct at layer 7 the persona medians are tighter, mostly 0.02 to 0.035. Per-task medians on Gemma remain low to moderate across all six task families rather than concentrating on one.

## 6 Behavioral Verification of the Additive Substitution

KL between teacher-forced distributions is a useful but indirect measure of whether the additive substitution preserves persona-conditioned behavior. A persona prompt should change *what* the model talks about, not only the local probability of the next few tokens. We therefore complement the KL evidence in [Section 4](https://arxiv.org/html/2605.23147#S4) and [Section 5](https://arxiv.org/html/2605.23147#S5) with a behavioral-marker test that scores persona-specific output content directly, in the style of factual recall verification [[7](https://arxiv.org/html/2605.23147#bib.bib1), [8](https://arxiv.org/html/2605.23147#bib.bib5)].

### 6.1 Protocol

For each of the 8 long-form personas we define a small set of persona-specific surface markers chosen *a priori* from the persona description itself. For example, the engineer persona is associated with markers including `SPOF`, `single point of failure`, `scalability`, `reliability`, and `redundancy`; the counselor persona with markers including `feel heard`, `validate`, and `compassion`; and the doctor persona with `differential`, `symptoms`, and `ruling out`. The full marker sets are listed in [Appendix B](https://arxiv.org/html/2605.23147#A2). Marker selection is kept independent of the experimental outputs: markers are picked from the persona text, not from clean outputs.

For each (long persona, task) cell on the 8-persona $\times$ 3-review-task subset of the long-persona grid (24 cells), we generate 80 tokens greedily under four conditions:

- Clean: *“As [long persona], [task]”* with no intervention. This is the ceiling.
- Additive: same prompt, with $h_{BB} + \Delta_X + \Delta_Y$ substituted at $p_{\text{last}}$, layer 14.
- Remove-X: same prompt, with $h_{XY} - \Delta_X$ substituted at $p_{\text{last}}$, layer 14 (subtracting the persona contribution).
- Bare: *“[task]”* with no persona prefix and no intervention. This is the floor.
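Writing both substituted vectors in terms of Equations (1) and (2) makes the difference between the additive and remove-X interventions explicit. This is plain algebra on the definitions:

$$
\begin{aligned}
h_{BB} + \Delta_X + \Delta_Y &= h_{XY} - \mathrm{Inter},\\
h_{XY} - \Delta_X &= h_{BB} + \Delta_{XY} - \Delta_X = h_{BB} + \Delta_Y + \mathrm{Inter}.
\end{aligned}
$$

The additive condition therefore differs from the clean residual only by the interaction term, while remove-X removes $\Delta_X$ but leaves $\mathrm{Inter}$, which depends on both $X$ and $Y$, in place at that site.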

A cell scores *any-marker* if at least one persona marker appears in the 80-token output (case-insensitive, word-boundary-aware). A faithful additive substitution should match the clean condition on this metric, while remove-X and bare should both drop substantially below it.
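The any-marker criterion maps naturally onto regular expressions. The sketch below uses Python's standard `re` module and is an illustration under stated assumptions, not the `v17_behavioral_markers.py` script: word boundaries are enforced with `(?<!\w)` / `(?!\w)` lookarounds so that multi-word markers behave like single words, and a trailing `*` (as in the lawyer marker `indemnif*` in Appendix B) is treated as a prefix wildcard. The helper names are hypothetical.

```python
import re


def marker_pattern(marker: str) -> re.Pattern:
    """Case-insensitive, word-boundary-aware regex for one marker.

    Assumption: a trailing '*' is a prefix wildcard (e.g. 'indemnif*').
    """
    if marker.endswith("*"):
        body = re.escape(marker[:-1]) + r"\w*"
    else:
        body = re.escape(marker)
    # Lookarounds instead of \b: the marker must not sit inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def score_continuation(text: str, markers: list[str]) -> dict:
    """Return the any-marker flag and the number of distinct markers found."""
    found = {m for m in markers if marker_pattern(m).search(text)}
    return {"any_marker": bool(found), "distinct": len(found)}


engineer = ["SPOF", "single point of failure", "scalable", "redundancy"]
print(score_continuation("The shared database is a single point of failure (SPOF).", engineer))
print(score_continuation("This design is unscalable.", engineer))
```

With whole-word matching, `symptom` does not fire inside `symptoms`, which fits the way Appendix B lists singular and plural forms as separate markers.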

### 6.2 Results

Table 3: Behavioral-marker recovery on 8-persona $\times$ 3-task long-persona cells, scoring 80-token greedy continuations. “Any” columns give the share of cells in which at least one persona marker appears; “distinct” columns give the mean number of distinct markers per generation.

The bare condition collapses to 4.2%, confirming that the persona text is the primary source of persona-conditioned content in this protocol. The additive condition tracks the clean ceiling closely (58.3% vs. 66.7% any-marker; 0.92 vs. 1.21 distinct markers), so substituting $h_{BB} + \Delta_X + \Delta_Y$ at $p_{\text{last}}$, $L = 14$, does not substantially damage the persona-conditioned output.

Remove-X is the informative third row. Subtracting $\Delta_X$ at one site barely changes marker presence (62.5%). This is consistent with the negative result in [Section 7](https://arxiv.org/html/2605.23147#S7): during multi-token generation, persona content arrives via attention back to the persona-text positions, not through a single residual at $p_{\text{last}}$. A one-site subtraction cannot reach those positions. Only removing the persona text from the prompt itself, as in the bare condition, collapses persona-marker recovery.
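Since all four conditions are scored on the same 24 cells, the any-marker rates above correspond to whole cell counts (a direct conversion of the reported percentages):

$$
\text{clean } \tfrac{16}{24} \approx 66.7\%, \qquad \text{remove-}X\ \tfrac{15}{24} = 62.5\%, \qquad \text{additive } \tfrac{14}{24} \approx 58.3\%, \qquad \text{bare } \tfrac{1}{24} \approx 4.2\%.
$$

In other words, relative to clean, the additive condition loses two cells and remove-X loses one, while bare loses fifteen.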

## 7 Where the Local Additive Story Stops

A natural next question is whether the persona prompt can be replaced by the cached residual vector itself. It cannot. We construct, for each long-persona cell, a host prompt with the persona text removed (*“As a thoughtful person, [task]”*) and compare 10-token KL against the clean long-persona continuation under three interventions at $p_{\text{last}}$: no substitution, oracle substitution with the clean $h_{XY}$, and cached substitution with $h_{BB} + \Delta_X + \Delta_Y$. Across 24 cells, the host-baseline median KL is 3.05, cached substitution reaches 2.84, and oracle substitution reaches 2.64. Both substitutions close only a small fraction of the gap.
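If the “gap” is read as the host-baseline KL itself (the clean long-persona target has KL 0 against its own continuation), the reported medians translate into approximate recovered fractions:

$$
\frac{3.05 - 2.84}{3.05} \approx 0.069 \ \ (\text{cached}), \qquad \frac{3.05 - 2.64}{3.05} \approx 0.134 \ \ (\text{oracle}).
$$

These are ratios of medians rather than medians of per-cell ratios, so they indicate scale only: roughly 7% of the gap for the cached vector and 13% for the oracle residual.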

Multi-layer substitution does not help. Clamping the additive prediction at successively wider layer sets gives medians 2.81, 2.82, 3.04, and 3.01 for layer sets $\{10, 12, 14\}$, $\{10, 12, 14, 16, 18\}$, $\{10, 12, 14, 16, 18, 20, 22\}$, and $\{6, 8, 10, 12, 14, 16, 18, 20, 22\}$. Wider windows eventually hurt, because clamping over many layers prevents the natural forward propagation that would otherwise carry persona content through downstream computation.

Table 4: Long-persona host-prompt injection does not approach clean long-persona behavior. Better local additive prediction does not imply prompt replacement.

The picture is two-level. The local additive regime characterizes the next-step composition: at $p_{\text{last}}$ and the first two generated tokens, in an early/mid layer band, $\Delta_X + \Delta_Y$ is causally sufficient. Multi-token persona-conditioned generation depends on attention back to the persona-text positions across the prompt, which a single residual cannot reproduce regardless of how many layers it is clamped over.

## 8 Discussion

The natural mechanistic object for online role prompting is not a single hidden state but a small region: $p_{\text{last}}$ together with the first few generated tokens, in an early/mid layer band. Within this region, $\Delta_X$ and $\Delta_Y$ are reasonable summaries of the persona and task contributions, and their sum is a useful predictor of the composite residual. The structure survives both a 48-cell long-persona grid and cross-model checks on Gemma and Qwen, which gives the localization some claim to generality.

Outside the region the story degrades, and the host-injection experiment in [Section 7](https://arxiv.org/html/2605.23147#S7) sets the outer boundary. Persona-conditioned generation reaches the output through attention to persona-text positions throughout the prompt, and a residual at one site cannot stand in for that. The picture is two-level: local additive composition at the prompt-to-answer transition, and a wider distributed prompt/KV mechanism that local substitution does not displace.

Relative to function-vector [[8](https://arxiv.org/html/2605.23147#bib.bib5)] and task-vector [[2](https://arxiv.org/html/2605.23147#bib.bib6)] work, $\Delta_X$ and $\Delta_Y$ are residual-stream analogs of the same construction; the contribution here is compositional rather than single-axis. Relative to activation-addition steering [[9](https://arxiv.org/html/2605.23147#bib.bib2), [10](https://arxiv.org/html/2605.23147#bib.bib3)], the aim is characterization, not control. Relative to ROME-style editing [[7](https://arxiv.org/html/2605.23147#bib.bib1)], the analysis sits at a higher level of abstraction (residual composition rather than weight-space facts) and does not claim circuit-level closure.

#### On the interaction term.

The interaction can be large in raw residual norm even where additive substitution remains causally strong. One candidate explanation is that the next layer reads the residual only after normalization, so its RMSNorm absorbs part of the apparent magnitude gap between $h_{BB} + \Delta_X + \Delta_Y$ and $h_{XY}$ before downstream computation sees it. We do not directly intervene on the normalization path here, so we report this only as a candidate. The empirical observation, which does not depend on it, is that raw non-additivity in norm does not predict downstream causal fidelity under substitution.
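The candidate explanation rests on RMSNorm being (up to its small $\epsilon$) invariant to rescaling its input, so any part of the mismatch between $h_{BB} + \Delta_X + \Delta_Y$ and $h_{XY}$ that is a pure change of scale is removed before the next layer reads the residual. The NumPy sketch below illustrates only that scale component. It assumes a generic RMSNorm with a per-channel gain (real models may parameterize the gain differently) and a hypothetical additive prediction that differs from $h_{XY}$ only by a factor of 1.5; a mismatch in direction would not be absorbed this way.

```python
import numpy as np


def rms_norm(h: np.ndarray, gain: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Generic RMSNorm: divide by the root-mean-square, then apply a per-channel gain."""
    return h / np.sqrt(np.mean(h ** 2) + eps) * gain


def rel_gap(a: np.ndarray, b: np.ndarray) -> float:
    """Relative L2 distance of a from reference b."""
    return float(np.linalg.norm(a - b) / np.linalg.norm(b))


rng = np.random.default_rng(0)
d_model = 16
gain = rng.normal(size=d_model)
h_xy = rng.normal(size=d_model)
h_add = 1.5 * h_xy  # hypothetical additive prediction: same direction, larger norm

raw_gap = rel_gap(h_add, h_xy)                                   # 0.5 before normalization
post_gap = rel_gap(rms_norm(h_add, gain), rms_norm(h_xy, gain))  # near zero after it
print(f"raw {raw_gap:.3f}, post-norm {post_gap:.1e}")
```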

#### On position choice.

The region $(p_{\text{last}}, g_1, g_2)$ spans the transition from prompt representation to the first answer tokens. Earlier positions inside the user turn can also show low-KL additive substitution, but with less stable geometry. Later generated positions drift increasingly under teacher forcing.

## 9 Limitations

1. The analysis is restricted to a small region near answer formation. We do not study every prompt position or every generated token.
2. We do not identify a minimal set of heads, neurons, or weights whose editing installs or removes a persona feature.
3. The long-persona grid is broader than the short grid but remains English-only, cooperative, and single-turn.
4. Some late-layer and particular-cell tails are large. We do not characterize which $(X, Y)$ combinations push the interaction term beyond the regime where additive substitution remains causally faithful.
5. The host-injection experiment shows that single-site substitution cannot replace persona text, but does not identify the correct distributed intervention.
6. Cross-model evidence is limited to two families. The 12-cell short-grid localized result was checked on Gemma-2-2B-IT and Qwen-2.5-{1.5B, 3B}; the 48-cell long-persona grid was run on Gemma-2-2B-IT and Qwen-2.5-1.5B-Instruct only.

## 10 Conclusion

Persona-task composition in instruction-tuned language models admits a localized additive description in the residual stream near answer formation. The description holds on Gemma-2-2B-IT and Qwen-2.5-{1.5B, 3B}-Instruct on the 12-cell short grid, extends to the 48-cell long-persona grid on Gemma-2-2B-IT and Qwen-2.5-1.5B-Instruct, and survives a behavioral test of persona-conditioned output content. The same intervention machinery shows where that description stops: a residual at one site cannot replace the persona text the model attends to throughout multi-token generation. The next natural step is to characterize that distributed mechanism directly.

#### Reproducibility.

Code, cached experiment outputs, and the paper source are available at [https://github.com/xuy/localized-additive-composition](https://github.com/xuy/localized-additive-composition). All experiments use HuggingFace `transformers` on a single Apple Silicon device with the `mps` backend. Gemma-2-2B-IT is loaded in `float32`; Qwen-2.5-1.5B-Instruct and Qwen-2.5-3B-Instruct are loaded in `bfloat16` (`float16` produces NaN residuals at the layers relevant here). Attention uses the `eager` implementation. All generation is greedy (`do_sample=False`, `num_beams=1`), so reported KLs and behavioral scores are deterministic functions of the model, prompt, and intervention. KL is computed by a greedy 10-token continuation from the clean $XY$ prompt, followed by a teacher-forced log-probability comparison across the same 10-token reference window under each intervention.

#### Code and artifacts.

The main scripts supporting the paper are:

- `scripts/v15_localized_positions.py`: localized-position causal sweep ([Section 4](https://arxiv.org/html/2605.23147#S4))
- `scripts/v16_diverse_grid.py`: broadened 8 $\times$ 6 long-persona grid ([Section 5](https://arxiv.org/html/2605.23147#S5))
- `scripts/v17_behavioral_markers.py`: behavioral-marker recovery test ([Section 6](https://arxiv.org/html/2605.23147#S6))
- `scripts/v14b_v2_inject.py`: host-prompt single-site substitution ([Section 7](https://arxiv.org/html/2605.23147#S7))
- `scripts/v14e_multilayer_inject.py`: multi-layer substitution ([Section 7](https://arxiv.org/html/2605.23147#S7))

## Appendix A: Prompt Grids

### A.1 Original short grid

#### Personas.

The short-grid persona set is:

- Warren Buffett
- Karl Marx
- Yoda
- Maya Angelou

#### Tasks.

The short-grid task set is:

- “Comment on whether universal basic income is a good policy.”
- “Write a haiku about Monday mornings.”
- “Recommend a book worth reading and explain why.”

#### Baselines.

The baseline persona is “a thoughtful person” and the baseline task is “Give advice to someone facing a difficult decision.”

### A.2 Broadened long-persona grid

#### Personas.

The long-persona set is:

- a senior software engineer with 10 years of experience who pays close attention to architecture, reliability, and avoiding single points of failure
- an empathetic counselor with deep training in active listening, cognitive behavioral therapy, and trauma-informed care, who helps clients feel heard without imposing solutions
- a pragmatic startup founder who has bootstrapped three companies, makes capital-efficient decisions, iterates fast based on user feedback, and avoids vanity metrics
- a middle school science teacher who has taught for 15 years, explains concepts with relatable analogies, gently checks for understanding, and meets students at their level
- an investigative journalist who has covered city government for two decades, asks pointed questions, follows the money, and verifies every claim against primary sources
- a primary-care physician who has practiced for 25 years, listens carefully to symptoms, considers differential diagnoses without alarming the patient, and explains options clearly
- a corporate litigator who has tried cases at the appellate level for 20 years, anticipates opposing arguments, builds case theory from the record, and communicates dense law in plain English
- a head chef trained in classical French technique who has run three Michelin-starred kitchens, builds menus around seasonal ingredients, and teaches young cooks by demonstration

#### Tasks.

The long-persona task set is:

- architecture review: “Review this design: a microservice architecture where eight services share a single PostgreSQL database for both transactional state and event log.”
- startup-plan review: “Review this plan: a three-person team building a B2B SaaS product, planning to launch in three months, with no usage analytics in v1.”
- scheduling-proposal review: “Review this proposal: an internal tool that automates calendar scheduling using an LLM, sending tentative meetings to all parties before confirmation.”
- policy commentary: “Comment on whether universal basic income is a good policy.”
- constrained creative: “Write a haiku about Monday mornings.”
- recommendation: “Recommend a book worth reading and explain why.”

## Appendix BBehavioral Markers

The persona marker sets used in[section˜6](https://arxiv.org/html/2605.23147#S6)\. Markers were selected from each persona’s description before running the experiment\.

- engineer: SPOF, single point of failure, scalability, scalable, reliability, fault tolerance, fault-tolerant, redundancy, resilience, throughput, latency, consistency, availability.
- counselor: feel heard, validate, validated, acknowledge, acknowledged, your experience, your feelings, compassion, compassionate, without judgment, trauma-informed, active listening, feelings, emotion.
- founder: iterate, iterating, iteration, MVP, minimum viable, user feedback, customer feedback, capital-efficient, lean, runway, validation, ship, shipping, product-market fit, vanity metric, traction, burn, bootstrapped.
- teacher: imagine, think of, like a, analogy, analogies, for example, step by step, step-by-step, students, understand, let’s say, picture this, as if.
- journalist: sources, source, primary source, primary sources, follow the money, accountability, transparency, investigate, verify, verified, on the record, off the record, evidence, public interest, pointed question, track record.
- doctor: symptom, symptoms, diagnosis, diagnose, differential, patient, treatment, ruling out, rule out, clinical, examination, condition, medication, evaluate, underlying, comorbid.
- lawyer: liability, liabilities, jurisdiction, statute, precedent, evidence, evidentiary, opposing, counterparty, due diligence, parties, indemnif\*, contractual, compliance, compliant, jurisprudence, case theory, on the record.
- chef: seasonal, season, ingredient, ingredients, flavor, flavour, palate, fresh, simmer, sauté, saute, balance, garnish, mise en place, technique, classical, French.
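
The marker sets above do not come with a stated matching rule. A minimal sketch of one plausible scorer follows, assuming case-insensitive whole-word (or whole-phrase) matching, folding the curly apostrophe in “let’s say” to a straight one, and treating the trailing `*` in `indemnif*` as a prefix wildcard. The `MARKERS` dictionary copies only two of the eight sets, and `marker_pattern` and `marker_hits` are hypothetical helpers, not the paper's code.

```python
import re

# Excerpt of two Appendix B marker sets; the other personas follow the same shape.
MARKERS = {
    "lawyer": ["liability", "liabilities", "jurisdiction", "statute", "precedent",
               "indemnif*", "case theory", "on the record"],
    "chef": ["seasonal", "season", "ingredient", "ingredients", "sauté", "saute",
             "mise en place", "French"],
}


def _normalize(text):
    # Assumption: fold case and curly apostrophes so "Let’s say" matches "let's say".
    return text.replace("\u2019", "'").lower()


def marker_pattern(marker):
    """Compile a whole-word/phrase regex; a trailing '*' acts as a prefix wildcard.

    These matching rules are assumptions, not the paper's documented procedure.
    """
    m = _normalize(marker)
    if m.endswith("*"):
        return re.compile(r"\b" + re.escape(m[:-1]) + r"\w*")
    return re.compile(r"\b" + re.escape(m) + r"\b")


def marker_hits(text, markers):
    """Return the set of markers that occur at least once in the text."""
    t = _normalize(text)
    return {mk for mk in markers if marker_pattern(mk).search(t)}


reply = "Before signing, check who will indemnify whom and which jurisdiction applies."
print(sorted(marker_hits(reply, MARKERS["lawyer"])))
```

Word boundaries keep a short marker such as `season` from firing inside `seasonal`, which is presumably why the sets list singular and plural or inflected forms separately.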

## Appendix C Artifact Map

The key result files used in the paper are:

- `results/v15_gemma2b_localized_positions.json`
- `results/v15_qwen15_localized_positions.json`
- `results/v15_qwen3b_localized_positions.json`
- `results/v16_gemma2b_diverse_grid.json`
- `results/v16_qwen15_diverse_grid.json`
- `results/v17_behavioral_markers.json`
- `results/v14b_v2.json`
- `results/v14e_multilayer.json`

## References

- [1] Anthropic Interpretability Team (2024). Scaling monosemanticity: extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread.
- [2] R. Hendel, M. Geva, and A. Globerson (2023). In-context learning creates task vectors. In Findings of EMNLP.
- [3] E. Hernandez, A. S. Sharma, T. Haklay, K. Meng, M. Wattenberg, J. Andreas, Y. Belinkov, and D. Bau (2024). Linearity of relation decoding in transformer language models. In International Conference on Learning Representations.
- [4] G. Ilharco, M. T. Ribeiro, M. Wortsman, L. Schmidt, H. Hajishirzi, and A. Farhadi (2023). Editing models with task arithmetic. In International Conference on Learning Representations.
- [5] B. Lester, R. Al-Rfou, and N. Constant (2021). The power of scale for parameter-efficient prompt tuning. In Proceedings of EMNLP.
- [6] X. L. Li and P. Liang (2021). Prefix-tuning: optimizing continuous prompts for generation. In Proceedings of ACL.
- [7] K. Meng, D. Bau, A. Andonian, and Y. Belinkov (2022). Locating and editing factual associations in GPT. In Advances in Neural Information Processing Systems.
- [8] E. Todd, M. L. Li, A. S. Sharma, A. Mueller, B. C. Wallace, and D. Bau (2024). Function vectors in large language models. In International Conference on Learning Representations.
- [9] A. M. Turner, L. Thiergart, D. Udell, G. Leech, U. Mini, and M. MacDiarmid (2023). Activation addition: steering language models without optimization. arXiv preprint arXiv:2308.10248.
- [10] A. Zou, L. Phan, S. Chen, J. Campbell, et al. (2023). Representation engineering: a top-down approach to AI transparency. arXiv preprint arXiv:2310.01405.