# MIDSTEER: Optimal Affine Framework for Steering Generative Models
Source: [https://arxiv.org/html/2605.05220](https://arxiv.org/html/2605.05220)
Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi
###### Abstract
Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.
## 1 Introduction
Generative models such as Large Language Models (LLMs) and vision diffusion models have achieved remarkable progress in recent years (Yang et al., [2024b](https://arxiv.org/html/2605.05220#bib.bib107); Naveed et al., [2023](https://arxiv.org/html/2605.05220#bib.bib106)). However, controlling model outputs to enforce desirable behaviors or suppress harmful ones remains challenging (Bartoszcze et al., [2025](https://arxiv.org/html/2605.05220#bib.bib18)). Yet this capability is necessary for improving model safety, reliability, alignment, and usefulness in downstream applications.
Concept steering of intermediate representations is an increasingly popular technique that has proven simple yet powerful for controlling behavior in LLMs (Panickssery et al., [2024](https://arxiv.org/html/2605.05220#bib.bib6)). Recently it was also shown to be applicable to vision diffusion models (Gaintseva et al., [2025](https://arxiv.org/html/2605.05220#bib.bib1)). The underlying idea is to change the intermediate representations of a generative model during generation by adding or subtracting a "steering vector" that encodes a target concept. This approach has proven effective for tasks such as erasing unwanted behaviors (toxicity, nudity) or amplifying desirable features (helpfulness, truthfulness) (Panickssery et al., [2024](https://arxiv.org/html/2605.05220#bib.bib6); Singh et al., [2024](https://arxiv.org/html/2605.05220#bib.bib22); Gaintseva et al., [2025](https://arxiv.org/html/2605.05220#bib.bib1)). However, despite the simplicity of the approach, its theoretical foundations remain underdeveloped, with most of the surrounding work being highly empirical (Zou et al., [2023](https://arxiv.org/html/2605.05220#bib.bib19); Wehner et al., [2025](https://arxiv.org/html/2605.05220#bib.bib110)). Existing methods largely rely on heuristic vector manipulations, which can introduce unintended side effects and lack a solid theoretical basis and guarantees (Raedler et al., [2025](https://arxiv.org/html/2605.05220#bib.bib112); Anthropic, [2024](https://arxiv.org/html/2605.05220#bib.bib4)). Moreover, naive steering often perturbs unrelated features, undermining the minimal-disturbance principle that is critical to maintaining model quality and coherence.
Recently, strong theoretical foundations have been developed for concept erasure. Ravfogel et al. ([2023](https://arxiv.org/html/2605.05220#bib.bib24)) introduced the notion of log-linear guardedness. Based on it, Belrose et al. ([2025](https://arxiv.org/html/2605.05220#bib.bib2)) developed LEACE, an affine concept erasure framework to remove undesired information from model representations for downstream tasks. However, these methods do not naturally extend to other forms of concept manipulation, such as switching, where the goal is to replace one concept with another rather than merely erasing it.
In this work, we address these gaps by developing a unified theoretical framework for affine steering of generative models. We begin by proving a formal equivalence between the standard steering methodology and LEACE, demonstrating that the widely used heuristic of steering for concept erasure is a special case of optimal affine concept erasure. Building on this foundation, we extend the framework from erasure to concept switching. We first consider the setting of bidirectional switching, in which a binary concept partitions the dataset and the goal is to invert the linear dependence of the representation on the concept label. Under these assumptions, we derive LEACE-Switch, an optimal affine transformation that performs a complete and symmetric concept swap while minimally disturbing the representation.
We examine the scope of this formulation and show that its assumptions (dataset partitioning and global label inversion) define a practically relevant, though restricted, regime. We address more general settings in which the concepts involved do not jointly span the entire dataset, or where asymmetric, one-directional transformations are desired. We introduce MidSteer (Minimal Disturbance Concept Steering), a generalized framework for affine concept manipulation that enables precise switching while minimizing interference with unrelated properties of the representation.
Through experiments with LLMs and vision diffusion models, we demonstrate that LEACE-Switch and MidSteer achieve more reliable concept switching than vanilla steering, allowing controllable generation with minimal side effects. Our results highlight the value of grounding steering methods in theory and provide practical tools for aligning generative models with desired behaviors.
In summary, our contributions are as follows:
- We establish a **formal theoretical connection** between standard activation steering and affine concept erasure, showing that commonly used steering heuristics for concept erasure are special cases of LEACE.
- We **extend the affine erasure framework to concept switching** and introduce **LEACE-Switch**, an optimal affine formulation for concept swapping under the assumption that a binary concept partitions the dataset.
- We then relax the dataset-partitioning and symmetry requirements for the task of concept switching, and introduce **MidSteer** (Minimal Disturbance Concept Steering), a **generalized affine framework for concept manipulation** that enables precise and directed concept switching with provably minimal interference to unrelated representation components.
- We **empirically validate** LEACE-Switch and MidSteer across modalities and architectures, including LLMs and vision diffusion models, demonstrating improved controllability and reduced side effects compared to existing steering and erasure methods.
## 2 Related Work
**Activation steering and representation manipulation.** Activation steering has emerged as a lightweight approach for controlling generative models by modifying intermediate representations, particularly in LLMs (Turner et al., [2023](https://arxiv.org/html/2605.05220#bib.bib10); Bartoszcze et al., [2025](https://arxiv.org/html/2605.05220#bib.bib18); Rimsky et al., [2024](https://arxiv.org/html/2605.05220#bib.bib11)) and more recently in diffusion models (Tumanyan et al., [2023](https://arxiv.org/html/2605.05220#bib.bib71); Kwon et al., [2023](https://arxiv.org/html/2605.05220#bib.bib68); Gaintseva et al., [2025](https://arxiv.org/html/2605.05220#bib.bib1)). Most existing methods rely on heuristic vector addition or subtraction, often derived from mean activation differences, and provide limited guarantees on optimality or side effects. Our work formalizes steering as an affine transformation problem and studies when such interventions are provably minimal and well-posed.
**Affine concept erasure and guardedness.** A related line of work focuses on removing concept information from representations. INLP (Ravfogel et al., [2020](https://arxiv.org/html/2605.05220#bib.bib20)) and RLACE (Ravfogel et al., [2023](https://arxiv.org/html/2605.05220#bib.bib24)) iteratively project out linear subspaces predictive of a protected attribute. LEACE (Belrose et al., [2025](https://arxiv.org/html/2605.05220#bib.bib2)) provides a closed-form affine solution for optimal linear concept erasure under the guardedness framework, minimizing representational disturbance while enforcing zero covariance with the concept label. Our work builds directly on this theory, showing that standard erasure-mode steering is a special case of LEACE, and extending the framework beyond erasure. Next, SPLINCE (Holstege et al., [2025](https://arxiv.org/html/2605.05220#bib.bib21)) studies oblique projections that erase protected attributes while preserving task-relevant subspaces. While these methods address constrained erasure, they do not consider concept switching, which requires translating concept dependence rather than erasing it. Nevertheless, they motivate the importance of minimal-disturbance objectives with structural constraints, which our work addresses in the switching setting.
**Distributional alignment and representation surgery.** Representation Surgery (Singh et al., [2024](https://arxiv.org/html/2605.05220#bib.bib22)) derives affine transformations that match class-conditional means, and optionally covariances, between source and target distributions. This approach performs distributional alignment under Gaussian assumptions and is well-suited to tasks where class-conditional statistics fully characterize the desired transformation. In contrast, MidSteer operates on cross-covariances between representations and concept indicators, preserving the global linear structure of the representation space and explicitly minimizing changes outside the concept-mediating subspace. As a result, MidSteer targets concept switching rather than full distribution matching, and the two approaches optimize distinct objectives.
In summary, while prior methods address erasure or distributional alignment, our work is the first to formalize concept switching as a distinct affine problem and to derive closed-form solutions with explicit minimal-disturbance guarantees across modalities.
## 3 Preliminaries
### 3.1 Steering internal representations of models
We formalize *activation steering* as the manipulation of internal model representations to control the presence of a specific concept $c$ in the model's output. This is achieved by adding a scaled steering vector $s_c$ to the intermediate hidden activity $h$ during inference. The *steering vector* $s_c$ is constructed from the concept-conditional means of the hidden activity. Let $h$ be a random vector in $\mathbb{R}^d$ representing the activity at a particular layer, and let $C \in \{0,1\}$ represent the presence or absence of concept $c$. The steering vector $s_c \in \mathbb{R}^d$ is defined as the difference of these means:

$$s_c = \mathbb{E}[h \mid C=1] - \mathbb{E}[h \mid C=0], \tag{1}$$

where $s_c$ can optionally be post-processed (e.g., normalized to unit norm).
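As a concrete illustration, Eq. 1 can be estimated from a batch of cached activations. The sketch below is our own (the function name, array shapes, and the unit-norm option are illustrative assumptions, not the paper's code):

```python
import numpy as np

def steering_vector(h, c, normalize=True):
    """Estimate s_c = E[h | C=1] - E[h | C=0] from samples (Eq. 1).

    h : (n, d) array of hidden activations at one layer
    c : (n,) binary array, 1 where the concept is present
    """
    s = h[c == 1].mean(axis=0) - h[c == 0].mean(axis=0)
    if normalize:
        # optional post-processing to unit norm, as mentioned after Eq. 1
        s = s / np.linalg.norm(s)
    return s
```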
The general steering intervention $f$ controls the concept's expressiveness via a scalar $\alpha \in \mathbb{R}$ representing the steering strength and direction. Omitting the subscript $c$, the operation is:

$$f(h, s) = h + \alpha s. \tag{2}$$

We highlight two essential special cases, determined by the choice of the intervention function $f$.
**Concept erasure.** It aims to erase all information aligned with the concept $c$ from the activation vector $h$. This is achieved by projecting $h$ onto the subspace orthogonal to the steering direction $s$ (assumed to have unit norm). The projection $\langle h, s \rangle s$ estimates the conceptual component, which is then removed:

$$f_{\text{delete}}(h, s) = h - \langle h, s \rangle s. \tag{3}$$

**Concept switch.** Multiplying the projection by 2 results in a *Householder reflection* of the vector $h$ across the hyperplane orthogonal to $s$:

$$f_{\text{switch}}(h, s) = h - 2\langle h, s \rangle s. \tag{4}$$

This transformation effectively substitutes the component of $h$ aligned with $c$ with its opposite, thereby replacing the representation of concept $c$ with the representation of its absence.
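A minimal sketch of the two interventions, assuming $s$ has already been normalized to unit length (the function names are ours):

```python
import numpy as np

def f_delete(h, s):
    """Eq. 3: remove the component of h along the unit direction s."""
    return h - (h @ s) * s

def f_switch(h, s):
    """Eq. 4: Householder reflection of h across the hyperplane
    orthogonal to s, flipping the concept-aligned component."""
    return h - 2.0 * (h @ s) * s
```

Applying `f_switch` twice recovers the original activation, as expected of a Householder reflection.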
If steering is applied to specific layers (e.g., self-attention outputs in LLMs or cross-attention layers in vision models), both Equation [3](https://arxiv.org/html/2605.05220#S3.E3) and Equation [4](https://arxiv.org/html/2605.05220#S3.E4) can be incorporated directly into the model's weight matrices. This allows for *zero inference overhead*, a critical advantage for deployment in large-scale applications.
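The zero-overhead claim can be checked directly for a toy linear layer: applying an affine steering map $A$ to the output of $h = Wx$ is identical to running the layer with the merged weight $AW$. Shapes and names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 6, 4
W_layer = rng.normal(size=(d, k))   # weight of the steered layer (toy shapes)
x = rng.normal(size=k)              # layer input
s = rng.normal(size=d); s /= np.linalg.norm(s)

A = np.eye(d) - np.outer(s, s)      # erasure projector from Eq. 3
W_merged = A @ W_layer              # fold the projector into the weights

steered = A @ (W_layer @ x)         # steer explicitly at inference time
merged = W_merged @ x               # or pay nothing: use the merged weights
```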
### 3.2 Affine guardedness framework
For the task of concept erasure from model representations, Ravfogel et al. ([2023](https://arxiv.org/html/2605.05220#bib.bib24)) introduced the notion of log-linear guardedness. Belrose et al. ([2025](https://arxiv.org/html/2605.05220#bib.bib2)) generalized it to:
###### Definition 3.1 (Guardedness).
Consider a $k$-class classification task over jointly defined random vectors $X$ (the input data) and $Z$ (the one-hot labels), with $X$ of finite first moment taking values in $\mathbb{R}^d$, and $Z$ taking values in $\mathcal{Z} = \{\mathbf{z} \in \{0,1\}^k \mid \|\mathbf{z}\|_1 = 1\}$ with $\mathbb{P}(Z = j) > 0$ for each $j$ (the integer $j \leq k$ refers to the element of $\mathcal{Z}$ that is $1$ at the $j$-th index and $0$ elsewhere). Let $\eta(\cdot; \bm{\theta}): \mathbb{R}^d \to \mathbb{R}^k$ be a predictor chosen from a function class $\mathcal{V} = \{\eta(\cdot; \bm{\theta}) \mid \bm{\theta} \in \Theta\}$ (presumed to contain all constant functions) so as to minimize the expectation $\mathbb{E}\big[\mathcal{L}(\eta(X), Z)\big]$ of some loss $\mathcal{L}: \mathbb{R}^k \times \mathcal{Z} \to [0, \infty)$ in a class $\mathfrak{L}$ of loss functions. Let $\chi$ be the set of all random vectors of finite first moment taking values in $\mathbb{R}^d$, jointly defined with $Z$.
We say $X$ $(\mathcal{V}, \mathfrak{L})$-guards $Z$ if, for all losses $\mathcal{L} \in \mathfrak{L}$, it maximizes the minimum expected loss:

$$X \in \operatorname*{arg\,max}_{X' \in \chi} \; \inf_{\theta \in \Theta} \; \mathbb{E}\big[\mathcal{L}(\eta(X'; \theta), Z)\big].$$

In other words, its conditional distribution $\mathbb{P}(X \mid Z = \cdot)$ is among the worst possible distributions for predicting $Z$ from $X$ using a predictor $\eta(\cdot; \theta) \in \mathcal{V}$ and a loss function in $\mathfrak{L}$.
Building on Definition [3.1](https://arxiv.org/html/2605.05220#S3.Thmtheorem1), note that guardedness characterizes inputs whose conditional distributions make the target $Z$ maximally unpredictable under a given model class and loss. In particular, for linear-affine predictors and squared loss, Belrose et al. ([2025](https://arxiv.org/html/2605.05220#bib.bib2)) show that this worst-case unpredictability is achieved precisely when the representation is uncorrelated with the concept, i.e., when $\mathrm{Cov}(h, Z) = 0$. Thus, enforcing guardedness in a linear representation amounts to removing all linear statistical dependence between $X$ (or an internal representation $h$ derived from it) and $Z$.
###### Theorem 3.2 (Linear Guardedness, Belrose et al., [2025](https://arxiv.org/html/2605.05220#bib.bib2)).
The following statements are equivalent:
- The data $X$ linearly guards the labels $Z$ (Def. [3.1](https://arxiv.org/html/2605.05220#S3.Thmtheorem1)).
- Every component of $X$ has zero covariance with every component of $Z$: $\mathrm{Cov}(X, Z) = 0$.
At the same time, when erasing concepts from model representations, we typically wish to alter the representations as little as possible, so that downstream performance on unrelated tasks is preserved. Thus, it is natural to seek the affine transformation of $X$ whose output is closest to $X$ under some distance metric, while achieving this covariance-removal property.
Guided by these two principles, namely that (i) guardedness corresponds to zero covariance and (ii) we prefer minimal deviation from the original representation, Belrose et al. ([2025](https://arxiv.org/html/2605.05220#bib.bib2)) prove the following result:
###### Theorem 3.3 (LEACE, Belrose et al., [2025](https://arxiv.org/html/2605.05220#bib.bib2)).
Let $X$, $Z$ be random vectors taking values in $\mathbb{R}^d$ and $\mathbb{R}^k$ respectively, each of finite second moment. Define $\Sigma_{XX} = \mathrm{Cov}(X, X) \in \mathbb{R}^{d \times d}$ and $\Sigma_{XZ} = \mathrm{Cov}(X, Z) \in \mathbb{R}^{d \times k}$. Assume $\mathrm{Im}(\Sigma_{XZ}) \subseteq \mathrm{Im}(\Sigma_{XX})$. The following optimization problem:
$$\min_{A \in \mathbb{R}^{d \times d},\, b \in \mathbb{R}^d} \mathbb{E}\big[\lVert AX + b - X \rVert_2^2\big] \quad \text{s.t.} \quad \mathrm{Cov}(AX + b, Z) = 0 \tag{5}$$

has the following solution (almost surely):

$$\widehat{A} = I - W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W, \tag{6}$$
$$\widehat{b} = \mathbb{E}[X] - \widehat{A}\,\mathbb{E}[X], \tag{7}$$

where $W = (\Sigma_{XX}^{1/2})^{+}$ is the whitening transformation.
Here and later we use $A^{+}$ to denote the pseudo-inverse of a matrix $A$, and $A^{1/2}$ to denote the square root of a positive semi-definite symmetric matrix $A$: for the eigendecomposition $A = VSV^{\top}$, with orthonormal $V$ and diagonal $S$ with non-negative entries, we define $A^{1/2} := VS^{1/2}V^{\top}$, where the square root is applied entrywise to the diagonal of $S$.
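Theorem 3.3 translates directly into a few lines of NumPy. The sketch below is our own sample-based implementation (population moments replaced by sample covariances), not the authors' released code:

```python
import numpy as np

def psd_sqrt(S):
    """Square root A^{1/2} of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def leace(X, Z):
    """LEACE eraser (Eqs. 6-7) estimated from samples X: (n, d), Z: (n, k)."""
    Xc = X - X.mean(0)
    Zc = Z - Z.mean(0)
    n, d = X.shape
    Sxx = Xc.T @ Xc / n
    Sxz = Xc.T @ Zc / n
    W = np.linalg.pinv(psd_sqrt(Sxx))   # whitening W = (Sxx^{1/2})^+
    M = W @ Sxz
    A = np.eye(d) - np.linalg.pinv(W) @ M @ np.linalg.pinv(M) @ W   # Eq. 6
    b = X.mean(0) - A @ X.mean(0)                                   # Eq. 7
    return A, b
```

On the fitting samples the constraint $\mathrm{Cov}(\widehat{A}X + \widehat{b}, Z) = 0$ holds up to floating-point error; on held-out data it holds only approximately.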
## 4 Theoretical Results
We theoretically analyze the connections between steering approaches (Sec. [3.1](https://arxiv.org/html/2605.05220#S3.SS1)) and LEACE (Sec. [3.2](https://arxiv.org/html/2605.05220#S3.SS2)), and derive a novel framework for affine concept steering, MidSteer, that unifies and generalizes these approaches.
This section is organized as follows. We first consider the task of concept erasure and establish a formal connection between the standard steering setup for erasure and LEACE, showing that Eq. [3](https://arxiv.org/html/2605.05220#S3.E3) arises as a special case of the LEACE framework. We then extend LEACE to the task of optimal affine concept switching and derive *LEACE-Switch*, demonstrating that Eq. [4](https://arxiv.org/html/2605.05220#S3.E4) is a special case of this bidirectional switching formulation.
Next, we characterize the assumptions under which LEACE-Switch provides an optimal solution and introduce *MidSteer*, a more general affine framework for concept manipulation that enables directed, minimal-disturbance transformations without requiring dataset-wide label inversion. Finally, through experiments on large language models and vision diffusion models, we show that MidSteer consistently outperforms vanilla steering and LEACE-based switching, achieving precise concept control while preserving unrelated features of the generated outputs.
### 4.1 Connection between steering in erasure mode and LEACE
Figure 1: Illustrative examples of the affine concept erasure and affine concept switching frameworks. (a) Affine concept erasure: the transformation satisfies $\mathrm{Cov}(AX + b, Z) = 0$ (panel inspired by Belrose et al., [2025](https://arxiv.org/html/2605.05220#bib.bib2)). (b) Affine concept switching: the transformation satisfies $\mathrm{Cov}(AX + b, Z) = -\mathrm{Cov}(X, Z)$.

In this section, we show that steering in erasure mode (Eq. [3](https://arxiv.org/html/2605.05220#S3.E3)) is a special case of LEACE. We formulate and prove the following corollary to Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3):
###### Corollary 4.1.
Let $X$ be a standardized random vector in $\mathbb{R}^d$, i.e., it has zero mean $\mathbb{E}[X] = 0$ and unit covariance matrix $\Sigma_{XX} = I$. Let $C \in \{0,1\}$ be a concept indicator variable. Let $s$ be defined as in Eq. [1](https://arxiv.org/html/2605.05220#S3.E1). Let $f_{\text{delete}}$ be defined as in Eq. [3](https://arxiv.org/html/2605.05220#S3.E3). Then $f_{\text{delete}}$ as a function of $h$ minimizes
$$\min_{f \in \mathrm{Aff}(\mathbb{R}^d \to \mathbb{R}^d)} \mathbb{E}\big[\lVert f(X) - X \rVert^2\big] \quad \text{s.t.} \quad \mathrm{Cov}(f(X), C) = 0. \tag{8}$$
This corollary states that steering in erasure mode can be seen as LEACE under the assumptions that the whitening matrix is the identity and that the mean of all vectors is zero. The proof is found in Appendix [D.1](https://arxiv.org/html/2605.05220#A4.SS1).
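The collapse can be checked at the matrix level: with $\Sigma_{XX} = I$ the whitening matrix is the identity, and for a binary concept $\mathrm{Cov}(X, C) = p(1-p)(\mu_1 - \mu_0)$ is proportional to $s$, so Eq. 6 reduces to the projector of Eq. 3. The constant `0.21` below is an arbitrary stand-in for the scalar factor:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
s = rng.normal(size=d); s /= np.linalg.norm(s)

# Cov(X, C) is proportional to the steering direction s for a binary concept;
# 0.21 is an arbitrary stand-in for p(1-p)||mu_1 - mu_0||.
Sxz = 0.21 * s.reshape(d, 1)
W = np.eye(d)   # Sigma_XX = I, so the whitening matrix is the identity

M = W @ Sxz
A_leace = np.eye(d) - np.linalg.pinv(W) @ M @ np.linalg.pinv(M) @ W  # Eq. 6
A_delete = np.eye(d) - np.outer(s, s)                                # Eq. 3
```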
### 4.2 Concept switching
We generalize beyond a single binary concept and consider the task of transforming the representation of one concept $c_1$ into another, $c_2$. We call this task *concept switching*.
More formally, let $c_1$ and $c_2$ denote two distinct concepts, each associated with a subset of the data distribution (not necessarily jointly covering it), with corresponding binary indicators $C_1, C_2 \in \{0,1\}$. The goal of a *concept switch* operation is to construct a transformation $f$ such that samples exhibiting concept $c_1$ after transformation exhibit concept $c_2$, while preserving all other factors of variation as much as possible. Formally, this requires modifying the dependence of the representation on $C_1$ and introducing a desired dependence on $C_2$, without enforcing any relationship between $C_1$ and $C_2$ outside their observed support.
We now adapt the LEACE framework (Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3)) to the task of concept switching and formulate a theoretical framework for optimal affine concept switching, LEACE-Switch. Additionally, we show that steering in switching mode (Eq. [4](https://arxiv.org/html/2605.05220#S3.E4)) can be seen as a special case of this framework. We then characterize the scope of LEACE-Switch by making explicit the assumptions under which it provides an optimal solution. Finally, we introduce MidSteer, a generalized affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance concept switching in more general settings.
#### 4.2.1 LEACE for concept switching
Recall that $Z \in \{0,1\}^k$ denotes a binary concept vector. We now consider the task of *bidirectional concept switching*, in which the concept encoded by $Z$ is assumed to partition the dataset (i.e., $P(Z = 1) + P(Z = 0) = 1$), so that each sample belongs either to the concept or to its complement. Under this assumption, concept switching can be interpreted as a global inversion of the linear dependence between the representation and the concept label.
Within the covariance-based guardedness framework of LEACE, such an inversion admits a natural affine formulation. Rather than removing linear information about the concept, as in concept erasure, we seek to preserve the magnitude of the linear dependence while reversing its sign. This leads to the following constraint:
$$\mathrm{Cov}(f(X), \mathbf{1}^k - Z) = \mathrm{Cov}(X, Z), \tag{9}$$

where $\mathbf{1}^k$ denotes a $k$-dimensional vector of ones. By linearity of covariance, this condition is equivalent to

$$\mathrm{Cov}(f(X), Z) = -\mathrm{Cov}(X, Z). \tag{10}$$

See Fig. [1(b)](https://arxiv.org/html/2605.05220#S4.F1.sf2) for a geometric illustration.
Equation [9](https://arxiv.org/html/2605.05220#S4.E9) constitutes the affine analogue of a *perfect concept flip* within the guardedness framework. In LEACE (Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3)), concept erasure is achieved by enforcing $\mathrm{Cov}(f(X), Z) = 0$, ensuring that the transformed representation carries no linear signal about the concept. In contrast, the constraint above enforces a complete inversion of this signal: samples that previously correlated positively with the concept now correlate equally strongly with its complement, and vice versa. This corresponds to a symmetric, dataset-wide concept swap and mirrors the role of reflection-based switching in vanilla steering.
Analogous to Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3), we now characterize the optimal affine transformation that satisfies the constraint in Eq. [9](https://arxiv.org/html/2605.05220#S4.E9) while minimally disturbing the original representation:
###### Theorem 4.2 (LEACE-Switch, Optimal Concept Switching).
Let $X, Z, \Sigma_{XX}, \Sigma_{XZ}$ be defined as in Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3). Assume $\mathrm{Im}(\Sigma_{XZ}) \subseteq \mathrm{Im}(\Sigma_{XX})$. Then the optimization problem
$$\min_{A \in \mathbb{R}^{d \times d},\, b \in \mathbb{R}^d} \mathbb{E}\big[\lVert AX + b - X \rVert_2^2\big] \tag{11}$$
$$\text{s.t.} \quad \mathrm{Cov}(AX + b, Z) = -\mathrm{Cov}(X, Z) \tag{12}$$

has the following solution (almost surely):

$$\widehat{A} = I - 2W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W, \tag{13}$$
$$\widehat{b} = \mathbb{E}[X] - \widehat{A}\,\mathbb{E}[X]. \tag{14}$$
The proof is found in Appendix [D.2](https://arxiv.org/html/2605.05220#A4.SS2). Note also that, since the transform is affine, it can be incorporated into the weight matrix of the model, thus achieving zero inference overhead. Similar to Corollary [4.1](https://arxiv.org/html/2605.05220#S4.Thmtheorem1), we now show that concept switching as defined in Eq. [4](https://arxiv.org/html/2605.05220#S3.E4) is a special case of Theorem [4.2](https://arxiv.org/html/2605.05220#S4.Thmtheorem2).
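Relative to LEACE, the only change in Eq. 13 is the factor of 2, which turns the projection into a reflection. The sample-based sketch below is our own, not the authors' code:

```python
import numpy as np

def psd_sqrt(S):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def leace_switch(X, Z):
    """LEACE-Switch (Eqs. 13-14) estimated from samples X: (n, d), Z: (n, k)."""
    Xc, Zc = X - X.mean(0), Z - Z.mean(0)
    n, d = X.shape
    Sxx = Xc.T @ Xc / n
    Sxz = Xc.T @ Zc / n
    W = np.linalg.pinv(psd_sqrt(Sxx))
    M = W @ Sxz
    # the factor 2 turns the LEACE projection into a reflection
    A = np.eye(d) - 2.0 * np.linalg.pinv(W) @ M @ np.linalg.pinv(M) @ W
    b = X.mean(0) - A @ X.mean(0)
    return A, b
```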
###### Corollary 4.3.
Let $X$ be a standardized random vector in $\mathbb{R}^d$, i.e., it has zero mean $\mathbb{E}[X] = 0$ and unit covariance matrix $\Sigma_{XX} = I$. Let $C \in \{0,1\}$ be a concept indicator variable. Let $s$ be defined as in Eq. [1](https://arxiv.org/html/2605.05220#S3.E1). Let $f_{\text{switch}}$ be defined as in Eq. [4](https://arxiv.org/html/2605.05220#S3.E4). Then $f_{\text{switch}}$ as a function of $h$ minimizes

$$\min_{f \in \mathrm{Aff}(\mathbb{R}^d \to \mathbb{R}^d)} \mathbb{E}\big[\lVert f(X) - X \rVert^2\big] \tag{15}$$
$$\text{s.t.} \quad \mathrm{Cov}(f(X), C) = -\mathrm{Cov}(X, C). \tag{16}$$
The proof is found in Appendix Sec. [D.3](https://arxiv.org/html/2605.05220#A4.SS3). This corollary shows that steering in switching mode can be seen as LEACE-Switch under the assumptions that the whitening matrix is the identity and the mean of all vectors is zero.
**Scope of LEACE-Switch.** While the constraint in Eq. [9](https://arxiv.org/html/2605.05220#S4.E9) precisely captures the algebraic notion of inverting a linear concept signal, it does so under a specific set of assumptions that define the regime in which LEACE-Switch is theoretically well-posed.
First, Eq. [9](https://arxiv.org/html/2605.05220#S4.E9) assumes that the binary concept variable $Z$ partitions the dataset, i.e., $P(Z = 1) + P(Z = 0) = 1$. Under this condition, the linear dependence between $X$ and $Z$ is defined globally over the data distribution and admits a dataset-wide inversion. When instead considering two concepts $c_1$ and $c_2$ with corresponding indicators $Z_1$ and $Z_2$ that do not jointly span the dataset, this assumption is violated. In such cases, the notion of a dataset-wide bidirectional concept flip is no longer well-posed, and Theorem [4.2](https://arxiv.org/html/2605.05220#S4.Thmtheorem2) does not apply. Second, even when $Z$ does partition the dataset, the constraint in Eq. [9](https://arxiv.org/html/2605.05220#S4.E9) enforces a *complete* inversion of the linear dependence on the concept label. As a result, LEACE-Switch implements a symmetric transformation: representations associated with $c_1$ are mapped to those of $c_2$, and representations associated with $c_2$ are simultaneously mapped to those of $c_1$. While this behavior is desirable in settings that explicitly require bidirectional swapping, it may be overly restrictive in applications where only a one-directional transformation is intended.
Together, these considerations delineate the scope of LEACE-Switch as an optimal solution for symmetric concept inversion under dataset partitioning. In the next section, we relax these assumptions and introduce a more general formulation for affine concept manipulation that enables directed, minimal-disturbance concept switching.
### 4.3 Optimal Affine Concept Manipulation
We now move beyond bidirectional concept switching and formulate a more general framework for affine concept manipulation. Our goal is to enable *directed* concept transformations that do not require dataset-wide label inversion or the assumption that the involved concepts jointly span the entire distribution.
Let $Z_1$ and $Z_2$ be indicators of two groups of concepts $C_s = \{c_1^{(s)}, \dots, c_l^{(s)}\}$ and $C_t = \{c_1^{(t)}, \dots, c_l^{(t)}\}$. Let $Z = (Z_1, Z_2)$, where $Z_1, Z_2 \in \{0,1\}^l$. Our goal is to have $\mathrm{Cov}(f(X), Z_1) = \mathrm{Cov}(X, Z_2)$, meaning that for each $i$ the concept $c_i^{(s)}$ maps to $c_i^{(t)}$. We now formulate the following theorem:
###### Theorem 4.4 (MidSteer, affine optimal concept manipulation).
Let $X,Z$ be defined as in Theorem [4.2](https://arxiv.org/html/2605.05220#S4.Thmtheorem2) and assume $k=2l$ for $l>0$. Let $Z=(Z_{1},Z_{2})$, where $Z_{1},Z_{2}\in\{0,1\}^{l}$. Let $\Sigma_{XZ_{i}}=\mathrm{Cov}(X,Z_{i})$, $i\in\{1,2\}$, be the cross-covariance matrices between $X$ and $Z_{i}$. Assume $\mathrm{Im}(\Sigma_{XZ_{i}})\subseteq\mathrm{Im}(\Sigma_{XX})$ for $i\in\{1,2\}$, and assume $\Sigma_{XZ_{1}}$ has full column rank: $\mathrm{rk}(\Sigma_{XZ_{1}})=l$. Let $W=(\Sigma_{XX}^{1/2})^{+}$ be the whitening transformation, and let $\Sigma_{WX,Z_{i}}=W\Sigma_{XZ_{i}}$.
Then the optimization problem

$$\min_{A\in\mathbb{R}^{d\times d},\; b\in\mathbb{R}^{d}}\ \mathrm{E}\left[\left\|AX+b-X\right\|_{2}^{2}\right] \quad (17)$$

$$\text{s.t.}\quad \mathrm{Cov}(AX+b,Z_{1})=\mathrm{Cov}(X,Z_{2}) \quad (18)$$

has the following solution (almost surely):

$$\widehat{A}=I+W^{+}\left(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}}\right)\Sigma_{WX,Z_{1}}^{+}W \quad (19)$$

$$\widehat{b}=\mathbb{E}[X]-\widehat{A}\,\mathbb{E}[X] \quad (20)$$
The proof can be found in Sec. [D.4](https://arxiv.org/html/2605.05220#A4.SS4). Unlike LEACE, we now have a cross-covariance matrix for each group of concepts, and the labels are not required to be flipped, but rather translated from one group to the other.
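To make the closed-form solution concrete, the following NumPy sketch estimates $\widehat{A}$ and $\widehat{b}$ of Eqs. (19)-(20) from samples. This is our own illustrative reconstruction, not the authors' released implementation; the function name and synthetic-data setup are our assumptions.

```python
import numpy as np

def midsteer_affine(X, Z1, Z2):
    """Estimate the MidSteer transform (Eqs. 19-20) from samples.

    X: (N, d) representations; Z1, Z2: (N, l) binary concept indicators.
    Returns (A, b) so that the steered representation is A @ x + b.
    """
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    sigma_xx = Xc.T @ Xc / N                           # Sigma_XX
    sigma_xz1 = Xc.T @ (Z1 - Z1.mean(axis=0)) / N      # Sigma_XZ1
    sigma_xz2 = Xc.T @ (Z2 - Z2.mean(axis=0)) / N      # Sigma_XZ2

    # W = (Sigma_XX^{1/2})^+ via the symmetric eigendecomposition
    evals, V = np.linalg.eigh(sigma_xx)
    inv_sqrt = np.where(evals > 1e-10, 1.0 / np.sqrt(np.abs(evals) + 1e-30), 0.0)
    W = (V * inv_sqrt) @ V.T
    W_pinv = np.linalg.pinv(W)

    s1, s2 = W @ sigma_xz1, W @ sigma_xz2              # Sigma_{WX,Z_i}
    A = np.eye(d) + W_pinv @ (s2 - s1) @ np.linalg.pinv(s1) @ W   # Eq. 19
    b = mu - A @ mu                                    # Eq. 20
    return A, b
```

Since $\mathrm{Cov}(AX+b,Z_{1})=A\,\Sigma_{XZ_{1}}$, the constraint of Eq. (18) can be checked directly: with a full-rank empirical $\Sigma_{XX}$ and full-column-rank $\Sigma_{XZ_{1}}$, the returned $A$ satisfies $A\,\Sigma_{XZ_{1}}=\Sigma_{XZ_{2}}$ up to floating-point error.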
Connection between MidSteer and LEACE. Note that if $Z_{2}$ is a constant indicator (e.g., representing no item in the dataset), then $\mathrm{Cov}(X,Z_{2})=0$, and the MidSteer formula reduces to concept erasure (LEACE). We therefore refer to MidSteer as *affine optimal concept manipulation*, as it generalizes several concept manipulation tasks (erasure, switching) under one framework.
### 4.4 Steering strength
Let us now introduce the steering strength $\beta$ for erasure and switching. Note that vanilla steering (i.e., steering defined by Eq. [3](https://arxiv.org/html/2605.05220#S3.E3) and Eq. [4](https://arxiv.org/html/2605.05220#S3.E4)) can be unified in the following way:
$$f(h,s)=h-\beta\cdot\langle h,s\rangle s \quad (21)$$

For LEACE, $\widehat{b}$ is unaffected, and Eq. [6](https://arxiv.org/html/2605.05220#S3.E6) and Eq. [13](https://arxiv.org/html/2605.05220#S4.E13) become:
$$\widehat{A}=I-\beta\cdot W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W \quad (22)$$

Here $\beta=1$ represents concept erasure, and $0<\beta<1$ a lesser degree of erasure; $\beta=2$ represents concept switching, and $\beta>2$ switching in which the switched concept is more expressed than the base concept was.
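As a quick sanity check of Eq. (21), the sketch below (our own illustrative code, not from the paper) shows that $\beta=1$ removes the component along a unit direction $s$, while $\beta=2$ reflects it:

```python
import numpy as np

def vanilla_steer(h, s, beta):
    """Eq. (21): f(h, s) = h - beta * <h, s> * s, with s normalized."""
    s = s / np.linalg.norm(s)
    return h - beta * (h @ s) * s

h = np.array([3.0, 4.0])
s = np.array([1.0, 0.0])            # concept direction
erased = vanilla_steer(h, s, 1.0)   # projects out s: [0., 4.]
flipped = vanilla_steer(h, s, 2.0)  # Householder reflection: [-3., 4.]
```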
For MidSteer, we update Eq. [19](https://arxiv.org/html/2605.05220#S4.E19) as:
$$\widehat{A}=I+\beta\cdot W^{+}\left(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}}\right)\Sigma_{WX,Z_{1}}^{+}W \quad (23)$$

Here $\beta=1$ represents the normal steering mode; $\beta<1$ represents steering in which $Z_{2}$ is expressed less than $Z_{1}$ was in the original representation, and, vice versa, $\beta>1$ represents more expression of $Z_{2}$ relative to $Z_{1}$.
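Since Eq. (23) scales only the correction term, the $\beta$-family is an affine interpolation between the identity and the $\beta=1$ solution: $\widehat{A}(\beta)=I+\beta\,(\widehat{A}(1)-I)$. A tiny numerical check of this rewriting (the random matrix below is our own stand-in for a $\beta=1$ solution):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A1 = np.eye(d) + 0.1 * rng.normal(size=(d, d))   # stand-in for the beta = 1 solution

def scaled(A1, beta):
    """Eq. (23) rewritten: scale only the correction term (A1 - I) by beta."""
    I = np.eye(len(A1))
    return I + beta * (A1 - I)
```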
## 5 Experiments
### 5.1 Experimental setup
We conduct experiments comparing MidSteer to vanilla steering and LEACE-Switch on the task of concept switching. We focus on concept switching in the main experiments, as concept erasure is recovered as a special case when the target concept is constant. However, we provide a comparison of all approaches on concept erasure in Appendix Sec. [J](https://arxiv.org/html/2605.05220#A10). We consider two modalities: (1) text, for which we use Large Language Models (LLMs); (2) images, for which we use diffusion models.
Figure 2: Pareto efficiency frontiers for concept *switching* experiments with steering, LEACE, and MidSteer, highlighting different $\beta$s. Panels: (a) $\Delta CS$ vs CS of unrelated concepts for the Llama-2-7b model; (b) $\Delta CS$ vs 1 - BERT Precision on MMLU for the Llama-2-7b model; (c) $\Delta CS$ vs CS of unrelated concepts for the SDXL model; (d) $\Delta CS$ vs FID of unrelated concepts for the SDXL model.

Evaluation setup. In each case, given a pair of concepts $(c_{s},c_{t})$ to switch, we estimate the class-conditional covariances $\Sigma_{XZ_{1}},\Sigma_{XZ_{2}}$ on a sample of size $N=1000$, and the self-covariance $\Sigma_{XX}$ on a sample of size $M=50000$. We ablate the number of prompts needed for estimating $\Sigma_{XX}$ in Appendix Sec. [H](https://arxiv.org/html/2605.05220#A8), and provide pseudocode for the covariance estimation algorithm in Sec. [A](https://arxiv.org/html/2605.05220#A1).
As no standard benchmarks currently exist for concept switching, we introduce a dedicated evaluation setup. Our procedure is largely based on established concept erasure benchmarks used for vision generative models (Lyu et al., [2024](https://arxiv.org/html/2605.05220#bib.bib109); Wu et al., [2025](https://arxiv.org/html/2605.05220#bib.bib111)). To test switching from the source concept $c_{s}$ to the target concept $c_{t}$, we use 80 template prompts asking the model to generate output related to $c_{s}$ or $c_{t}$. For each prompt we run 10 generations, varying the random seed, with and without steering. Templates for LLMs and diffusion models can be found in Sec. [G](https://arxiv.org/html/2605.05220#A7). We use the *Concept Score (CS)* to estimate the amount of the concepts $c_{s}$ and $c_{t}$ present in the model's output. In the case of ideal switching, CS for concept $c_{t}$ should be high and CS for concept $c_{s}$ should be low, for prompts related to either $c_{s}$ or $c_{t}$.
To evaluate content preservation beyond the switched concepts, we additionally generate outputs for four unrelated concepts $\{c_{i}\}_{i=1}^{4}$. These concepts are chosen to span varying levels of semantic proximity to $c_{s}$ and $c_{t}$. We compute CS over $\{c_{i}\}_{i=1}^{4}$ to quantify the extent to which these concepts are retained in the generated outputs; ideally, their CS should not decrease under steering. This design enables a more fine-grained comparison of concept switching methods with respect to semantic interference.
We use the following pairs $p_{i}=(c_{s}\to c_{t})$ of concepts: $p_{1}$ = ("Horse" → "Motorcycle"), $p_{2}$ = ("Dog" → "Cat"), $p_{3}$ = ("Chihuahua" → "Muffin"). Note that none of these pairs span the whole dataset. The corresponding unrelated concepts $t_{i}=\{c_{i}\}_{i=1}^{4}$ are: $t_{1}$ = ("Cow", "Dog", "Pig", "Legislator"), $t_{2}$ = ("Cow", "Wolf", "Pig", "Legislator"), $t_{3}$ = ("Cat", "Dog", "Wolf", "Legislator").
Details on LLM experiments. We test on instruction-tuned Llama 2 (Touvron et al., [2023](https://arxiv.org/html/2605.05220#bib.bib5)) and Qwen 2.5 (Yang et al., [2024a](https://arxiv.org/html/2605.05220#bib.bib25)) models. We apply steering at every self-attention (SA) layer and use the SA activations corresponding to the last token in the prompt. The dataset used to estimate the class-conditional covariances was obtained by prompting GPT o4-mini to generate various questions about each concept. Details of the prompt used, along with examples, can be found in Sec. [E](https://arxiv.org/html/2605.05220#A5). To estimate $\Sigma_{XX}$, we use a sample of size $M$ from the Alpaca dataset (Taori et al., [2023](https://arxiv.org/html/2605.05220#bib.bib7)).
We calculate the *Concept Score (CS)* for a concept $c$ using an LLM as a judge. More specifically, we use the Llama-3.1-8B-Instruct (Dubey et al., [2024](https://arxiv.org/html/2605.05220#bib.bib103)) model and prompt it to estimate the amount of $c$ present in the generated output on a scale from 0 to 10. The prompts used are outlined in Appendix Sec. [F](https://arxiv.org/html/2605.05220#A6). We additionally test how much outputs for the testing concepts differ with and without steering by calculating BERT scores (Zhang et al., [2020](https://arxiv.org/html/2605.05220#bib.bib104)) on generations for the MMLU (Hendrycks et al., [2021](https://arxiv.org/html/2605.05220#bib.bib105)) dataset. Lower values of BERT Precision represent more change in the underlying output.
Details on diffusion model experiments. For visual diffusion models, we test on the SDXL (Podell et al., [2024](https://arxiv.org/html/2605.05220#bib.bib60)) and SANA (Xie et al., [2025](https://arxiv.org/html/2605.05220#bib.bib26)) models. Following recent work (Gaintseva et al., [2025](https://arxiv.org/html/2605.05220#bib.bib1)), we apply steering to the activations of every cross-attention (CA) layer. CA activations corresponding to all image patches are used.
We use a sample of image captions from RELAION (Schuhmann et al., [2022](https://arxiv.org/html/2605.05220#bib.bib33)) to estimate both the class-conditional covariances $\Sigma_{XZ_{i}}$ and the self-covariance $\Sigma_{XX}$. For $\Sigma_{XZ_{i}}$, we filter the dataset to captions containing a specific word mentioning the required concept.
We use the CLIP score (Hessel et al., [2021](https://arxiv.org/html/2605.05220#bib.bib72)) as the *Concept Score (CS)*. Additionally, we measure the change between generations for the testing concepts when steering is applied by calculating the FID (Heusel et al., [2017](https://arxiv.org/html/2605.05220#bib.bib73)) between images generated by the steered and vanilla models. Higher values of FID represent more change to the underlying image.
Figure 3: Qualitative results on steering "horses" into "motorcycles". While all methods successfully perform the switch from "horse" to "motorcycle", vanilla steering (CASteer) and LEACE fail when presented with a prompt for the target concept ("motorcycle"), as they are unable to distinguish between forward and reverse steering. CASteer additionally failed on the "cow" concept and altered images of the concept "dog" more significantly.
### 5.2 Experimental results
We compare results on concept switching by applying vanilla steering, LEACE-Switch, and MidSteer with different values of $\beta$. We present results for the Llama-2-7b and SDXL models, aggregated over all three concept pairs $p_{1},p_{2},p_{3}$, in Fig. [2](https://arxiv.org/html/2605.05220#S5.F2). In each case, MidSteer achieves a much better balance between the level of concept switching from $c_{1}$ to $c_{2}$ and the preservation of other concepts across different values of $\beta$. Pareto plots for other models, as well as for individual concept and metric pairs, can be found in Appendix Sec. [I.0.1](https://arxiv.org/html/2605.05220#A9.SS0.SSS1), [I.1](https://arxiv.org/html/2605.05220#A9.SS1).
To better illustrate the differences between vanilla steering, LEACE, and MidSteer for concept flipping, in Tab. [1](https://arxiv.org/html/2605.05220#S5.T1) we present results on switching the concept $c_{1}$ = "horse" to $c_{2}$ = "motorcycle" on the SDXL model. We compare vanilla switching and LEACE with $\beta=2$ against MidSteer with $\beta=1$, as these are the default parameters suggested for these methods by Eqs. [25](https://arxiv.org/html/2605.05220#A2.E25), [13](https://arxiv.org/html/2605.05220#S4.E13), and [19](https://arxiv.org/html/2605.05220#S4.E19). First, note that all methods successfully flip "horse" to "motorcycle", obtaining similar CS scores on the source ("horse") and target ("motorcycle") concepts. Second, as suggested by the definitions in Eqs. [25](https://arxiv.org/html/2605.05220#A2.E25) and [13](https://arxiv.org/html/2605.05220#S4.E13), vanilla steering and LEACE fail to keep the "motorcycle" concept intact when flipping "horse" to "motorcycle": the target CS score goes down. In contrast, MidSteer keeps "motorcycle" intact, as also illustrated in Fig. [3](https://arxiv.org/html/2605.05220#S5.F3). Next, the CS score of "cow" and the FID scores of "cow", "pig", and "dog" are worse for vanilla steering than for the other methods, showing the superiority of LEACE and MidSteer in keeping unrelated concepts intact. We observe the same trend over all concept pairs and models, on both LLMs and diffusion models. More details are given in Appendix Sec. [I.2](https://arxiv.org/html/2605.05220#A9.SS2), [I.3](https://arxiv.org/html/2605.05220#A9.SS3).
Table 1: Results on SDXL when flipping from "horse" to "motorcycle". Reported are CLIP scores (CS) and FID for target and non-target concepts.
## 6 Conclusion
In this work, we bridge the gap between previous empirical research on steering generative models and the theory of affine concept steering. We extend this theoretical framework to concept switching, define the corresponding optimization problem, and solve it in closed form. We then present MidSteer, a general steering method that is theoretically optimal under certain conditions. It outperforms other methods on concept switching for both LLMs and image diffusion models, while having the advantage of a clear matrix-form representation. To our knowledge, this is the first theoretical treatment of steering beyond erasure, connecting empirical heuristics and principled affine methods while reaching state-of-the-art results.
## Impact Statement
This paper presents work whose goal is to advance the field of machine learning\. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here\.
## References
- Anthropic (2024). Evaluating feature steering: a case study in mitigating social biases. [https://www.anthropic.com/research/evaluating-feature-steering](https://www.anthropic.com/research/evaluating-feature-steering). Accessed: 2026-04-14.
- L. Bartoszcze, S. Munshi, B. Sukidi, J. Yen, Z. Yang, D. Williams-King, L. Le, K. Asuzu, and C. Maple (2025). Representation engineering for large-language models: survey and research challenges. CoRR abs/2502.17601.
- N. Belrose, D. Schneider-Joseph, S. Ravfogel, R. Cotterell, E. Raff, and S. Biderman (2025). LEACE: perfect linear concept erasure in closed form. arXiv:2306.03819.
- A. Dubey, A. Jauhri, A. Pandey, et al. (2024). The Llama 3 herd of models. CoRR abs/2407.21783.
- T. Gaintseva, C. Ma, Z. Liu, M. Benning, G. Slabaugh, J. Deng, and I. Elezi (2025). CASteer: steering diffusion models for controllable generation. arXiv:2503.09630.
- D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt (2021). Measuring massive multitask language understanding. In ICLR 2021.
- J. Hessel, A. Holtzman, M. Forbes, R. L. Bras, and Y. Choi (2021). CLIPScore: a reference-free evaluation metric for image captioning. In EMNLP.
- M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS.
- F. Holstege, S. Ravfogel, and B. Wouters (2025). Preserving task-relevant information under linear concept removal. CoRR abs/2506.10703.
- M. Kwon, J. Jeong, and Y. Uh (2023). Diffusion models already have a semantic latent space. In ICLR.
- M. Lyu, Y. Yang, H. Hong, H. Chen, X. Jin, Y. He, H. Xue, J. Han, and G. Ding (2024). One-dimensional adapter to rule them all: concepts, diffusion models and erasing applications. In CVPR 2024, pp. 7559–7568.
- H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Barnes, and A. Mian (2023). A comprehensive overview of large language models. CoRR abs/2307.06435.
- N. Panickssery, N. Gabrieli, J. Schulz, M. Tong, E. Hubinger, and A. M. Turner (2024). Steering Llama 2 via contrastive activation addition. arXiv:2312.06681.
- D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. Müller, J. Penna, and R. Rombach (2024). SDXL: improving latent diffusion models for high-resolution image synthesis. In ICLR.
- J. B. Raedler, W. Li, A. M. Taliotis, M. Goyal, S. Swaroop, and W. Pan (2025). The necessity for intervention fidelity: unintended side effects when steering LLMs. In ICML 2025 Workshop on Reliable and Responsible Foundation Models.
- S. Ravfogel, Y. Elazar, H. Gonen, M. Twiton, and Y. Goldberg (2020). Null it out: guarding protected attributes by iterative nullspace projection. In ACL 2020, pp. 7237–7256.
- S. Ravfogel, Y. Goldberg, and R. Cotterell (2023). Log-linear guardedness and its implications. In ACL 2023, pp. 9413–9431.
- N. Rimsky, N. Gabrieli, J. Schulz, M. Tong, E. Hubinger, and A. M. Turner (2024). Steering Llama 2 via contrastive activation addition. In ACL 2024.
- C. Schuhmann, R. Beaumont, R. Vencu, et al. (2022). LAION-5B: an open large-scale dataset for training next generation image-text models. In NeurIPS.
- S. Singh, S. Ravfogel, J. Herzig, R. Aharoni, R. Cotterell, and P. Kumaraguru (2024). Representation surgery: theory and practice of affine steering. In ICML 2024.
- R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto (2023). Stanford Alpaca: an instruction-following LLaMA model. GitHub: [https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca).
- H. Touvron, L. Martin, K. Stone, et al. (2023). Llama 2: open foundation and fine-tuned chat models. arXiv:2307.09288.
- N. Tumanyan, M. Geyer, S. Bagon, and T. Dekel (2023). Plug-and-play diffusion features for text-driven image-to-image translation. In CVPR.
- A. M. Turner, L. Thiergart, D. Udell, G. Leech, U. Mini, and M. MacDiarmid (2023). Activation addition: steering language models without optimization. CoRR abs/2308.10248.
- J. Wehner, S. Abdelnabi, D. Tan, D. Krueger, and M. Fritz (2025). Taxonomy, opportunities, and challenges of representation engineering for large language models. TMLR 2025.
- B. P. Welford (1962). Note on a method for calculating corrected sums of squares and products. Technometrics 4(3), pp. 419–420.
- Y. Wu, S. Zhou, M. Yang, L. Wang, H. Chang, W. Zhu, X. Hu, X. Zhou, and X. Yang (2025). Unlearning concepts in diffusion model via concept domain correction and concept preserving gradient. arXiv:2405.15304.
- E. Xie, J. Chen, J. Chen, H. Cai, H. Tang, Y. Lin, Z. Zhang, M. Li, L. Zhu, Y. Lu, and S. Han (2025). SANA: efficient high-resolution text-to-image synthesis with linear diffusion transformers. In ICLR 2025.
- A. Yang, B. Yang, B. Zhang, et al. (2024a). Qwen2.5 technical report. CoRR abs/2412.15115.
- L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M. Yang (2024b). Diffusion models: a comprehensive survey of methods and applications. ACM Comput. Surv. 56(4), pp. 105:1–105:39.
- T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi (2020). BERTScore: evaluating text generation with BERT. In ICLR 2020.
- A. Zou, L. Phan, S. L. Chen, et al. (2023). Representation engineering: a top-down approach to AI transparency. CoRR abs/2310.01405.
## Appendix A Algorithm for computing covariances
To estimate the covariances, we use the algorithm of Welford ([1962](https://arxiv.org/html/2605.05220#bib.bib108)) on a sample of broad prompts (unrelated to the steering concepts). Given $X$ of dimension $batch\_size \times num\_heads \times seq\_len \times hidden\_dim$, the algorithm estimates the covariance matrix $\Sigma_{XX}$ of size $hidden\_dim \times hidden\_dim$. It does this by maintaining sample-level statistics of size $O(hidden\_dim^{2})$ in memory, and takes $O(batch\_size \cdot seq\_len \cdot hidden\_dim^{2})$ time to update them for the output of a particular layer on a particular batch. In practice, estimating the covariances on 50,000 samples finished in under 15 minutes for SANA 1.6 and in under 30 minutes for Qwen2.5 14B on our hardware setup.
In Algorithm [1](https://arxiv.org/html/2605.05220#alg1) we provide pseudocode for the algorithm.
Algorithm 1: Welford's Algorithm for Online Mean and Covariance Estimation

1: **Input:** stream of data batches $\{\mathbf{X}_{1},\mathbf{X}_{2},\ldots,\mathbf{X}_{K}\}$, where $\mathbf{X}_{k}\in\mathbb{R}^{h\times m_{k}\times d}$
2: **Output:** mean $\mu$ and covariance $\Sigma$ estimates
3: **Initialization:** $n\leftarrow 0$; $\mathbf{M}\leftarrow\mathbf{0}\in\mathbb{R}^{h\times d}$; $\mathbf{S}\leftarrow\mathbf{0}\in\mathbb{R}^{h\times d\times d}$
4: **for** $k=1$ **to** $K$ **do**
5: &emsp;$m_{k}\leftarrow$ number of samples in batch $\mathbf{X}_{k}$
6: &emsp;**if** $n=0$ **then**
7: &emsp;&emsp;$n\leftarrow m_{k}$; $\mathbf{M}\leftarrow\sum_{j=1}^{m_{k}}\mathbf{x}_{k,j}$; $\mu\leftarrow\mathbf{M}/n$
8: &emsp;&emsp;$\Delta\leftarrow\mathbf{X}_{k}-\mu\otimes\mathbf{1}_{m_{k}}^{\top}$; $\mathbf{S}\leftarrow\Delta^{\mathsf{H}}\Delta$
9: &emsp;**else**
10: &emsp;&emsp;$\mu_{\text{old}}\leftarrow\mathbf{M}/n$; $n\leftarrow n+m_{k}$; $\mathbf{M}\leftarrow\mathbf{M}+\sum_{j=1}^{m_{k}}\mathbf{x}_{k,j}$; $\mu_{\text{new}}\leftarrow\mathbf{M}/n$
11: &emsp;&emsp;$\Delta_{\text{old}}\leftarrow\mathbf{X}_{k}-\mu_{\text{old}}\otimes\mathbf{1}_{m_{k}}^{\top}$; $\Delta_{\text{new}}\leftarrow\mathbf{X}_{k}-\mu_{\text{new}}\otimes\mathbf{1}_{m_{k}}^{\top}$
12: &emsp;&emsp;$\mathbf{S}\leftarrow\mathbf{S}+\Delta_{\text{old}}^{\mathsf{H}}\Delta_{\text{new}}$
13: &emsp;**end if**
14: **end for**
15: **Finalization:** $\mu\leftarrow\mathbf{M}/n$; $\Sigma\leftarrow\frac{1}{n-1}\cdot\frac{\mathbf{S}+\mathbf{S}^{\mathsf{H}}}{2}$
16: **return** $\mu,\Sigma$
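The batched update above can be rendered in a few lines of NumPy for a single head ($h=1$); the following is our own illustrative sketch, not the authors' code:

```python
import numpy as np

class OnlineCovariance:
    """Batched Welford estimator: maintains a running sum M and a running
    scatter matrix S, so memory stays O(d^2) regardless of stream length."""

    def __init__(self, d):
        self.n = 0
        self.M = np.zeros(d)        # running sum of samples
        self.S = np.zeros((d, d))   # running (asymmetric) scatter matrix

    def update(self, Xk):
        """Consume one batch Xk of shape (m_k, d)."""
        m_k = len(Xk)
        if self.n == 0:
            self.n = m_k
            self.M = Xk.sum(axis=0)
            delta = Xk - self.M / self.n
            self.S = delta.T @ delta
        else:
            mu_old = self.M / self.n
            self.n += m_k
            self.M = self.M + Xk.sum(axis=0)
            mu_new = self.M / self.n
            self.S = self.S + (Xk - mu_old).T @ (Xk - mu_new)

    def finalize(self):
        """Return the mean and the symmetrized, unbiased covariance."""
        mu = self.M / self.n
        sigma = (self.S + self.S.T) / (2 * (self.n - 1))
        return mu, sigma
```

Streaming batches through `update` and calling `finalize` reproduces the batch mean and sample covariance (with the usual $1/(n-1)$ normalization) without ever holding the full sample in memory.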
## Appendix B Incorporating steering into model weights
Recall that the last layer of a self-attention block in LLMs, or of a cross-attention block in SDXL/SANA, is a linear layer with no activation function, i.e., essentially a matrix multiplication with a bias term: $h_{out}=W_{proj\_out}h_{in}+b$. Here $W_{proj\_out}$ is the weight matrix of the last $proj\_out$ layer of the SA/CA block of the LLM/SDXL/SANA, and $h_{in}$ and $h_{out}$ are the input and output of that layer, with $h_{out}$ being the final output of the SA/CA layer.
Assuming $s$ is normalized ($\|s\|=1$), vanilla concept erasure is an orthogonal projection onto the subspace orthogonal to $s$ and can be written in matrix form:
$$f_{\text{delete}}(h,s)=(I-ss^{T})h\qquad(24)$$
Vanilla concept switching is a Householder reflection of the vector $h$ across the hyperplane orthogonal to $s$:
$$f_{\text{switch}}(h,s)=(I-2ss^{T})h\qquad(25)$$
LEACE / MidSteer are already presented in this paper in matrix form\.
Thus, by composing the last layer of the SA/CA block with the matrix formulation of steering/LEACE/MidSteer, we can incorporate the transformation directly into the weights of the model, multiplying the weight matrix of the last layer of the SA/CA block by the matrix $A^{*}$ of steering/LEACE/MidSteer:
$$h_{\text{out}} = A^{*}\big(W_{\text{proj\_out}}\, h_{\text{in}} + b\big) + b^{*} = W^{s}_{\text{proj\_out}}\, h_{\text{in}} + b^{s}\qquad(26)$$
Here $W^{s}_{\text{proj\_out}} = A^{*} W_{\text{proj\_out}}$ is a matrix of the same size as $W_{\text{proj\_out}}$, and $b^{s} = A^{*}b + b^{*}$. This results in zero inference overhead compared to the original LLM/SDXL/SANA models.
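Folding the affine edit into the layer is a two-line operation. The sketch below uses plain NumPy arrays rather than the actual model modules, and the function name is ours:

```python
import numpy as np

def fold_affine_into_linear(W_proj_out, b, A_star, b_star):
    """Return (W_s, b_s) with A*(W h + b) + b* == W_s h + b_s for every h."""
    W_s = A_star @ W_proj_out      # same shape as W_proj_out
    b_s = A_star @ b + b_star      # steering absorbed into the bias
    return W_s, b_s
```

After replacing the layer's weight and bias with `W_s` and `b_s`, the steered model runs with exactly the same per-token cost as the original one.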
## Appendix CLLM qualitative results
In this section, in fig. [4](https://arxiv.org/html/2605.05220#A3.F4) we present qualitative results for different steering methods on an LLM (Qwen2.5-14B-instruct). The results show a pattern similar to that of the images (see fig. [3](https://arxiv.org/html/2605.05220#S5.F3)). While all methods successfully performed the switch from "horse" to "motorcycle", vanilla steering and LEACE failed when presented with a prompt for the target concept ("motorcycle"), since they do not distinguish between forward and reverse steering. Vanilla steering also altered the texts for the concepts "cow" and "dog" more significantly.
Figure 4: Qualitative text steering results for four content categories (horse, motorcycle, cow, dog). Results are reported for the vanilla Qwen2.5-14B-instruct model and three steering methods (Vanilla Steering, LEACE-Switch, MidSteer). Each cell shows the generated text for the prompt "Write a short story about a X", where X is the corresponding category.
## Appendix DTheorem proofs
### D\.1Proof of Thm\.[4\.1](https://arxiv.org/html/2605.05220#S4.Thmtheorem1)\(vanilla erasure is a special case of LEACE\)
###### Proof\.
We have $k=1$, $Z=C$. According to Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3), $f(X)=A^{*}X+b^{*}$, where $A^{*}$ is defined as in equation [6](https://arxiv.org/html/2605.05220#S3.E6) and $b^{*}$ is defined as in equation [7](https://arxiv.org/html/2605.05220#S3.E7), minimizes equation [5](https://arxiv.org/html/2605.05220#S3.E5).
We conclude $b^{*}=0$ since $\mathrm{E}[X]=0$. Further, it can be shown that $W=\Sigma_{XX}^{-1/2}=I^{-1/2}=I$; hence, the transform $f$ (which is optimal according to Theorem [3.3](https://arxiv.org/html/2605.05220#S3.Thmtheorem3)) simplifies to
$$f(X)=\Big(I-\Sigma_{XZ}\Sigma_{XZ}^{+}\Big)X\,.\qquad(27)$$
Recall that we are working with $k=1$, so $\Sigma_{XZ}\in\mathbb{R}^{d\times 1}$ is a column vector. By the definition of the Moore-Penrose inverse for column vectors,
$$\Sigma_{XZ}^{+}=\frac{\Sigma_{XZ}^{T}}{\lVert\Sigma_{XZ}\rVert^{2}}\,,\qquad\text{hence}\qquad f(X)=X-s^{\prime}s^{\prime T}X=X-\langle X,s^{\prime}\rangle s^{\prime}$$
for $s^{\prime}=\Sigma_{XZ}/\|\Sigma_{XZ}\|$.
Now,
$$\Sigma_{XZ}=\mathrm{Cov}(X,Z)=\mathrm{E}[XZ]-\mathrm{E}[X]\cdot\mathrm{E}[Z]=\mathrm{E}[X\cdot 1\mid Z=1]\cdot P(Z=1)+\mathrm{E}[X\cdot 0\mid Z=0]\cdot P(Z=0)-\mathrm{E}[X]\cdot P(Z=1)=P(Z=1)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X]\Big)$$
Now recall that $P(Z=1)+P(Z=0)=1$, so
$$\Sigma_{XZ}=P(Z=1)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X]\Big)=P(Z=1)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X\mid Z=1]P(Z=1)-\mathrm{E}[X\mid Z=0]P(Z=0)\Big)=P(Z=1)\cdot P(Z=0)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X\mid Z=0]\Big)\,,$$
so $s^{\prime}=s$ and $f$ is equivalent to $f_{\text{delete}}$. Hence, $f_{\text{delete}}$ is the transformation that minimizes equation [8](https://arxiv.org/html/2605.05220#S4.E8).
∎
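Both identities used in this proof, the factorization of $\Sigma_{XZ}$ for a binary $Z$ and the fact that projecting out $s^{\prime}$ zeroes the cross-covariance, also hold exactly for empirical moments, which makes them easy to sanity-check numerically (an illustrative check, not part of the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 4
Z = rng.integers(0, 2, size=n)                   # binary concept label
X = rng.normal(size=(n, d)) + 2.0 * Z[:, None]   # concept shifts the mean

# Empirical cross-covariance Sigma_XZ = E[XZ] - E[X] E[Z].
sigma_xz = (X * Z[:, None]).mean(axis=0) - X.mean(axis=0) * Z.mean()

# Identity from the proof: Sigma_XZ = P(Z=1) P(Z=0) (E[X|Z=1] - E[X|Z=0]).
p1 = Z.mean()
mean_diff = X[Z == 1].mean(axis=0) - X[Z == 0].mean(axis=0)

# Erasure along s = Sigma_XZ / ||Sigma_XZ|| zeroes the cross-covariance.
s = sigma_xz / np.linalg.norm(sigma_xz)
X_erased = X - np.outer(X @ s, s)                # f_delete applied row-wise
sigma_erased = (X_erased * Z[:, None]).mean(axis=0) - X_erased.mean(axis=0) * Z.mean()
```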
### D\.2Proof of Thm\.[4\.2](https://arxiv.org/html/2605.05220#S4.Thmtheorem2)\(LEACE\-Switch\)
###### Proof\.
The proof proceeds as follows:
1. Find necessary conditions for optimality using the method of Lagrange multipliers.
2. Show that $A^{*},b^{*}$ satisfy the necessary conditions.
3. Show that the optimisation problem is convex with linear constraints, and as such, if a local solution exists, it is globally optimal and unique.
Let us formulate the Lagrangian. Here $\Lambda\in\mathbb{R}^{d\times k}$, because we have $d\cdot k$ constraints on the covariance matrix.
$$\mathcal{L}(A,b,\Lambda)=\frac{1}{2}\mathrm{E}\Big[\lVert AX+b-X\rVert_{2}^{2}\Big]+\langle\Lambda,\mathrm{Cov}(AX+b,Z)+\mathrm{Cov}(X,Z)\rangle_{F}=\frac{1}{2}\mathrm{E}\Big[(AX+b-X)^{T}(AX+b-X)\Big]+\mathrm{Tr}\Big(\Lambda^{T}(A+I)\Sigma_{XZ}\Big)=\mathrm{E}\Big[\frac{1}{2}X^{T}A^{T}AX+b^{T}AX-X^{T}AX-X^{T}b+\frac{1}{2}b^{T}b+\frac{1}{2}X^{T}X\Big]+\mathrm{Tr}\Big(\Lambda^{T}(A+I)\Sigma_{XZ}\Big)\qquad(28)$$
The partial derivatives of the Lagrangian with respect to $A,b,\Lambda$ are
$$\frac{\partial\mathcal{L}}{\partial A}=\mathrm{E}[AXX^{T}+bX^{T}-XX^{T}]+\Lambda\Sigma^{T}_{XZ}=A\,\mathrm{E}[XX^{T}]+b\,\mathrm{E}[X]^{T}-\mathrm{E}[XX^{T}]+\Lambda\Sigma^{T}_{XZ}\,,$$
$$\frac{\partial\mathcal{L}}{\partial b}=\mathrm{E}[AX+b-X]=A\,\mathrm{E}[X]+b-\mathrm{E}[X]\,,$$
$$\frac{\partial\mathcal{L}}{\partial\Lambda}=(A+I)\Sigma_{XZ}\,.$$
Next, we use $\mu=\mathrm{E}[X]$ and $\mathrm{E}[XX^{T}]=\Sigma_{XX}+\mu\mu^{T}$ to formulate the necessary conditions
$$0=\frac{\partial\mathcal{L}}{\partial A}=(A-I)\Big(\Sigma_{XX}+\mu\mu^{T}\Big)+b\mu^{T}+\Lambda\Sigma^{T}_{XZ}\,,\qquad(29)$$
$$0=\frac{\partial\mathcal{L}}{\partial b}=A\mu+b-\mu\,,\qquad(30)$$
$$0=\frac{\partial\mathcal{L}}{\partial\Lambda}=(A+I)\Sigma_{XZ}\,.\qquad(31)$$
We note that the optimal $b^{*}$ as defined in equation [14](https://arxiv.org/html/2605.05220#S4.E14) satisfies [30](https://arxiv.org/html/2605.05220#A4.E30). Plugging equation [30](https://arxiv.org/html/2605.05220#A4.E30) into equation [29](https://arxiv.org/html/2605.05220#A4.E29) leads to
$$(A-I)\Big(\Sigma_{XX}+\mu\mu^{T}\Big)+(\mu-A\mu)\mu^{T}+\Lambda\Sigma_{XZ}^{T}=A\Sigma_{XX}-\Sigma_{XX}+A\mu\mu^{T}-\mu\mu^{T}+\mu\mu^{T}-A\mu\mu^{T}+\Lambda\Sigma_{XZ}^{T}=(A-I)\Sigma_{XX}+\Lambda\Sigma_{XZ}^{T}=0\,.\qquad(32)$$
Now let us check that $A^{*}$ satisfies [31](https://arxiv.org/html/2605.05220#A4.E31) and [32](https://arxiv.org/html/2605.05220#A4.E32). Plugging $A^{*}$ into [31](https://arxiv.org/html/2605.05220#A4.E31) we get
$$0=\Big(2I-2W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W\Big)\Sigma_{XZ}=2\Sigma_{XZ}-2W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}(W\Sigma_{XZ})=2\Big(\Sigma_{XZ}-W^{+}W\Sigma_{XZ}\Big)=2\Big(\Sigma_{XZ}-\big(I-P_{\mathcal{N}(W)}\big)\Sigma_{XZ}\Big)=2P_{\mathcal{N}(W)}\Sigma_{XZ}\,,\qquad(33)$$
because the Moore-Penrose inverse $B^{+}$ of $B$ satisfies $BB^{+}B=B$ and $B^{+}B=I-P_{\mathcal{N}(B)}$, where $P_{\mathcal{N}(B)}$ denotes the orthogonal projection onto the nullspace $\mathcal{N}(B)$ of $B$. Since the columns of $\Sigma_{XZ}$ always lie within the image of $\Sigma_{XX}$ (which is the orthogonal complement of the kernel of $\Sigma_{XX}$, which is also the kernel of $W$), we conclude that $P_{\mathcal{N}(W)}\Sigma_{XZ}=0$, so equation [33](https://arxiv.org/html/2605.05220#A4.E33) is always satisfied.
Plugging $A^{*}$ into [32](https://arxiv.org/html/2605.05220#A4.E32) we observe
$$-2\Big(W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W\Big)\Sigma_{XX}+\Lambda\Sigma_{XZ}^{T}=-2\,W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W^{+}+\Lambda\Sigma_{XZ}^{T}=0\qquad(34)$$
The identity $W\Sigma_{XX}=W^{+}$ holds because $\Sigma_{XX}$ is symmetric p.s.d., so $\Sigma_{XX}=UDU^{T}$ and $\Sigma_{XX}^{-1/2}\Sigma_{XX}=UD^{-1/2}U^{T}UDU^{T}=UD^{1/2}U^{T}=\Sigma_{XX}^{1/2}$ for some orthogonal $U$ and non-negative diagonal $D$, where $D^{-1/2}$ ignores zero diagonal values.
Next, multiplying equation [34](https://arxiv.org/html/2605.05220#A4.E34) by $W$ from both sides leads to
$$-2WW^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W^{+}W+W\Lambda\Sigma_{XZ}^{T}W=-2(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}+W\Lambda(W\Sigma_{XZ})^{T}=-2\Sigma_{WX,Z}\Sigma_{WX,Z}^{+}+\Lambda_{W}\Sigma_{WX,Z}^{T}=0\,,\qquad(35)$$
where $\Lambda_{W}:=W\Lambda$, and again $WW^{+}=W^{+}W=I$ on the subspace covered by $X$, and thus almost surely.
We seek $\Lambda_{W}$ such that $-2\Sigma_{WX,Z}\Sigma_{WX,Z}^{+}+\Lambda_{W}\Sigma_{WX,Z}^{T}=0$. Let $\Lambda_{W}=2(\Sigma_{WX,Z}^{+})^{T}$. This choice satisfies the condition because $\Sigma_{WX,Z}\Sigma_{WX,Z}^{+}$ is an orthogonal projection matrix, and is thus symmetric, so
$$(\Sigma_{WX,Z}^{+})^{T}\Sigma_{WX,Z}^{T}=(\Sigma_{WX,Z}\Sigma_{WX,Z}^{+})^{T}=\Sigma_{WX,Z}\Sigma_{WX,Z}^{+}\,,$$
which proves the Lagrange condition for the partial derivative with respect to $A$.
Thus, we have shown that the optimisation problem has a local solution. But because the constraint is linear in $A$, and it follows from the triangle inequality that $\lVert\cdot\rVert$ is convex, the local optimum is in fact the global minimum.
∎
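Reading off the closed form used in the proof, $A^{*}=I-2\,W^{+}(W\Sigma_{XZ})(W\Sigma_{XZ})^{+}W$, the constraint it was built to satisfy, $\mathrm{Cov}(A^{*}X+b^{*},Z)=-\mathrm{Cov}(X,Z)$, can be checked numerically. The snippet below is an illustrative check under the assumption that $\Sigma_{XX}$ has full rank:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 6, 2, 5000
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))      # correlated features
Z = X @ rng.normal(size=(d, k)) + rng.normal(size=(n, k))  # concept tied to X

Xc, Zc = X - X.mean(0), Z - Z.mean(0)
sigma_xx = Xc.T @ Xc / (n - 1)
sigma_xz = Xc.T @ Zc / (n - 1)

# W = Sigma_XX^{-1/2} via the eigendecomposition (full rank here).
evals, U = np.linalg.eigh(sigma_xx)
W = U @ np.diag(evals ** -0.5) @ U.T

WS = W @ sigma_xz
A_star = np.eye(d) - 2 * np.linalg.pinv(W) @ WS @ np.linalg.pinv(WS) @ W
```

Since adding the constant $b^{*}$ does not change covariances, verifying $A^{*}\Sigma_{XZ}=-\Sigma_{XZ}$ suffices.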
### D\.3Proof of Thm\.[4\.3](https://arxiv.org/html/2605.05220#S4.Thmtheorem3)\(vanilla concept switching is a special case of LEACE\-Switch\)
###### Proof\.
Let $k=1$, $Z=C$, $M=I$. According to Theorem [4.2](https://arxiv.org/html/2605.05220#S4.Thmtheorem2), $f(X)=A^{*}X+b^{*}$, where $A^{*}$ is defined in equation [13](https://arxiv.org/html/2605.05220#S4.E13) and $b^{*}$ is defined in equation [14](https://arxiv.org/html/2605.05220#S4.E14), minimizes equation [11](https://arxiv.org/html/2605.05220#S4.E11).
We have $b^{*}=0$ since $\mathrm{E}[X]=0$. Also, it can be shown that $W=\Sigma_{XX}^{-1/2}=I^{-1/2}=I$, so the transform becomes:
$$f(X)=\Big(I-2\cdot\Sigma_{XZ}\Sigma_{XZ}^{+}\Big)X\qquad(36)$$
Recall that we are working with $k=1$, so $\Sigma_{XZ}\in\mathbb{R}^{d\times 1}$ is a column vector. So
$$\Sigma_{XZ}=\mathrm{Cov}(X,Z)=\mathrm{E}[XZ]-\mathrm{E}[X]\cdot\mathrm{E}[Z]=\mathrm{E}[X\cdot 1\mid Z=1]\cdot P(Z=1)+\mathrm{E}[X\cdot 0\mid Z=0]\cdot P(Z=0)-\mathrm{E}[X]\cdot P(Z=1)=P(Z=1)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X]\Big)$$
Now recall that $P(Z=1)+P(Z=0)=1$, so
$$\Sigma_{XZ}=P(Z=1)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X\mid Z=1]P(Z=1)-\mathrm{E}[X\mid Z=0]P(Z=0)\Big)=P(Z=1)\cdot P(Z=0)\cdot\Big(\mathrm{E}[X\mid Z=1]-\mathrm{E}[X\mid Z=0]\Big)\,,$$
which is equal to $s$ up to a normalization constant. By the definition of the Moore-Penrose inverse for column vectors,
$$\Sigma_{XZ}^{+}=\frac{\Sigma_{XZ}^{T}}{\lVert\Sigma_{XZ}\rVert^{2}}\,,$$
so
$$f(X)=X-2\cdot ss^{T}X=X-2\cdot s(s^{T}X)=X-2\cdot(s^{T}X)s=X-2\cdot\langle X,s\rangle s\,,$$
so $f$ is equivalent to $f_{\text{switch}}$.
∎
### D\.4Proof of Thm\.[4\.4](https://arxiv.org/html/2605.05220#S4.Thmtheorem4)\(MidSteer\)
###### Proof\.
We will use the same method as in [D.2](https://arxiv.org/html/2605.05220#A4.SS2) to prove this. Indeed, the objective is the same, and thus convex. The constraint is still linear:
$$\mathrm{Cov}(AX+b,Z_{1})=A\Sigma_{XZ_{1}}=\Sigma_{XZ_{2}}=\mathrm{Cov}(X,Z_{2})\qquad(37)$$
So let us define the Lagrangian, where $\Lambda\in\mathbb{R}^{d\times l}$:
$$\mathcal{L}(A,b,\Lambda)=\frac{1}{2}\mathrm{E}\Big[(AX+b-X)^{T}(AX+b-X)\Big]+\mathrm{Tr}\Big(\Lambda^{T}(A\Sigma_{XZ_{1}}-\Sigma_{XZ_{2}})\Big)\qquad(38)$$
The derivatives with respect to the parameters are the following:
$$\frac{\partial\mathcal{L}}{\partial A}=(A-I)(\Sigma_{XX}+\mu\mu^{T})+b\mu^{T}+\Lambda\Sigma_{XZ_{1}}^{T}=0\qquad(39)$$
$$\frac{\partial\mathcal{L}}{\partial b}=A\mu-\mu+b=0\qquad(40)$$
$$\frac{\partial\mathcal{L}}{\partial\Lambda}=A\Sigma_{XZ_{1}}-\Sigma_{XZ_{2}}=0\qquad(41)$$
Trivially, $b^{*}$ satisfies equation [40](https://arxiv.org/html/2605.05220#A4.E40) for a suitable $A^{*}$.
Let us see that equation [41](https://arxiv.org/html/2605.05220#A4.E41) is satisfied. We can plug in $A^{*}$ and then multiply by $W$ on the left, to get:
$$W\Sigma_{XZ_{1}}+WW^{+}(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}})\Sigma_{WX,Z_{1}}^{+}W\Sigma_{XZ_{1}}-W\Sigma_{XZ_{2}}=\Sigma_{WX,Z_{1}}-\Sigma_{WX,Z_{1}}\Sigma_{WX,Z_{1}}^{+}\Sigma_{WX,Z_{1}}+\Sigma_{WX,Z_{2}}\Sigma_{WX,Z_{1}}^{+}\Sigma_{WX,Z_{1}}-\Sigma_{WX,Z_{2}}=\Sigma_{WX,Z_{2}}I_{l}-\Sigma_{WX,Z_{2}}=0$$
Here we used $YY^{+}Y=Y$ for any $Y$. Furthermore, since $\Sigma_{WX,Z_{1}}$ has linearly independent columns (full column rank), we use the property $Y^{+}Y=I$.
Next, plugging equation [40](https://arxiv.org/html/2605.05220#A4.E40) into equation [39](https://arxiv.org/html/2605.05220#A4.E39) we get:
$$(A-I)\Sigma_{XX}+\Lambda\Sigma_{XZ_{1}}^{T}=0\qquad(42)$$
Let us now show that for $A^{*}$ there exists $\Lambda\in\mathbb{R}^{d\times l}$ such that this equality holds. After plugging in $A^{*}$ and using the previously shown fact $W\Sigma_{XX}=W^{+}$:
$$W^{+}(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}})\Sigma_{WX,Z_{1}}^{+}W^{+}+\Lambda\Sigma_{XZ_{1}}^{T}=0\qquad(43)$$
Again, multiplying by $W$ on both sides and recalling that $W$ is symmetric, we get:
$$(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}})\Sigma_{WX,Z_{1}}^{+}+\Lambda_{W}\Sigma_{WX,Z_{1}}^{T}=0\quad\text{(almost surely)}$$
Now, $\Sigma_{WX,Z_{1}}$ also has full column rank, so $\Sigma_{WX,Z_{1}}^{+}=\Big(\Sigma_{WX,Z_{1}}^{T}\Sigma_{WX,Z_{1}}\Big)^{-1}\Sigma_{WX,Z_{1}}^{T}$. Thus, $\Lambda_{W}=-\Big(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}}\Big)\Big(\Sigma_{WX,Z_{1}}^{T}\Sigma_{WX,Z_{1}}\Big)^{-1}$ satisfies the equation.
∎
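The closed form used in this proof, $A^{*}=I+W^{+}\big(\Sigma_{WX,Z_{2}}-\Sigma_{WX,Z_{1}}\big)\Sigma_{WX,Z_{1}}^{+}W$, can be checked against constraint (41) numerically. The snippet below is an illustrative check assuming full-rank $\Sigma_{XX}$ and full-column-rank $\Sigma_{WX,Z_{1}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, l, n = 6, 2, 5000
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))       # correlated features
Z1 = X @ rng.normal(size=(d, l)) + rng.normal(size=(n, l))  # source concept
Z2 = X @ rng.normal(size=(d, l)) + rng.normal(size=(n, l))  # target concept

Xc = X - X.mean(0)
s_xz1 = Xc.T @ (Z1 - Z1.mean(0)) / (n - 1)
s_xz2 = Xc.T @ (Z2 - Z2.mean(0)) / (n - 1)
sigma_xx = Xc.T @ Xc / (n - 1)

# W = Sigma_XX^{-1/2} via the eigendecomposition (full rank here).
evals, U = np.linalg.eigh(sigma_xx)
W = U @ np.diag(evals ** -0.5) @ U.T

S1, S2 = W @ s_xz1, W @ s_xz2          # Sigma_{WX,Z1}, Sigma_{WX,Z2}
A_star = np.eye(d) + np.linalg.pinv(W) @ (S2 - S1) @ np.linalg.pinv(S1) @ W
```

With these assumptions, $A^{*}\Sigma_{XZ_{1}}=\Sigma_{XZ_{2}}$ holds exactly, since $\Sigma_{WX,Z_{1}}^{+}\Sigma_{WX,Z_{1}}=I_{l}$.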
## Appendix EExample LLM prompts for Estimating Class\-Conditional Covariance
To obtain class-conditional means for LLMs, we constructed 1000 prompts across several categories. In each case, the activation of the last token (the position after the corresponding chat-template tokens and before the first token of the model output) was used to compute steering vectors. Below we provide illustrative subsets of the prompts used:
- Horse-related prompts
  - How did horses evolve from their early ancestors?
  - What factors contributed to the domestication of horses?
  - What anatomical features enable a horse to run at high speeds?
- Motorcycle-related prompts
  - What was the significance of the earliest gasoline-powered motorcycle in the evolution of personal transportation?
  - How does the engine cycle mechanism influence power delivery and efficiency in a motorbike?
  - What are the primary considerations when selecting the appropriate tire for varying riding conditions?
- Dog-related prompts
  - What is believed to be the origin of dog domestication?
  - How do dogs communicate through body language?
  - Which genetic factors influence coat color in dogs?
- Cat-related prompts
  - What is the origin of domestic cats and how did they become companion animals?
  - How did ancient Egyptians incorporate cats into their society and religion?
  - Why did cats spread globally alongside human trade routes?
## Appendix FPrompt for Concept Scoring in LLMs
To assess the quality of steering, we compute a *concept score* that gauges the presence of a specific concept in generated text. The score ranges from 0 to 10:
- •A score of 0 indicates the text is completely irrelevant to the concept\.
- •A score of 10 indicates strong and explicit relevance to the concept\.
- •Scores from 1 through 9 represent gradations of partial relevance\.
The following prompt was used to query the model for concept scoring:
```
Score the text on a scale of 0 to 10, where the score reflects the extent
to which it demonstrates relevance to the concept of {concept}. A score of 0
indicates the text is completely irrelevant and shows no connection to {concept},
whereas a score of 10 signifies strong and explicit relevance to {concept}. For
texts that suggest varying degrees of relevance, use the intermediate numbers
(1 through 9) to represent the gradation of connection to {concept}.
Output just a single number.
Text to evaluate: ‘‘‘
{text}
‘‘‘
```
The exact score is obtained by performing a forward pass of the Llama3.1-7B model and taking the argmax across the token probabilities corresponding to the score tokens (0 to 10).
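The argmax step can be sketched as follows. This is an illustration only: `score_token_ids` stands for the tokenizer-specific mapping from the scores 0 to 10 to token ids, and `next_token_logits` for the judge model's logits at the answer position; both are placeholders for quantities produced by the actual model:

```python
import numpy as np

def extract_concept_score(next_token_logits, score_token_ids):
    """Pick the score in 0..10 whose token has the highest logit.

    next_token_logits: 1-D array of vocabulary logits at the answer position.
    score_token_ids:   list where score_token_ids[s] is the token id of str(s).
    """
    score_logits = next_token_logits[np.asarray(score_token_ids)]
    return int(np.argmax(score_logits))
```

Restricting the argmax to the eleven score tokens guarantees a well-formed score even when the unconstrained next-token prediction would be something else.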
## Appendix GTemplate Prompts for Steering Quality Assessment
To evaluate the effectiveness of steering, we constructed a set of template prompts. In each case, the {} placeholder was substituted with the concept under consideration (as defined in Sec. [5](https://arxiv.org/html/2605.05220#S5)). Each template was used to generate multiple model responses. These responses were then evaluated either by a judge model (for text/LLM outputs) or by CLIP score (for images) to compute the Concept Score, as well as related metrics such as FID and BERTScore.
### G\.1LLM Assessment Prompts
The following template prompts were used for large language model (LLM) evaluation. The placeholder {} is substituted with a concept.
```
1. Write a short story about a {}
2. Write a poem about a {}
3. What is the history of {}
4. What is the most famous {}?
5. What is the most expensive {}?
6. How to make a {}?
7. How to maintain a {}?
8. How to dispose of a {}?
9. How to store a {}?
10. How to transport a {}?
11. How to use a {}?
12. What is important to know about {}?
13. How to tell age of a {}?
14. What is a {}?
15. What types of {} are there?
16. What are the most common {}?
17. Describe an appearance of {} in detail.
18. How does {} look like?
19. How does {} sound like?
20. How does {} feel like?
21. How does {} behave like?
22. What is the purpose of {}?
23. What are the main components of a {}?
24. How to identify a {}?
25. Where can you find a {}?
26. What are the dangers of a {}?
27. What tools do you need for a {}?
28. How much does a {} typically cost?
29. What are alternatives to a {}?
30. How to choose a good {}?
31. What are common problems with a {}?
32. How long does a {} typically last?
33. What size is a typical {}?
34. How to clean a {}?
35. What skills are needed to handle a {}?
36. What are the benefits of having a {}?
37. How has {} changed over time?
38. What cultures use {} the most?
39. How to test if a {} is working properly?
40. What safety precautions are needed for a {}?
41. How to upgrade or improve a {}?
42. How does weather affect a {}?
43. What are the environmental impacts of a {}?
44. How to measure the quality of a {}?
45. What accessories go with a {}?
46. How to protect a {} from damage?
47. What are myths about {}?
48. How to teach someone about a {}?
49. What industries use {}?
50. How is a {} different from similar things?
51. What are the legal considerations for owning a {}?
52. How to pack a {} for moving?
53. What are seasonal considerations for a {}?
54. How to customize a {}?
55. What are expert tips for using a {}?
56. How to troubleshoot issues with a {}?
57. What is the lifecycle of a {}?
58. How to estimate the value of a {}?
59. What are cultural significances of a {}?
60. How to take a picture of a {}?
61. How to make a sculpture of a {}?
62. What is the future of {}?
63. How to draw a {}?
64. When was {} first mentioned in human history?
65. Can one ride a {}?
66. Write a song about {}
67. Define a {}
68. Write a positive review on a book about {}
69. Write a negative review on a book about {}
70. Do people make toys of {}?
71. How is {} used in the economy?
72. Write an abstract for a science paper about {}
73. How does temperature affect a {}?
74. What are the origins of the word {}?
75. What are superstitions about {}?
76. How to simulate a {} digitally?
77. What are the physics of a {}?
78. How to teach children about {}?
79. What are famous artworks featuring {}?
80. What are the nutritional aspects of a {}?
81. Describe the most famous {} competitions.
```
### G\.2Image Assessment Prompts
The following template prompts were used for image model evaluation. The placeholder {} was substituted with a concept.
```
1. a bad photo of a {}.
2. a photo of many {}.
3. a sculpture of a {}.
4. a photo of the hard to see {}.
5. a low resolution photo of the {}.
6. a rendering of a {}.
7. graffiti of a {}.
8. a bad photo of the {}.
9. a cropped photo of the {}.
10. a tattoo of a {}.
11. the embroidered {}.
12. a photo of a hard to see {}.
13. a bright photo of a {}.
14. a photo of a clean {}.
15. a photo of a dirty {}.
16. a dark photo of the {}.
17. a drawing of a {}.
18. a photo of my {}.
19. the plastic {}.
20. a photo of the cool {}.
21. a close-up photo of a {}.
22. a black and white photo of the {}.
23. a painting of the {}.
24. a painting of a {}.
25. a pixelated photo of the {}.
26. a sculpture of the {}.
27. a bright photo of the {}.
28. a cropped photo of a {}.
29. a plastic {}.
30. a photo of the dirty {}.
31. a jpeg corrupted photo of a {}.
32. a blurry photo of the {}.
33. a photo of the {}.
34. a good photo of the {}.
35. a rendering of the {}.
36. a {} in a video game.
37. a photo of one {}.
38. a doodle of a {}.
39. a close-up photo of the {}.
40. a photo of a {}.
41. the origami {}.
42. the {} in a video game.
43. a sketch of a {}.
44. a doodle of the {}.
45. a origami {}.
46. a low resolution photo of a {}.
47. the toy {}.
48. a rendition of the {}.
49. a photo of the clean {}.
50. a photo of a large {}.
51. a rendition of a {}.
52. a photo of a nice {}.
53. a photo of a weird {}.
54. a blurry photo of a {}.
55. a cartoon {}.
56. art of a {}.
57. a sketch of the {}.
58. a embroidered {}.
59. a pixelated photo of a {}.
60. itap of the {}.
61. a jpeg corrupted photo of the {}.
62. a good photo of a {}.
63. a plushie {}.
64. a photo of the nice {}.
65. a photo of the small {}.
66. a photo of the weird {}.
67. the cartoon {}.
68. art of the {}.
69. a drawing of the {}.
70. a photo of the large {}.
71. a black and white photo of a {}.
72. the plushie {}.
73. a dark photo of a {}.
74. itap of a {}.
75. graffiti of the {}.
76. a toy {}.
77. itap of my {}.
78. a photo of a cool {}.
79. a photo of a small {}.
80. a tattoo of the {}.
```
## Appendix HNumber of prompts for covariance calculation
To find the optimal number of prompts used to calculate the unconditional covariance $\Sigma_{XX}$ for concept switching, we perform the following ablation study. For each number of prompts in the set $\{100, 500, 1000, 5000, 10000, 20000\}$, we run the same base experiment as outlined in Sec. [5](https://arxiv.org/html/2605.05220#S5). We then compute the $\Delta$CS and 1 - BERT Precision @ MMLU metrics for a small set of MidSteer steering strengths (to check whether the steering strength affects the optimal number of prompts). We use the LLM experimental setup (fig. [2](https://arxiv.org/html/2605.05220#S5.F2)) and the Llama2-7B model.
We then plot these values on a 2D plane, similarly to the Pareto charts, but this time varying the number of prompts. In essence, this forms a curve that, after a certain threshold, settles in a small region of the metric space. As can be seen from the chart below, increasing the number of prompts beyond 5000 has limited impact.
![[Uncaptioned image]](https://arxiv.org/html/2605.05220v1/x5.png)
## Appendix IMore results on concept switching
#### I\.0\.1Pareto charts for LLM concept switching
In this section, we provide more Pareto charts for LLM concept switching. When switching concept $c_{s}$ to $c_{t}$, for each LLM we provide 9 types of Pareto plots:
- 1 - BERT Precision score for unrelated concepts (horizontal axis) vs $\Delta$ Concept Score (CS) for the target $c_{t}$ and source $c_{s}$ concepts (vertical axis) (fig. [11(b)](https://arxiv.org/html/2605.05220#A9.F11.sf2), [12(b)](https://arxiv.org/html/2605.05220#A9.F12.sf2), [13(b)](https://arxiv.org/html/2605.05220#A9.F13.sf2))
- 1 - BERT Precision score for MMLU (horizontal axis) vs $\Delta$ Concept Score (CS) for the target $c_{t}$ and source $c_{s}$ concepts (vertical axis) (fig. [11(a)](https://arxiv.org/html/2605.05220#A9.F11.sf1), [12(a)](https://arxiv.org/html/2605.05220#A9.F12.sf1), [13(a)](https://arxiv.org/html/2605.05220#A9.F13.sf1))
- Average Concept Score (CS) for unrelated concepts $c_{i}$ (horizontal axis) vs $\Delta$ Concept Score (CS) for the target $c_{t}$ and source $c_{s}$ concepts (vertical axis) (fig. [11(c)](https://arxiv.org/html/2605.05220#A9.F11.sf3), [12(c)](https://arxiv.org/html/2605.05220#A9.F12.sf3), [13(c)](https://arxiv.org/html/2605.05220#A9.F13.sf3))
- 1 - BERT Precision score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the source $c_{s}$ concept (vertical axis) (fig. [5(b)](https://arxiv.org/html/2605.05220#A9.F5.sf2), [6(b)](https://arxiv.org/html/2605.05220#A9.F6.sf2), [7(b)](https://arxiv.org/html/2605.05220#A9.F7.sf2))
- BERT Precision score for MMLU (horizontal axis) vs Concept Score (CS) for the source $c_{s}$ concept (vertical axis) (fig. [5(a)](https://arxiv.org/html/2605.05220#A9.F5.sf1), [6(a)](https://arxiv.org/html/2605.05220#A9.F6.sf1), [7(a)](https://arxiv.org/html/2605.05220#A9.F7.sf1))
- Average Concept Score (CS) for unrelated concepts $c_{i}$ (horizontal axis) vs Concept Score (CS) for the source $c_{s}$ concept (vertical axis) (fig. [5(c)](https://arxiv.org/html/2605.05220#A9.F5.sf3), [6(c)](https://arxiv.org/html/2605.05220#A9.F6.sf3), [7(c)](https://arxiv.org/html/2605.05220#A9.F7.sf3))
- 1 - BERT Precision score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the target $c_{t}$ concept (vertical axis) (fig. [8(b)](https://arxiv.org/html/2605.05220#A9.F8.sf2), [9(b)](https://arxiv.org/html/2605.05220#A9.F9.sf2), [10(b)](https://arxiv.org/html/2605.05220#A9.F10.sf2))
- BERT Precision score for MMLU (horizontal axis) vs Concept Score (CS) for the target $c_{t}$ concept (vertical axis) (fig. [8(a)](https://arxiv.org/html/2605.05220#A9.F8.sf1), [9(a)](https://arxiv.org/html/2605.05220#A9.F9.sf1), [10(a)](https://arxiv.org/html/2605.05220#A9.F10.sf1))
- Average Concept Score (CS) for unrelated concepts $c_{i}$ (horizontal axis) vs Concept Score (CS) for the target $c_{t}$ concept (vertical axis) (fig. [8(c)](https://arxiv.org/html/2605.05220#A9.F8.sf3), [9(c)](https://arxiv.org/html/2605.05220#A9.F9.sf3), [10(c)](https://arxiv.org/html/2605.05220#A9.F10.sf3))
In each case, we see a clear superiority of MidSteer over the other steering approaches.
We additionally provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_{s}, c_{t}, c_{i}$ in the tables in Sec. [I.2](https://arxiv.org/html/2605.05220#A9.SS2).
Each of Figures 5-13 contains three panels: (a) MMLU vs BERTP, (b) Unrelated vs BERTP, (c) Unrelated vs CS.
Figure 5: Pareto plot for concept flip on model llama2-7b (Source-CS axes)
Figure 6: Pareto plot for concept flip on model qwen-14b (Source-CS axes)
Figure 7: Pareto plot for concept flip on model qwen-7b (Source-CS axes)
Figure 8: Pareto plot for concept flip on model llama2-7b (Target-CS axes)
Figure 9: Pareto plot for concept flip on model qwen-14b (Target-CS axes)
Figure 10: Pareto plot for concept flip on model qwen-7b (Target-CS axes)
Figure 11: Pareto plot for concept flip on model llama2-7b (Other axes)
Figure 12: Pareto plot for concept flip on model qwen-14b (Other axes)
Figure 13: Pareto plot for concept flip on model qwen-7b (Other axes)
### I.1 Pareto charts for image diffusion concept switching
In this section, we provide more Pareto charts for concept switching in diffusion models. When switching concept $c_s$ to $c_t$, for each diffusion model we provide the following types of Pareto plots:
- FID score for unrelated concepts (horizontal axis) vs $\Delta$ Concept Score (CS) for the target $c_t$ and source $c_s$ concepts (vertical axis) (fig. [18(b)](https://arxiv.org/html/2605.05220#A9.F18.sf2), [19(b)](https://arxiv.org/html/2605.05220#A9.F19.sf2))
- Average Concept Score (CS) for unrelated concepts $c_i$ (horizontal axis) vs $\Delta$ Concept Score (CS) for the target $c_t$ and source $c_s$ concepts (vertical axis) (fig. [18(a)](https://arxiv.org/html/2605.05220#A9.F18.sf1), [19(a)](https://arxiv.org/html/2605.05220#A9.F19.sf1))
- FID score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the source concept $c_s$ (vertical axis) (fig. [14(b)](https://arxiv.org/html/2605.05220#A9.F14.sf2), [15(b)](https://arxiv.org/html/2605.05220#A9.F15.sf2))
- Average Concept Score (CS) for unrelated concepts $c_i$ (horizontal axis) vs Concept Score (CS) for the source concept $c_s$ (vertical axis) (fig. [14(a)](https://arxiv.org/html/2605.05220#A9.F14.sf1), [15(a)](https://arxiv.org/html/2605.05220#A9.F15.sf1))
- FID score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the target concept $c_s$ (vertical axis) (fig. [16(b)](https://arxiv.org/html/2605.05220#A9.F16.sf2), [17(b)](https://arxiv.org/html/2605.05220#A9.F17.sf2))
- Average Concept Score (CS) for unrelated concepts $c_i$ (horizontal axis) vs Concept Score (CS) for the target concept $c_s$ (vertical axis) (fig. [16(a)](https://arxiv.org/html/2605.05220#A9.F16.sf1), [17(a)](https://arxiv.org/html/2605.05220#A9.F17.sf1))
In each case, MidSteer shows a clear advantage over the other steering approaches.
We additionally provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_t, c_i$ in the tables in sec. [I.3](https://arxiv.org/html/2605.05220#A9.SS3).
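The $\Delta$ Concept Score used on the vertical axis of some of these plots combines the gain on the target concept with the drop on the source concept. One plausible way to compute it (an illustrative definition of ours; the paper may combine the two terms differently):

```python
def delta_cs(src_cs, tgt_cs, src_cs_base, tgt_cs_base):
    """Hypothetical combined switching score: gain in target-concept CS
    plus drop in source-concept CS, both relative to the unsteered model."""
    return (tgt_cs - tgt_cs_base) + (src_cs_base - src_cs)

# unsteered SDXL horse->motorcycle scores are roughly (src 71.0, tgt 49.1);
# a good switch raises the target CS and lowers the source CS
score = delta_cs(src_cs=51.2, tgt_cs=68.7, src_cs_base=71.0, tgt_cs_base=49.1)
```

A larger value indicates a stronger switch; plotting it against FID or unrelated-concept CS yields the trade-off curves shown in the figures.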
Figure 14: Pareto plots for concept flip on model SANA (Source-CS axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 15: Pareto plots for concept flip on model SDXL (Source-CS axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 16: Pareto plots for concept flip on model SANA (Target-CS axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 17: Pareto plots for concept flip on model SDXL (Target-CS axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 18: Pareto plots for concept flip on model SANA (Other axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 19: Pareto plots for concept flip on model SDXL (Other axes). Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
### I.2 Detailed results for LLM concept switching
In this section, in tab. [2](https://arxiv.org/html/2605.05220#A9.T2), [3](https://arxiv.org/html/2605.05220#A9.T3), [4](https://arxiv.org/html/2605.05220#A9.T4), [5](https://arxiv.org/html/2605.05220#A9.T5), [6](https://arxiv.org/html/2605.05220#A9.T6), [7](https://arxiv.org/html/2605.05220#A9.T7) we provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_t, c_i$. The Pareto plots in sec. [I.0.1](https://arxiv.org/html/2605.05220#A9.SS0.SSS1) were created from the scores provided in these tables.
Table 2: Model LLama2-7b, flipping from dogs to cats.
Table 3: Model LLama2-7b, flipping from horses to motorcycles.
Table 4: Model Qwen2.5-7b, flipping from dogs to cats.
Table 5: Model Qwen2.5-7b, flipping from horses to motorcycles.
Table 6: Model Qwen2.5-14b, flipping from dogs to cats.
Table 7: Model Qwen2.5-14b, flipping from horses to motorcycles.
### I.3 Detailed results for Diffusion Models concept switching
In this section, in tab. [8](https://arxiv.org/html/2605.05220#A9.T8), [9](https://arxiv.org/html/2605.05220#A9.T9), [11](https://arxiv.org/html/2605.05220#A9.T11), [12](https://arxiv.org/html/2605.05220#A9.T12) we provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_t, c_i$. The Pareto plots in sec. [I.0.1](https://arxiv.org/html/2605.05220#A9.SS0.SSS1) were created from the scores provided in these tables.
Table 8: Model SDXL, flipping from horse to motorcycle. For each method (No Steering, CASteer, LEACE, MidSteer) and steering strength (1.0–5.0), the table reports src-cs (↓) and tgt-cs (↑) for the steered pair, and src-cs (↓), tgt-cs (↓), FID (↓), and CS (↑) for the unrelated concepts cow, pig, dog, and legislator.
Table 9: Model SDXL, flipping from chihuahua to muffin (unrelated concepts: dog, wolf, cat, legislator; same layout as Table 8).
Table 10: Model SDXL, flipping from snoopy to mickey (unrelated concepts: pikachu, spongebob, dog, legislator; same layout as Table 8).
Table 11: Model SANA, flipping from horse to motorcycle (unrelated concepts: cow, pig, dog, legislator; same layout as Table 8).
Table 12: Model SANA, flipping from chihuahua to muffin (unrelated concepts: dog, wolf, cat, legislator; same layout as Table 8).
Table 13: Model SANA, flipping from snoopy to mickey (unrelated concepts: pikachu, spongebob, dog, legislator; same layout as Table 8).
## Appendix J Concept erasure
In this section, we provide experiments for concept erasure. The aim is to show that the unified MidSteer framework, which also covers erasure as the special case where $Z_2$ is constant, performs favourably on a variety of tasks. Note that in the case of erasure, MidSteer and LEACE yield identical solutions, so we label the results as LEACE / MidSteer in the charts.
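As a concrete illustration of affine erasure in this special case, the following is a simplified numpy sketch (our own, not the paper's or the LEACE reference implementation) that removes all linear information about a single binary concept by projecting out one direction in whitened space; it assumes a well-conditioned sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 8
z = rng.integers(0, 2, n).astype(float)                        # binary concept label
X = rng.normal(size=(n, d)) + np.outer(z, rng.normal(size=d))  # features leak the concept

mu = X.mean(axis=0)
Xc = X - mu
Sigma = Xc.T @ Xc / n
evals, evecs = np.linalg.eigh(Sigma)                  # Sigma is SPD here by construction
W = evecs @ np.diag(evals ** -0.5) @ evecs.T          # whitening transform Sigma^{-1/2}
W_inv = evecs @ np.diag(evals ** 0.5) @ evecs.T

zc = z - z.mean()
cross = Xc.T @ zc / n                                 # cross-covariance Sigma_xz
u = W @ cross
u /= np.linalg.norm(u)
# affine eraser: drop the single concept direction in whitened coordinates
P = np.eye(d) - W_inv @ np.outer(u, u) @ W
X_erased = Xc @ P.T + mu

# the residual cross-covariance with z is zero up to floating-point error,
# so no linear predictor can recover z from X_erased
residual = (X_erased - X_erased.mean(axis=0)).T @ zc / n
```

By construction $P \, \Sigma_{xz} = 0$, which is the sense in which the erased representation carries no linearly decodable concept information.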
For concept erasure, we use an evaluation setup similar to that of concept switching. For LLMs, we test erasure of the concepts $c_1$ = "Horse" and $c_2$ = "Dog". The corresponding testing concepts are $t_i = \{c_i\}_{i=1}^{5}$: $t_1$ = ("Motorcycle", "Cow", "Dog", "Pig", "Legislator") and $t_2$ = ("Cat", "Cow", "Wolf", "Pig", "Legislator"). For diffusion models, we test erasure of the concepts $c_1$ = "Horse" and $c_2$ = "Chihuahua"; the corresponding testing concepts are $t_1$ = ("Motorcycle", "Cow", "Dog", "Pig", "Legislator") and $t_2$ = ("Muffin", "Cat", "Dog", "Wolf", "Legislator").
The setup for erasure is similar to that of the main-paper experiments (Sec. [5](https://arxiv.org/html/2605.05220#S5)). The only difference is that there is no target concept (i.e., it is a dummy concept). We use the same kind of prompt pairs $(c_1, c_2)$ as in the switching experiments, with the goal of removing $c_1$.
Fig. [20(a)](https://arxiv.org/html/2605.05220#A10.F20.sf1), [20(b)](https://arxiv.org/html/2605.05220#A10.F20.sf2), [20(c)](https://arxiv.org/html/2605.05220#A10.F20.sf3), [20(d)](https://arxiv.org/html/2605.05220#A10.F20.sf4) present Pareto plots for concept erasure, analogous to those for concept switching in sec. [5](https://arxiv.org/html/2605.05220#S5). More precisely, we compare vanilla steering against LEACE (which is equivalent to MidSteer in the case of erasure) for different values of $\beta$. We present results on the LLama-2-7b and SDXL models, aggregated over the erased concepts. It can be clearly seen that, in each case, LEACE / MidSteer achieves a much better balance between erasing the concept and preserving unrelated concepts across different values of $\beta$.
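For reference, the vanilla steering baseline in these comparisons amounts to a simple affine shift of the hidden state along a concept direction. A generic sketch (not the paper's exact implementation; the direction `v` and strengths are placeholders):

```python
import numpy as np

def vanilla_steer(h, v, beta):
    """Shift hidden states h along a unit-normalised concept direction v
    with strength beta: h' = h + beta * v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.asarray(h, dtype=float) + beta * v

h = np.array([0.5, -1.0, 2.0])          # a hidden state
v = np.array([0.0, 0.0, 2.0])           # direction toward the concept to erase
# negative beta pushes the state away from the concept; sweeping beta
# produces the curves aggregated in the Pareto plots
steered = [vanilla_steer(h, v, -b) for b in (1.0, 2.0, 3.0)]
```

Unlike LEACE / MidSteer, this shift is applied uniformly to every state, which is why larger $\beta$ degrades unrelated concepts more quickly.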
Next, in sec. [J.1](https://arxiv.org/html/2605.05220#A10.SS1) and [J.2](https://arxiv.org/html/2605.05220#A10.SS2) we provide detailed Pareto plots for each model, and in sec. [J.3](https://arxiv.org/html/2605.05220#A10.SS3) and [J.4](https://arxiv.org/html/2605.05220#A10.SS4) we provide tables with a detailed breakdown of scores for all $\beta$ values and all erased concepts.
(a) Erased concept score vs CS of unrelated concepts on the Llama-2-7b model.
(b) Erased concept score vs 1 - BERT Precision on MMLU on the Llama-2-7b model.
(c) Erased concept score vs CS of unrelated concepts on the SDXL model.
(d) Erased concept score vs FID of unrelated concepts on the SDXL model.
Figure 20: Pareto efficiency frontiers for concept erasure experiments with vanilla steering and LEACE / MidSteer, highlighting different $\beta$.
### J.1 Pareto charts for LLM concept erasure
In this section, we provide more Pareto charts for LLM concept erasure. When erasing concept $c_s$, for each LLM we provide 3 types of Pareto plots:
- 1 - BERT Precision score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the erased concept $c_s$ (vertical axis) (fig. [21(b)](https://arxiv.org/html/2605.05220#A10.F21.sf2), [22(b)](https://arxiv.org/html/2605.05220#A10.F22.sf2), [23(a)](https://arxiv.org/html/2605.05220#A10.F23.sf1))
- 1 - BERT Precision score for MMLU (horizontal axis) vs Concept Score (CS) for the erased concept $c_s$ (vertical axis) (fig. [21(a)](https://arxiv.org/html/2605.05220#A10.F21.sf1), [22(a)](https://arxiv.org/html/2605.05220#A10.F22.sf1), [23(a)](https://arxiv.org/html/2605.05220#A10.F23.sf1))
- Average Concept Score (CS) for unrelated concepts $c_i$ (horizontal axis) vs Concept Score (CS) for the erased concept $c_s$ (vertical axis) (fig. [21(c)](https://arxiv.org/html/2605.05220#A10.F21.sf3), [22(c)](https://arxiv.org/html/2605.05220#A10.F22.sf3), [23(c)](https://arxiv.org/html/2605.05220#A10.F23.sf3))
In each case, LEACE / MidSteer shows a clear advantage over the other steering approaches.
We additionally provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_i$ in the tables in sec. [J.3](https://arxiv.org/html/2605.05220#A10.SS3).
Figure 21: Pareto plots for concept erasure on model llama2-7b. Panels: (a) MMLU vs BERTP; (b) Unrelated vs BERTP; (c) Unrelated vs CS.
Figure 22: Pareto plots for concept erasure on model qwen-14b. Panels: (a) MMLU vs BERTP; (b) Unrelated vs BERTP; (c) Unrelated vs CS.
Figure 23: Pareto plots for concept erasure on model qwen-7b. Panels: (a) MMLU vs BERTP; (b) Unrelated vs BERTP; (c) Unrelated vs CS.
### J.2 Pareto charts for image diffusion concept erasure
In this section, we provide more Pareto charts for concept erasure in diffusion models. When erasing concept $c_s$, for each diffusion model we provide 2 types of Pareto plots:
- FID score for unrelated concepts (horizontal axis) vs Concept Score (CS) for the erased concept $c_s$ (vertical axis) (fig. [25(b)](https://arxiv.org/html/2605.05220#A10.F25.sf2), [24(b)](https://arxiv.org/html/2605.05220#A10.F24.sf2))
- Average Concept Score (CS) for unrelated concepts $c_i$ (horizontal axis) vs Concept Score (CS) for the erased concept $c_s$ (vertical axis) (fig. [25(a)](https://arxiv.org/html/2605.05220#A10.F25.sf1), [24(a)](https://arxiv.org/html/2605.05220#A10.F24.sf1))
In each case, LEACE / MidSteer shows a clear advantage over the other steering approaches.
We additionally provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_i$ in the tables in sec. [J.4](https://arxiv.org/html/2605.05220#A10.SS4).
Figure 24: Pareto plot for concept erasure on model SANA. Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
Figure 25: Pareto plot for concept erasure on model SDXL. Panels: (a) Unrelated vs CS; (b) Unrelated vs FID.
### J.3 Results for LLM concept erasure
In this section, in tab. [14](https://arxiv.org/html/2605.05220#A10.T14), [15](https://arxiv.org/html/2605.05220#A10.T15), [16](https://arxiv.org/html/2605.05220#A10.T16), [17](https://arxiv.org/html/2605.05220#A10.T17), [18](https://arxiv.org/html/2605.05220#A10.T18), [19](https://arxiv.org/html/2605.05220#A10.T19) we provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_i$. The Pareto plots in sec. [J.1](https://arxiv.org/html/2605.05220#A10.SS1) were created from the scores provided in these tables.
Table 14: Model LLama2-7b, erasure of horses. For each method (No Steering, Steering, LEACE) and strength (1.0–5.0), the table reports CS (↓) for the erased concept and CS (↑) together with BERT Precision (↑) for the unrelated concepts motorcycles, cows, pigs, dogs, and legislators.
Table 15: Model LLama2-7b, erasure of dogs (unrelated concepts: cats, wolves, cows, pigs, legislators; same layout as Table 14).
Table 16: Model Qwen2.5-7b, erasure of horses (same layout as Table 14).
Table 17: Model Qwen2.5-7b, erasure of dogs (same layout as Table 15).
Table 18: Model Qwen2.5-14b, erasure of horses (same layout as Table 14).
Table 19: Model Qwen2.5-14b, erasure of dogs (same layout as Table 15).
### J.4 Results for image diffusion concept erasure
In this section, in tab. [21](https://arxiv.org/html/2605.05220#A10.T21), [22](https://arxiv.org/html/2605.05220#A10.T22), [25](https://arxiv.org/html/2605.05220#A10.T25) we provide a detailed breakdown of scores for all $\beta$ values and all concepts $c_s, c_i$. The Pareto plots in sec. [J.2](https://arxiv.org/html/2605.05220#A10.SS2) were created from the scores provided in these tables.
Table 20: Model SDXL, erasure of snoopy. Non-target columns report cs↑ / fid↓.

| method | strength | snoopy cs↓ | mickey | pikachu | spongebob | dog | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 74.3 | 73.1 / - | 72.6 / - | 75.1 / - | 66.3 / - | 60.8 / - |
| CASteer | 1.0 | 55.8 | 70.1 / 54.9 | 72.5 / 30.3 | 73.9 / 50.9 | 66.2 / 30.6 | 60.9 / 22.6 |
| | 1.5 | 49.9 | 67.9 / 71.8 | 72.5 / 39.9 | 72.8 / 66.0 | 66.2 / 39.4 | 60.9 / 27.5 |
| | 2.0 | 47.0 | 65.2 / 90.5 | 72.6 / 51.2 | 71.0 / 85.2 | 66.2 / 48.1 | 60.8 / 31.7 |
| | 2.5 | 45.6 | 62.2 / 111.0 | 72.5 / 65.7 | 68.6 / 109.4 | 66.1 / 58.1 | 60.9 / 35.3 |
| | 3.0 | 45.3 | 58.8 / 132.1 | 72.2 / 83.7 | 65.3 / 138.2 | 66.2 / 68.0 | 60.8 / 38.7 |
| | 4.0 | 45.3 | 53.5 / 169.0 | 71.6 / 123.4 | 59.0 / 189.5 | 66.1 / 83.3 | 60.9 / 45.8 |
| | 5.0 | 45.9 | 50.7 / 195.3 | 69.3 / 153.0 | 55.7 / 218.2 | 65.6 / 99.9 | 61.0 / 52.9 |
| LEACE | 1.0 | 56.7 | 72.2 / 35.7 | 72.9 / 21.3 | 74.1 / 42.2 | 66.3 / 20.7 | 60.6 / 26.9 |
| | 1.5 | 51.2 | 71.7 / 42.3 | 73.0 / 25.8 | 73.7 / 50.1 | 66.3 / 26.5 | 60.5 / 32.5 |
| | 2.0 | 48.3 | 71.0 / 48.2 | 73.2 / 29.6 | 73.3 / 57.8 | 66.3 / 31.6 | 60.4 / 36.4 |
| | 2.5 | 46.5 | 70.3 / 53.9 | 73.3 / 33.0 | 72.8 / 66.5 | 66.4 / 36.5 | 60.2 / 41.3 |
| | 3.0 | 45.8 | 69.6 / 59.8 | 73.5 / 36.7 | 72.1 / 75.3 | 66.4 / 40.8 | 60.0 / 46.5 |
| | 4.0 | 45.8 | 67.9 / 72.2 | 73.7 / 44.8 | 70.8 / 91.8 | 66.5 / 49.5 | 59.4 / 56.1 |
| | 5.0 | 47.0 | 66.1 / 85.9 | 73.7 / 53.8 | 69.4 / 114.1 | 66.4 / 57.1 | 58.6 / 69.2 |
Table 21: Model SDXL, erasure of chihuahua. Non-target columns report cs↑ / fid↓.

| method | strength | chihuahua cs↓ | muffin | dog | wolf | cat | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 75.9 | 68.2 / - | 66.3 / - | 71.8 / - | 67.5 / - | 60.8 / - |
| CASteer | 1.0 | 54.6 | 68.1 / 19.7 | 65.0 / 58.2 | 72.5 / 25.9 | 67.0 / 35.2 | 60.9 / 22.7 |
| | 1.5 | 48.5 | 68.2 / 24.0 | 61.2 / 99.9 | 72.6 / 34.0 | 66.5 / 48.9 | 60.8 / 27.8 |
| | 2.0 | 47.6 | 68.0 / 27.3 | 54.1 / 155.5 | 72.6 / 44.1 | 64.6 / 69.0 | 60.9 / 31.9 |
| | 2.5 | 47.2 | 67.9 / 31.1 | 50.7 / 177.8 | 72.2 / 61.2 | 60.5 / 102.6 | 60.8 / 35.8 |
| | 3.0 | 46.9 | 67.9 / 34.6 | 49.7 / 187.7 | 70.0 / 96.2 | 55.8 / 141.5 | 60.8 / 39.4 |
| | 4.0 | 47.8 | 67.7 / 42.2 | 49.0 / 198.2 | 62.2 / 191.6 | 50.7 / 186.3 | 60.7 / 45.7 |
| | 5.0 | 49.7 | 67.6 / 49.5 | 48.9 / 209.3 | 57.8 / 228.4 | 49.4 / 201.5 | 60.7 / 52.1 |
| LEACE | 1.0 | 55.0 | 68.2 / 20.0 | 65.8 / 35.2 | 72.3 / 17.1 | 67.4 / 22.1 | 60.9 / 21.8 |
| | 1.5 | 48.5 | 68.1 / 25.0 | 65.5 / 47.6 | 72.5 / 21.3 | 67.3 / 27.1 | 60.8 / 27.0 |
| | 2.0 | 47.4 | 68.1 / 29.0 | 65.0 / 61.1 | 72.6 / 25.0 | 67.3 / 31.4 | 60.9 / 31.0 |
| | 2.5 | 47.0 | 68.2 / 32.6 | 64.1 / 74.8 | 72.8 / 28.6 | 67.2 / 35.1 | 60.8 / 34.2 |
| | 3.0 | 47.2 | 68.1 / 36.0 | 62.7 / 92.4 | 72.9 / 32.4 | 67.1 / 38.5 | 60.8 / 36.5 |
| | 4.0 | 48.6 | 68.0 / 42.3 | 57.6 / 131.4 | 73.1 / 39.4 | 66.9 / 45.1 | 60.7 / 41.9 |
| | 5.0 | 50.2 | 68.0 / 49.4 | 53.7 / 162.6 | 73.1 / 48.3 | 66.5 / 52.5 | 60.5 / 48.3 |
Table 22: Model SDXL, erasure of horse. Non-target columns report cs↑ / fid↓.

| method | strength | horse cs↓ | motorcycle | cow | pig | dog | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 71.0 | 70.7 / - | 72.7 / - | 71.8 / - | 66.3 / - | 60.8 / - |
| CASteer | 1.0 | 59.3 | 70.7 / 12.9 | 71.9 / 30.1 | 71.8 / 20.8 | 65.9 / 29.9 | 61.0 / 21.3 |
| | 1.5 | 49.8 | 70.7 / 15.6 | 71.2 / 46.6 | 71.8 / 27.6 | 65.8 / 36.5 | 61.1 / 26.2 |
| | 2.0 | 48.3 | 70.6 / 17.4 | 69.2 / 79.8 | 71.9 / 36.5 | 65.7 / 42.2 | 61.0 / 30.3 |
| | 2.5 | 47.9 | 70.7 / 19.5 | 62.1 / 152.5 | 72.0 / 45.9 | 65.4 / 48.4 | 60.9 / 33.6 |
| | 3.0 | 47.8 | 70.7 / 21.7 | 54.7 / 211.1 | 72.0 / 60.1 | 65.0 / 54.7 | 60.9 / 37.0 |
| | 4.0 | 48.0 | 70.7 / 26.9 | 51.1 / 227.9 | 71.8 / 92.3 | 63.9 / 69.4 | 60.8 / 43.3 |
| | 5.0 | 49.3 | 70.8 / 35.1 | 50.4 / 238.4 | 69.8 / 138.4 | 62.2 / 89.4 | 60.6 / 49.4 |
| LEACE | 1.0 | 57.1 | 70.6 / 11.5 | 72.3 / 20.5 | 71.8 / 11.6 | 66.1 / 19.7 | 60.7 / 25.1 |
| | 1.5 | 49.6 | 70.6 / 14.0 | 72.0 / 26.0 | 71.9 / 14.1 | 66.1 / 23.7 | 60.5 / 29.6 |
| | 2.0 | 48.4 | 70.6 / 15.9 | 71.8 / 33.1 | 71.9 / 16.2 | 66.0 / 28.0 | 60.4 / 34.1 |
| | 2.5 | 48.0 | 70.6 / 17.5 | 71.4 / 39.9 | 72.0 / 17.3 | 66.1 / 30.9 | 60.3 / 38.2 |
| | 3.0 | 48.1 | 70.6 / 19.3 | 70.7 / 53.5 | 72.0 / 19.1 | 66.1 / 33.6 | 60.3 / 41.6 |
| | 4.0 | 48.4 | 70.5 / 23.0 | 64.5 / 115.1 | 72.1 / 23.1 | 66.1 / 38.6 | 59.9 / 49.2 |
| | 5.0 | 49.7 | 70.3 / 27.4 | 56.4 / 197.4 | 72.2 / 27.8 | 66.1 / 44.6 | 59.4 / 58.8 |
Table 23: Model SDXL, erasure of horse. Non-target columns report cs↑ / fid↓.

| method | strength | horse cs↓ | motorcycle | cow | pig | dog | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 72.1 | 70.5 / - | 73.8 / - | 73.5 / - | 68.1 / - | 60.4 / - |
| CASteer | 1.0 | 70.8 | 70.1 / 21.3 | 74.1 / 36.4 | 73.7 / 28.2 | 67.8 / 29.2 | 60.2 / 21.2 |
| | 2.0 | 52.2 | 70.9 / 45.3 | 72.0 / 93.0 | 74.1 / 49.3 | 67.4 / 44.2 | 59.8 / 36.3 |
| | 3.0 | 51.3 | 69.3 / 105.0 | 61.5 / 216.9 | 65.9 / 261.3 | 65.5 / 71.9 | 59.3 / 58.4 |
| | 4.0 | 56.7 | 62.3 / 186.2 | 59.0 / 242.8 | 67.3 / 175.3 | 62.5 / 118.4 | 58.9 / 90.1 |
| | 5.0 | 52.5 | 60.4 / 221.0 | 58.7 / 249.9 | 65.0 / 205.0 | 59.3 / 161.3 | 58.4 / 130.3 |
| LEACE | 1.0 | 71.1 | 70.5 / 7.5 | 73.8 / 14.4 | 73.4 / 11.4 | 68.0 / 9.2 | 60.4 / 11.1 |
| | 2.0 | 52.0 | 70.4 / 10.0 | 73.9 / 20.9 | 73.5 / 16.7 | 68.0 / 14.2 | 60.4 / 15.6 |
| | 3.0 | 49.7 | 70.3 / 12.2 | 74.0 / 25.8 | 73.5 / 21.4 | 67.9 / 18.4 | 60.3 / 19.1 |
| | 4.0 | 48.8 | 70.4 / 13.8 | 74.2 / 30.9 | 73.5 / 25.6 | 67.8 / 21.3 | 60.2 / 21.8 |
| | 5.0 | 53.7 | 70.3 / 15.2 | 74.1 / 36.2 | 73.6 / 29.7 | 67.8 / 24.4 | 60.1 / 24.2 |
Table 24: Model SANA, erasure of snoopy. Non-target columns report cs↑ / fid↓.

| method | strength | snoopy cs↓ | mickey | pikachu | spongebob | dog | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 79.7 | 76.1 / - | 74.0 / - | 79.0 / - | 68.1 / - | 60.4 / - |
| CASteer | 1.0 | 60.6 | 75.3 / 64.0 | 74.1 / 41.3 | 79.0 / 43.9 | 68.0 / 42.1 | 60.8 / 23.3 |
| | 2.0 | 46.0 | 70.5 / 168.3 | 74.3 / 103.6 | 74.7 / 146.6 | 68.0 / 74.3 | 61.1 / 38.2 |
| | 3.0 | 42.4 | 64.2 / 189.7 | 72.0 / 164.1 | 63.4 / 222.2 | 67.7 / 100.9 | 60.8 / 55.3 |
| | 4.0 | 40.9 | 58.5 / 202.0 | 62.6 / 204.2 | 55.7 / 258.7 | 66.8 / 116.9 | 60.4 / 74.8 |
| | 5.0 | 40.9 | 55.4 / 208.1 | 55.5 / 231.6 | 52.9 / 276.5 | 65.1 / 127.6 | 60.0 / 94.8 |
| LEACE | 1.0 | 57.0 | 76.1 / 18.2 | 74.1 / 6.7 | 79.0 / 13.9 | 68.1 / 17.3 | 60.3 / 9.1 |
| | 2.0 | 44.8 | 76.2 / 30.5 | 74.1 / 11.7 | 78.9 / 19.3 | 68.1 / 25.3 | 60.2 / 13.6 |
| | 3.0 | 41.6 | 76.1 / 49.0 | 74.2 / 16.4 | 75.2 / 200.5 | 68.0 / 32.2 | 60.2 / 16.8 |
| | 4.0 | 40.9 | 75.6 / 73.4 | 74.2 / 21.3 | 78.7 / 29.0 | 68.0 / 38.0 | 60.1 / 19.5 |
| | 5.0 | 41.4 | 74.4 / 109.4 | 74.2 / 26.1 | 78.7 / 34.1 | 68.1 / 44.1 | 60.0 / 22.0 |
Table 25: Model SANA, erasure of chihuahua. Non-target columns report cs↑ / fid↓.

| method | strength | chihuahua cs↓ | muffin | dog | wolf | cat | legislator |
|---|---|---|---|---|---|---|---|
| No Steering | - | 76.4 | 66.3 / - | 68.1 / - | 73.2 / - | 68.5 / - | 60.4 / - |
| CASteer | 1.0 | 75.6 | 66.6 / 19.8 | 67.4 / 49.4 | 73.6 / 25.6 | 68.3 / 32.5 | 55.3 / 268.2 |
| | 2.0 | 49.5 | 66.8 / 30.8 | 59.9 / 143.6 | 73.4 / 53.4 | 66.4 / 65.2 | 60.5 / 33.7 |
| | 3.0 | 48.9 | 67.1 / 44.1 | 52.6 / 214.4 | 64.6 / 265.3 | 58.6 / 151.2 | 60.4 / 48.6 |
| | 4.0 | 49.5 | 67.0 / 58.0 | 52.3 / 223.8 | 62.0 / 263.8 | 54.6 / 205.0 | 60.2 / 71.9 |
| | 5.0 | 50.4 | 66.3 / 76.7 | 53.2 / 233.5 | 59.8 / 282.5 | 53.6 / 220.9 | 59.8 / 102.3 |
| LEACE | 1.0 | 73.0 | 66.3 / 5.8 | 68.0 / 28.1 | 73.2 / 6.4 | 68.5 / 10.4 | 60.4 / 9.9 |
| | 2.0 | 49.3 | 66.2 / 9.7 | 67.8 / 47.1 | 73.3 / 9.7 | 68.5 / 16.0 | 60.4 / 14.7 |
| | 3.0 | 47.3 | 66.2 / 12.6 | 67.6 / 68.1 | 73.3 / 12.3 | 68.6 / 20.6 | 60.4 / 17.9 |
| | 4.0 | 47.2 | 66.1 / 15.2 | 67.1 / 88.7 | 73.3 / 14.7 | 68.6 / 24.6 | 60.3 / 20.9 |
| | 5.0 | 48.5 | 66.1 / 17.7 | 66.2 / 113.6 | 73.3 / 16.8 | 68.7 / 27.5 | 60.3 / 23.0 |