Closed-Loop Neural Activation Control in Vision-Language-Action Models

arXiv cs.AI 06/02/26, 04:00 AM Papers
Summary
Proposes CTRL-STEER, a closed-loop framework for adaptive steering of vision-language-action models using time-varying control signals, achieving better trade-off between concept regulation and task success without retraining.
arXiv:2606.00269v1 Announce Type: new Abstract: Vision-Language-Action (VLA) models can be steered at test time by intervening on semantically meaningful internal directions, but existing methods use a fixed steering coefficient, effectively operating in open loop. This is poorly suited to embodied control, where task state and concept error evolve over time, often causing overcorrection, oscillation, and reduced task success, especially for temporal behaviors such as speed and smoothness. We propose CTRL-STEER, a closed-loop framework that replaces static intervention strength with adaptive, time-varying control signals. The key idea is to decouple representation from regulation: rather than assuming temporal concepts are directly controlled by individual neurons, we steer along motion-aligned residual directions while a feedback controller adjusts intervention magnitude online. We instantiate this framework with both PID and reinforcement learning based controllers. Experiments with a fine-tuned OpenVLA policy on four LIBERO task suites show that CTRL-STEER achieves more stable concept regulation and a better steering-task success trade-off than fixed-coefficient baselines, without modifying or retraining the base model.
Cached at: 06/02/26, 03:45 PM
# Closed-Loop Neural Activation Control in Vision-Language-Action Models
Source: [https://arxiv.org/html/2606.00269](https://arxiv.org/html/2606.00269)
Ramneet Kaur (SRI International, ramneet.kaur@sri.com), Nathaniel D. Bastian (United States Military Academy, nathaniel.bastian@westpoint.edu), Olivera Kotevska (Oak Ridge National Laboratory, kotevskao@ornl.gov), Susmit Jha (SRI International, susmitjha@berkeley.edu), Yanzhao Wu (Florida International University, yawu@fiu.edu), Sumit Kumar Jha (University of Florida, sumit.jha@ufl.edu), Anirban Roy (SRI International, anirban.roy@sri.com)

###### Abstract

Vision\-Language\-Action \(VLA\) models can be steered at test time by intervening on semantically meaningful internal directions, but existing methods use a fixed steering coefficient, effectively operating in open loop\. This is poorly suited to embodied control, where task state and concept error evolve over time, often causing overcorrection, oscillation, and reduced task success, especially for temporal behaviors such as speed and smoothness\. We propose CTRL\-STEER, a closed\-loop framework that replaces static intervention strength with adaptive, time\-varying control signals\. The key idea is to decouple representation from regulation: rather than assuming temporal concepts are directly controlled by individual neurons, we steer along motion\-aligned residual directions while a feedback controller adjusts intervention magnitude online\. We instantiate this framework with both PID and reinforcement learning based controllers\. Experiments with a fine\-tuned OpenVLA policy on four LIBERO task suites show that CTRL\-STEER achieves more stable concept regulation and a better steering\-task success trade\-off than fixed\-coefficient baselines, without modifying or retraining the base model\.

## 1 Introduction

We propose a framework for controlled steering of vision\-language\-action \(VLA\) models\[[18](https://arxiv.org/html/2606.00269#bib.bib17),[5](https://arxiv.org/html/2606.00269#bib.bib20)\]for inference\-time adaptation\. VLA models jointly process visual observations and language instructions to generate low\-level control actions for robotic agents, unifying perception, semantic reasoning, and control within an end\-to\-end trainable architecture\. A typical VLA model combines a pretrained vision\-language model \(VLM\) as a perceptual\-semantic backbone with an action decoder that generates action sequences conditioned on the task instruction\[[35](https://arxiv.org/html/2606.00269#bib.bib19)\]\. The VLM encodes scene context and task intent, while the action head autoregressively predicts action tokens\. Such models exhibit strong semantic generalization, enabling transfer across object categories and robust interpretation of novel natural\-language instructions by leveraging pretrained vision\-language representations\[[15](https://arxiv.org/html/2606.00269#bib.bib29),[7](https://arxiv.org/html/2606.00269#bib.bib36),[5](https://arxiv.org/html/2606.00269#bib.bib20)\]\. However, they remain limited in their ability to generalize across physical and geometric variations, including speed \(fast vs\. slow\), spatial attributes \(height, corners, edges\), and fine\-grained configuration changes\[[32](https://arxiv.org/html/2606.00269#bib.bib28),[25](https://arxiv.org/html/2606.00269#bib.bib37),[13](https://arxiv.org/html/2606.00269#bib.bib38),[8](https://arxiv.org/html/2606.00269#bib.bib39),[4](https://arxiv.org/html/2606.00269#bib.bib41),[30](https://arxiv.org/html/2606.00269#bib.bib40),[17](https://arxiv.org/html/2606.00269#bib.bib42)\]\. These factors require precise grounding in continuous control and system dynamics, where current VLA architectures still struggle\.

To address this limitation, we propose a controlled steering framework for VLA models that explicitly balances task completion with steering objectives\. As illustrated in[Fig\.2](https://arxiv.org/html/2606.00269#S1.F2)\(Left\), our method steers the robot arm to move higher while preserving task execution relative to the baseline VLA policy\. We formulate this as an inference\-time adaptation problem that does not require retraining the underlying VLA model\. This design is motivated by the cost and limited scalability of collecting training trajectories that cover diverse physical configurations, especially in novel environments\[[27](https://arxiv.org/html/2606.00269#bib.bib30),[24](https://arxiv.org/html/2606.00269#bib.bib31),[31](https://arxiv.org/html/2606.00269#bib.bib32),[34](https://arxiv.org/html/2606.00269#bib.bib33),[29](https://arxiv.org/html/2606.00269#bib.bib34)\]\. Steering VLA models introduces two core challenges: \(1\) identifying neurons associated with a desired steering instruction, and \(2\) modulating those neurons to influence predicted actions without compromising task success\. The first challenge lies in mechanistic interpretability\[[11](https://arxiv.org/html/2606.00269#bib.bib1)\], whose goal is to connect semantic concepts to internal neural representations\. This is inherently difficult because neurons are often polysemantic, meaning that interpretable features may be distributed across multiple units rather than localized to a single neuron\[[9](https://arxiv.org/html/2606.00269#bib.bib5)\]\. Once such neurons are identified, the second challenge is to regulate their activations so as to produce the desired behavioral change while preserving successful task execution\. 
Existing methods\[[12](https://arxiv.org/html/2606.00269#bib.bib3),[16](https://arxiv.org/html/2606.00269#bib.bib4)\]typically adopt an open\-loop strategy, scaling neuron activations with fixed coefficients to induce target behaviors, but without accounting for the downstream impact on task completion\. As a result, improved steering often comes at the expense of degraded task success, exposing the limitations of open\-loop intervention when both objectives must be satisfied simultaneously\. This is illustrated in[Fig\.2](https://arxiv.org/html/2606.00269#S1.F2)\(Right\), where steering improves the target behavior but often leads to unsuccessful execution unless task completion is explicitly incorporated\. To overcome these challenges, we propose CTRL\-STEER, a framework that first identifies neurons associated with continuous\-control\-relevant concepts \(e\.g\., motion attributes\), and then performs closed\-loop, task\-aware steering of their activations during action generation\.

To identify relevant neurons, we build on the mechanistic interpretability approach of \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] for single control features such as up, down, left, right, forward, and backward. Specifically, we project each neuron's value vector in the feed-forward layer into vocabulary space and derive a semantic embedding that captures its dominant directional meaning. We then select neurons whose embeddings are most similar to a representative set of features associated with the control concept of interest, yielding a candidate steering set. This differs from the underlying approach \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\], which performs steering on neurons identified for an individual feature rather than over a set of features relevant to a broader control concept.

For closed\-loop steering, we employ a proportional\-integral\-derivative \(PID\) controller\[[1](https://arxiv.org/html/2606.00269#bib.bib10)\]to regulate neuron activations toward a desired steering objective\. Concretely, the PID controller adjusts the scaling factors applied to the identified neurons, thereby shaping the generated action sequence online\. Its proportional term corrects instantaneous error, the integral term compensates for accumulated past deviations, and the derivative term responds to the rate of change of the error\. While this feedback mechanism improves alignment with the target behavior by incorporating historical error information, it does not explicitly reason about future task outcomes\. Consequently, although PID improves steering performance, task success can remain suboptimal\. To introduce planning into the closed\-loop framework, we further propose a reinforcement learning \(RL\)\-based controller\[[19](https://arxiv.org/html/2606.00269#bib.bib44)\]\. Specifically, we train a proximal policy optimization \(PPO\)\[[26](https://arxiv.org/html/2606.00269#bib.bib45)\]policy using a reward that jointly captures steering compliance and task success\. Unlike PID, the RL controller learns a nonlinear control policy that adaptively modulates neuron activations over time, enabling coordinated optimization of both steering objectives and overall task performance\. As shown in[Fig\.1](https://arxiv.org/html/2606.00269#S1.F1), open\-loop control suffers from a fundamental steering\-success trade\-off, whereas closed\-loop steering achieves stronger task success while maintaining the desired steering behavior\.

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/heightplots.png) Figure 1: Effect of steering height for the task 'pick up the book and place it in the back compartment of the caddy'. The unsteered model keeps the end-effector lower. Static steering increases the height but fails to achieve a high task success rate. Closed-loop steering maintains a higher end-effector trajectory while still completing the task.

We make the following contributions for steering VLA models.

1. Closed-Loop Steering. We formulate activation steering in VLA models as a closed-loop control problem. Rather than steering individual feature neurons for a continuous control concept using a fixed coefficient, we regulate a set of concept-related features using a PID controller that dynamically adjusts steering coefficients at each timestep.
2. RL-based Closed-Loop Steering. We introduce an RL-based controller that learns a nonlinear policy for modulating neuron activations over time, jointly optimizing concept steering and task completion.
3. Experimental Evaluation. Experiments on the LIBERO benchmark show that the closed-loop PID controller maintains task success close to the unsteered baseline (71.37%), whereas static steering with strong interventions reduces success to as low as 1.8%. The RL controller further improves success to 73.88% for height steering and 76.12% for speed steering while maintaining the desired steering behavior.

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/introduction2.png) Figure 2: Left: Example of steering the height concept for the task "Put the yellow and white mug in the microwave and close it." The goal is to steer the arm in the desired direction, e.g., to a larger height. Right: (i) The unsteered VLA model follows a low trajectory and fails to place the mug correctly. (ii) Static steering of the VLA drastically increases the trajectory height, and the arm collides with the microwave top. (iii) CTRL-STEER, our closed-loop RL-based steering, dynamically adjusts neuron coefficients, achieving sufficient vertical clearance for successful placement while enabling correct door closure.

## 2 Related Work

Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for general-purpose robotic policies by unifying visual perception, language understanding, and action generation within a single model \[[18](https://arxiv.org/html/2606.00269#bib.bib17),[35](https://arxiv.org/html/2606.00269#bib.bib19)\]. Early work showed that pretrained vision-language models (VLMs) can be adapted for robot control by discretizing continuous actions into token sequences, thereby enabling autoregressive action generation \[[35](https://arxiv.org/html/2606.00269#bib.bib19)\]. Building on this idea, large-scale systems such as OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] and $\pi_{0}$ \[[5](https://arxiv.org/html/2606.00269#bib.bib20)\] combine pretrained VLM backbones with robot interaction data, reflecting a broader shift toward foundation models for robotics. By leveraging large pretrained language backbones, such as LLaMA \[[28](https://arxiv.org/html/2606.00269#bib.bib6)\], together with visual encoders, these models enable semantic reasoning to directly shape control behavior.

Mechanistic Interpretability. Mechanistic interpretability aims to understand neural networks by analyzing their internal representations and identifying semantically meaningful computational components \[[11](https://arxiv.org/html/2606.00269#bib.bib1),[23](https://arxiv.org/html/2606.00269#bib.bib25)\]. In transformer architectures, feedforward layers have been interpreted as key-value memories, where individual neurons correspond to interpretable features in vocabulary space \[[11](https://arxiv.org/html/2606.00269#bib.bib1),[10](https://arxiv.org/html/2606.00269#bib.bib2)\]. More recently, sparse autoencoder methods have been proposed to decompose activations into more monosemantic directions \[[6](https://arxiv.org/html/2606.00269#bib.bib26),[9](https://arxiv.org/html/2606.00269#bib.bib5)\]. Together, these findings suggest that high-level semantic concepts can often be localized within specific activation subspaces of large models, providing a basis for interpretable intervention.

Activation-Level Steering and Representation Engineering. Building on these interpretability insights, recent work has explored activation-level steering as a means of modulating model behavior at inference time. In language models, representation engineering methods identify latent directions associated with high-level concepts and steer model outputs by injecting or suppressing those directions \[[22](https://arxiv.org/html/2606.00269#bib.bib27),[36](https://arxiv.org/html/2606.00269#bib.bib8)\]. Extending this idea to robotics, \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] introduced a method for steering VLA models by identifying semantically aligned feedforward neurons and modulating their activations. Their results demonstrated zero-shot behavioral modulation in both simulation and real-world robots, indicating that control-relevant semantic structure persists within pretrained representations. However, these interventions are typically open-loop: steering coefficients are fixed or manually chosen and remain constant throughout execution. Consequently, they do not account for task-level feedback or the evolving tradeoff between concept steering and task completion.

Feedback Control in Robotics. In contrast, classical robotics systems rely on closed-loop control to regulate behavior under uncertainty. Controllers such as PID and reinforcement learning (RL)-based policies adapt control signals online using observed errors and task objectives \[[3](https://arxiv.org/html/2606.00269#bib.bib24),[2](https://arxiv.org/html/2606.00269#bib.bib9)\]. These feedback mechanisms enable agents to respond to environmental variation, correct deviations during execution, and balance competing objectives more effectively.

Our work connects activation\-level steering with classical feedback control by formulating neuron\-level concept modulation in VLA models as a closed\-loop control problem\. Rather than applying fixed steering coefficients, we introduce adaptive controllers that regulate internal activations during inference, enabling a better balance between semantic concept alignment and task success\.

## 3 Approach

We formulate steering in VLA models as a closed-loop control problem, where concept-relevant neuron activations are dynamically regulated to induce desired motion behaviors while preserving task performance. Our method consists of two components. First, we identify neurons associated with control-relevant features using a mechanistic interpretability procedure. Second, we introduce two closed-loop controllers, a PID controller and a reinforcement learning (RL) controller, that adaptively modulate these neurons during inference to balance steering and task execution. An overview of the framework is shown in [Fig. 3](https://arxiv.org/html/2606.00269#S3.F3).

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/front-image.png) Figure 3: We present CTRL-STEER for controlled steering of VLA models. First, we identify feature-aligned neurons by projecting FFN value vectors into token space and selecting neurons whose semantic embeddings align with representative control features. During inference, these neurons are modulated through activation intervention. A closed-loop controller then adaptively regulates the steering signal using environmental feedback and task rewards, enabling dynamic behavior control while maintaining high task success during VLA execution.

### 3.1 Mechanistic Interpretability of Temporal Concepts

Our mechanistic interpretability approach builds on the framework of Haon et al. \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\], which identifies feed-forward network (FFN) neurons associated with a single feature corresponding to a control concept.

##### FFN decomposition \[[11](https://arxiv.org/html/2606.00269#bib.bib1)\].

Consider a transformer block at layer $\ell$, and let $r^{\ell} \in \mathbb{R}^{d}$ denote the residual-stream representation entering the FFN sublayer, where $d$ is the residual dimension. The FFN produces an update $\mathrm{FFN}^{\ell}(r^{\ell}) \in \mathbb{R}^{d}$. Following the standard two-layer FFN structure, this update can be written as

$$\mathrm{FFN}^{\ell}(r^{\ell}) = \sum_{i=1}^{d_{m}} m_{i}^{\ell}\, v_{i}^{\ell}, \tag{1}$$

where $d_{m}$ is the intermediate FFN width, $m_{i}^{\ell} \in \mathbb{R}$ is the input-dependent activation of neuron $i$ in layer $\ell$, and $v_{i}^{\ell} \in \mathbb{R}^{d}$ is its input-independent value vector. Each neuron $(\ell, i)$ is therefore associated with a unique value vector $v_{i}^{\ell}$, which determines the direction of its contribution to the output residual stream. Because these value vectors lie in the same space as the transformer residual stream, they can be projected directly into vocabulary-logit space through the model output head.
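Eq. 1 can be illustrated with a minimal pure-Python sketch. The toy dimensions, the ReLU nonlinearity, and the names `ffn_as_value_sum`, `W_in`, and `V` are assumptions made for illustration; LLaMA-2 uses a gated FFN, in which case $m_{i}^{\ell}$ would be the gated intermediate activation rather than a plain ReLU output.

```python
def ffn_as_value_sum(r, W_in, V):
    """Toy FFN: returns activations m and the update sum_i m_i * v_i (Eq. 1)."""
    # Input-dependent activations m_i = ReLU(k_i . r); ReLU is an assumption here.
    m = [max(0.0, sum(k_j * r_j for k_j, r_j in zip(k_i, r))) for k_i in W_in]
    # Accumulate the activation-weighted sum of the input-independent value vectors.
    update = [0.0] * len(r)
    for m_i, v_i in zip(m, V):
        for j, v_ij in enumerate(v_i):
            update[j] += m_i * v_ij
    return m, update


# Hypothetical toy sizes: residual dimension d = 2, FFN width d_m = 3.
r = [1.0, -2.0]
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
m, update = ffn_as_value_sum(r, W_in, V)
print(m, update)
```

In this toy input only the first neuron is active, so the update equals its value vector scaled by its activation; changing a single $m_{i}^{\ell}$ moves the output only along $v_{i}^{\ell}$.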

##### Semantic characterization of neurons \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\].

To assign semantic meaning to individual neurons, we analyze their value vectors in vocabulary space. Let $W_{\mathrm{out}} \in \mathbb{R}^{|\mathcal{V}| \times d}$ denote the LM head that maps residual representations to logits over the vocabulary $\mathcal{V}$, where $|\mathcal{V}| = 32{,}000$ for OpenVLA. For each value vector $v_{i}^{\ell}$, we compute

$$z_{i}^{\ell} = W_{\mathrm{out}}\, v_{i}^{\ell} \in \mathbb{R}^{|\mathcal{V}|}, \tag{2}$$

and convert it to a probability distribution

$$p_{i}^{\ell} = \mathrm{softmax}(z_{i}^{\ell}). \tag{3}$$

This distribution reflects which tokens are promoted when the direction $v_{i}^{\ell}$ is added to the residual stream. We then select the top-$k$ tokens from $p_{i}^{\ell}$, with $k = 20$, and compute their embeddings using the model text encoder. Let $e(w) \in \mathbb{R}^{d_{e}}$ denote the embedding of token $w$. The semantic embedding of neuron $(\ell, i)$ is defined as the probability-weighted average

$$sem_{i}^{\ell} = \sum_{w \in \mathrm{TopK}(p_{i}^{\ell})} p_{i}^{\ell}(w)\, e(w), \tag{4}$$

which captures the dominant linguistic meaning associated with $v_{i}^{\ell}$.
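The pipeline in Eqs. 2 to 4 can be sketched on a toy vocabulary as follows. The function names, the 4-token vocabulary, and the 2-dimensional embeddings are hypothetical; the sketch follows Eq. 4 literally, so the weights are the raw softmax probabilities of the top-$k$ tokens without renormalization.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits (Eq. 3)."""
    z_max = max(z)
    exps = [math.exp(x - z_max) for x in z]
    total = sum(exps)
    return [x / total for x in exps]


def semantic_embedding(v, W_out, token_emb, k):
    """Return the top-k token ids and the semantic embedding of value vector v."""
    # Eq. 2: project the value vector into vocabulary-logit space.
    z = [sum(w_j * v_j for w_j, v_j in zip(row, v)) for row in W_out]
    p = softmax(z)
    # Token ids sorted by promoted probability, highest first.
    top = sorted(range(len(p)), key=lambda w: p[w], reverse=True)[:k]
    # Eq. 4: probability-weighted sum of the top-k token embeddings.
    sem = [0.0] * len(token_emb[0])
    for w in top:
        for j, e_wj in enumerate(token_emb[w]):
            sem[j] += p[w] * e_wj
    return top, sem


# Hypothetical toy setup: |V| = 4 tokens, d = 2, d_e = 2, k = 2.
W_out = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]]
token_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
top, sem = semantic_embedding([1.0, 0.5], W_out, token_emb, k=2)
print(top, sem)
```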

Unlike prior work \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\], which identifies neurons for a single feature, our objective is to construct a steering set aligned with a broader control-relevant concept expressed through multiple related features. We therefore use the semantic embeddings $sem_{i}^{\ell}$ to retrieve neurons whose value directions consistently align with a representative feature set for the target concept.

##### Concept-to-neuron matching.

Given a target control concept $\mathcal{C}$, we define a set of representative feature tokens $\mathcal{T}_{\mathcal{C}} \subset \mathcal{V}$ that express the concept. For motion, we use {up, down, left, right, forward, backward}. For each token $w \in \mathcal{T}_{\mathcal{C}}$, we compute its embedding $e(w)$ and perform a $k$-nearest-neighbor (k-NN) search over the set $\{sem_{i}^{\ell}\}$ across all FFN layers using cosine similarity. For each representative token, the $k$ most similar neurons are selected, and the final candidate set is defined as

$$\mathcal{S} = \bigcup_{w \in \mathcal{T}_{\mathcal{C}}} \mathrm{kNN}_{k}\!\left(e(w)\,;\,\{sem_{i}^{\ell}\}_{\ell,i}\right). \tag{5}$$

Although the search spans all layers, we empirically observe that the selected neurons are concentrated mostly in the latter half of the transformer, suggesting stronger semantic alignment in higher layers.
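A minimal sketch of the matching step in Eq. 5, assuming the semantic embeddings are already available as a dictionary keyed by `(layer, index)`. The function names and the toy 2-dimensional embeddings are hypothetical, and a brute-force search stands in for whatever nearest-neighbor index is actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def candidate_set(concept_embs, neuron_sems, k):
    """Eq. 5: union over concept tokens of the k most similar neurons."""
    selected = set()
    for e_w in concept_embs.values():
        ranked = sorted(neuron_sems,
                        key=lambda n: cosine(e_w, neuron_sems[n]),
                        reverse=True)
        selected.update(ranked[:k])
    return selected


# Hypothetical toy data: two concept tokens and four neurons.
concept_embs = {"up": [0.0, 1.0], "down": [0.0, -1.0]}
neuron_sems = {
    (20, 3): [0.1, 0.9],
    (25, 7): [0.0, -1.0],
    (2, 1): [1.0, 0.0],
    (30, 4): [-0.2, 0.8],
}
S = candidate_set(concept_embs, neuron_sems, k=1)
print(sorted(S))
```

Because the result is a union, a neuron retrieved for several tokens appears once, so the candidate set can hold fewer than $k \cdot |\mathcal{T}_{\mathcal{C}}|$ neurons.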

##### Polysemantic filtering\.

Transformer neurons are often polysemantic, meaning that a single neuron may capture multiple unrelated features \[[9](https://arxiv.org/html/2606.00269#bib.bib5)\]. We therefore refine $\mathcal{S}$ by manually inspecting the top-$k$ promoted tokens for each candidate neuron. Neurons whose promoted tokens reflect unrelated or conflicting concepts are removed. After filtering, we retain the 10 neurons with the strongest and most consistent semantic alignment, forming the final intervention set $\mathcal{S}$.

Control concepts in robotic policies can be broadly divided into state-based and temporal concepts. State-based concepts correspond to instantaneous properties such as end-effector height, object position, or spatial relation. These can often be represented as directions in the residual stream, which motivates existing approaches based on neuron selection and static steering through value-vector alignment. Temporal concepts, by contrast, such as speed and acceleration, depend on the evolution of behavior across timesteps. Since the VLA policy is invoked independently at each timestep and computes residual updates only from the current observation, these temporal quantities cannot be computed within a single forward pass. As a result, correlation between a value vector and a temporal metric does not imply direct controllability. This motivates our formulation of steering as a closed-loop control problem in which intervention strength is regulated dynamically during execution.

### 3.2 Steering as a Closed-Loop Control Problem

Prior work \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] intervenes on selected neurons by replacing their activation coefficients with a constant scalar $\alpha \in \mathbb{R}$:

$$\tilde{m}_{i}^{\ell} = \begin{cases} \alpha & \text{if } (\ell, i) \in \mathcal{S}, \\ m_{i}^{\ell} & \text{otherwise}. \end{cases} \tag{6}$$

This induces a residual shift in the span of $\{v_{i}^{\ell} : (\ell, i) \in \mathcal{S}\}$, thereby biasing the action-token distribution toward the target concept in an open-loop manner. While constant activation interventions can produce steering, they often degrade task completion because the steering coefficient does not adapt to the evolving state of the environment. Such static modulation is especially ineffective for temporally varying attributes such as speed or smoothness, where execution history and accumulated deviation matter.
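The open-loop intervention of Eq. 6 can be sketched for a single toy layer as below. The helper names, the toy activations, and the value $\alpha = 10$ are hypothetical; neurons are indexed by position within one layer rather than by $(\ell, i)$ to keep the example short.

```python
def apply_static_steering(m, selected, alpha):
    """Eq. 6: overwrite the activations of selected neurons with a constant alpha."""
    return [alpha if i in selected else m_i for i, m_i in enumerate(m)]


def weighted_sum(m, V):
    """FFN update sum_i m_i * v_i for activations m and value vectors V."""
    return [sum(m_i * v_i[j] for m_i, v_i in zip(m, V)) for j in range(len(V[0]))]


# Hypothetical toy layer with three neurons; neuron 2 is in the steering set.
m = [1.0, 0.0, 0.2]
V = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
selected = {2}
alpha = 10.0

m_steered = apply_static_steering(m, selected, alpha)
base = weighted_sum(m, V)
steered = weighted_sum(m_steered, V)
# The residual shift equals (alpha - m_2) * v_2, i.e. it lies in span{v_2}.
shift = [s - b for s, b in zip(steered, base)]
print(m_steered, shift)
```

The shift has no component outside the selected value vectors, and its size is set entirely by the fixed $\alpha$, regardless of how far the current behavior is from the target.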

We therefore pose steering as a closed-loop control problem. Let $c^{t}$ denote the value of a target concept at timestep $t$, and define the steering error as

$$e^{t} = c^{*} - c^{t}, \tag{7}$$

where $c^{*}$ is the desired target value for the concept. The specific choice of $c^{*}$ depends on the steering objective; for example, for height steering we set $c^{*}$ to twice the initial height. Instead of using a constant coefficient $\alpha$, we estimate a time-varying steering vector $\bm{\alpha}^{t}$, with separate steering coefficients for the selected neurons. The closed-loop steering problem is formulated as

$$\bm{\alpha}^{t} = \underset{\bm{\alpha} \in \mathbb{R}^{k}}{\arg\min} \sum_{\tau=t+1}^{T} e^{\tau}, \tag{8}$$

where $k$ is the number of selected neurons, $\bm{\alpha}^{t}_{i}$ denotes the steering coefficient for the $i$-th neuron, and $T$ is the control horizon.

#### 3.2.1 PID-Based Control

We first introduce a proportional-integral-derivative (PID) controller to compute the time-varying steering signal $\bm{\alpha}_{PID}^{t}$ from the steering error in [Eq. 7](https://arxiv.org/html/2606.00269#S3.E7). The controller is defined as

$$\bm{\alpha}^{t}_{PID} = K_{P} \cdot e^{t} + K_{I} \sum_{\tau=0}^{t} e^{\tau} + K_{D}\,(e^{t} - e^{t-1}), \tag{9}$$

where the three terms correspond to the proportional, integral, and derivative components, scaled by coefficients $K_{P}$, $K_{I}$, and $K_{D}$, respectively. The PID controller is independent of the neurons and hence outputs a scalar signal, which is applied to all selected neurons.

Proportional term. The proportional term adjusts neuron activations according to the instantaneous concept error. It scales the residual shift along the steering directions in proportion to the current deviation from the desired behavior, enabling rapid correction when the trajectory departs from the target concept.

Integral term. The integral term accumulates past errors to compensate for persistent bias. Because the base VLA model is optimized for its original training objective, the steering objective may conflict with its learned dynamics, resulting in steady-state deviations. The proportional term alone cannot eliminate such bias. The integral term progressively strengthens the intervention until the accumulated error is reduced.

Derivative term. The derivative term responds to the rate of change of the steering error. When deviations increase rapidly, it amplifies the control signal to accelerate correction; when the error decreases quickly, it attenuates the intervention to avoid overshoot. Since steering primarily affects motion-related behavior, abrupt changes in neuron scaling can induce oscillatory trajectories. The derivative term helps suppress such oscillations and improves stability.

Together, these components provide stable and responsive closed-loop regulation of the selected steering neurons. To avoid excessive residual shifts, we constrain $\bm{\alpha}^{t}_{PID} \in [0, 20]$, based on empirical observations that stronger interventions degrade task performance. The gains $K_{P}$, $K_{I}$, and $K_{D}$ are chosen conservatively to avoid over-amplification, especially in the presence of noisy error signals that can disproportionately affect the derivative term.
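The controller of Eq. 9 with the $[0, 20]$ clamp can be sketched as a small stateful class. The default gains are the values reported in the implementation details; the class name, the zero derivative on the first step (where $e^{t-1}$ is undefined), and the absence of anti-windup are assumptions of this sketch rather than details confirmed by the text.

```python
class PIDSteering:
    """Discrete PID controller producing a clamped scalar steering coefficient."""

    def __init__(self, kp=4.0, ki=0.5, kd=1.0, lo=0.0, hi=20.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lo, self.hi = lo, hi
        self.integral = 0.0       # running sum of errors up to and including step t
        self.prev_error = None    # e^{t-1}; undefined before the first step

    def step(self, target, value):
        error = target - value    # Eq. 7: e^t = c* - c^t
        self.integral += error
        # Assumption: the derivative term is zero on the first step.
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the allowed intervention range.
        return min(self.hi, max(self.lo, raw))


# Hypothetical height trace with initial height h0 = 1.0 and target c* = 2 * h0.
pid = PIDSteering()
target = 2.0
alphas = [pid.step(target, height) for height in (1.0, 1.5, 2.5)]
print(alphas)
```

In this trace the coefficient shrinks as the height approaches the target and is clamped to zero once the height overshoots, which is the adaptive behavior a fixed coefficient cannot provide.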

#### 3.2.2 RL-Based Control

Although the PID controller adapts neuron activations using past error, it remains fundamentally reactive and does not explicitly optimize long\-horizon task success\. As a result, it can improve steering behavior without fully preserving task completion\. To address this limitation, we introduce a reinforcement learning \(RL\)\-based controller that performs long\-term planning and jointly optimizes steering compliance and task success\. Rather than modulating neurons for a single isolated feature, the RL controller operates over a set of neurons associated with a broader collection of motion\-related concepts\. This allows it to model nonlinear dependencies across concept\-aligned neurons and coordinate their activations more effectively\.

We formulate steering as a reinforcement learning problem in which a policy predicts neuron activations that induce the desired behavior. At timestep $t$, the controller state is defined as

$$s_{t} = \left[a_{t},\ \Delta a_{t},\ \alpha^{t-1},\ \dfrac{t}{T}\right], \tag{10}$$

where $a_{t} \in \mathbb{R}^{k}$ denotes the activations of the $k$ selected neurons, $\Delta a_{t} = a_{t} - a_{t-1}$ captures the immediate change in neuron activations, $\alpha^{t-1} \in \mathbb{R}^{k}$ is the steering signal applied at the previous timestep, and $t/T$ is the normalized episode progress.

The policy $\pi_{\theta}$ outputs

$$\bm{\alpha}^{t}_{RL} = \pi_{\theta}(s_{t}), \tag{11}$$

where $\bm{\alpha}^{t}_{RL}$ is a $k$-dimensional vector assigning steering values to the selected neurons. These values directly modulate FFN activations and thereby steer the VLA toward the desired concept. The reward function is designed to balance steering quality and task success:

$$r_{t} = r_{steer}(t) + \lambda \cdot r_{task}, \tag{12}$$

where $r_{steer}$ measures the instantaneous steering metric, such as height or instantaneous velocity, and $r_{task}$ is the task-completion reward. The coefficient $\lambda$ determines the trade-off between the two objectives.
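The state of Eq. 10 and the reward of Eq. 12 can be assembled as in the sketch below. The function names and the numeric inputs are hypothetical; $k = 10$ and $T = 920$ follow the implementation details, while the value of $\lambda$ is not given in the text and the one used here is purely illustrative.

```python
def controller_state(a_t, a_prev, alpha_prev, t, T):
    """Eq. 10: concatenate activations, their change, previous steering, and progress."""
    delta = [x - y for x, y in zip(a_t, a_prev)]
    return list(a_t) + delta + list(alpha_prev) + [t / T]


def step_reward(r_steer, r_task, lam):
    """Eq. 12: steering reward plus lambda-weighted task-completion reward."""
    return r_steer + lam * r_task


k, T = 10, 920
a_prev = [0.0] * k                      # activations at the previous timestep
a_t = [0.1 * i for i in range(k)]       # hypothetical current activations
alpha_prev = [4.5] * k                  # steering applied at the previous timestep
s_t = controller_state(a_t, a_prev, alpha_prev, t=230, T=T)
print(len(s_t), s_t[-1])

# Illustrative lambda only; the trade-off weight is an assumption.
print(step_reward(r_steer=0.5, r_task=1.0, lam=2.0))
```

The state has $3k + 1$ entries, so with 10 selected neurons the policy input is 31-dimensional and its output is a 10-dimensional steering vector.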

We train a separate RL policy for each task. For a given task, the policy $\pi_{\theta}$ is optimized using rollouts collected from the corresponding environment. We initialize the policy using the steering signal $\bm{\alpha}^{t}_{PID}$ generated by the PID controller, and then further refine it through environment interaction using rewards that jointly encode concept alignment and task completion. This task-specific optimization enables the controller to adapt neuron activations to the dynamics of each task while balancing steering objectives with successful execution.

#### 3.2.3 Implementation Details

We use OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\], which is built on the Prismatic VLM \[[14](https://arxiv.org/html/2606.00269#bib.bib47)\] with LLaMA-2 \[[28](https://arxiv.org/html/2606.00269#bib.bib6)\] as the language backbone. For neuron identification, we consider all FFN layers in the LLaMA-2 decoder and select 10 neurons associated with the motion-related concept. In the concept-to-neuron matching step, we set $k = 5$ for the k-NN search. For height steering, the target value is set to $c^{*} = 2h_{0}$, where $h_{0}$ is the initial end-effector height. For speed steering, we set $c^{*} = 30$ cm/s. For the PID controller, we use gains $K_{P} = 4.0$, $K_{I} = 0.5$, and $K_{D} = 1.0$ with a control horizon of 20 steps. For the RL-based controller, we use PPO for policy optimization. Each trajectory has a horizon of $T = 920$ steps, consisting of 20 environment warm-up steps followed by 900 execution steps for all tasks in the LIBERO benchmark.

## 4 Experiments

Benchmark. We evaluate our method on the LIBERO benchmark and BridgeData V2 [[30](https://arxiv.org/html/2606.00269#bib.bib40)]. We consider all four task suites from the LIBERO [[21](https://arxiv.org/html/2606.00269#bib.bib16)] benchmark: LIBERO-Goal, LIBERO-Object, LIBERO-Spatial, and LIBERO-Long. LIBERO-Goal has 10 tasks that focus on direct goal manipulation, LIBERO-Object has 10 tasks that focus on object-centric variations, and LIBERO-Spatial has 10 tasks that involve spatial relations. For LIBERO-Long, we use the libero-10 subset, which consists of long-horizon sequential tasks. From BridgeData V2, we evaluate pick-and-place tasks in SimplerEnv [[20](https://arxiv.org/html/2606.00269#bib.bib46)]. These tasks encompass compositional complexity, reasoning, and motion diversity, making them suitable for evaluating state-based and temporal steering of VLAs.

Table 1: Steering the height concept in the X-VLA [[33](https://arxiv.org/html/2606.00269#bib.bib48)] model on the LIBERO-Goal task suite. AAT measures the area above the threshold height, indicating steerability, and SR is the task success rate.

| Method | Height | 95th Percentile | AAT | SR (%) |
|---|---|---|---|---|
| Unsteered | 1.041 | 1.140 | 161.58 | 59 |
| Static 20 | 1.020 | 1.073 | 626.74 | 25 |
| PID | 1.046 | 1.087 | 183.35 | 60 |
| RL | 1.037 | 1.123 | 400.33 | 60 |

Baseline VLA model. We use OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] as the base VLA policy. OpenVLA is a 7B-parameter model with LLaMA-2 as its backbone. For each task suite, OpenVLA is fine-tuned independently to improve its task completion capabilities, without architectural modifications. We additionally evaluate on X-VLA [[33](https://arxiv.org/html/2606.00269#bib.bib48)] to show that our method transfers to other VLA architectures.

Table 2: OpenVLA performance in terms of height and speed steering and success rate (SR) on BridgeData V2.

| Method | AAT (height) | SR (Height) | Avg speed | SR (Speed) |
|---|---|---|---|---|
| Unsteered | 39.15 | 40% | 16.93 | 40% |
| Static (20) | 47.96 | 12.5% | 25.18 | 14.9% |
| PID | 47.28 | 46% | 18.90 | 45.8% |
| RL | 48.13 | 47.6% | 21.38 | 48.9% |

Steerable Concepts. We consider two concepts for steering: (1) Height is a state-based concept determined by the instantaneous height of the end-effector. It serves as a validation case for spatial concepts, where we expect our framework to perform well, in addition to temporal concepts. (2) Speed of the end-effector across time steps is the temporal control concept.

Metrics. We evaluate steering using concept-specific metrics. For height, we report the average end-effector height [[12](https://arxiv.org/html/2606.00269#bib.bib3)], the 95th-percentile height, and the Area Above Threshold (AAT). For all metrics, height is measured in meters. For speed, we report the average end-effector speed [[12](https://arxiv.org/html/2606.00269#bib.bib3)] and the Speed Above Threshold (SAT), with all speed values measured in cm/s.

Table 3: Comparative analysis of spatial steering for end-effector height across LIBERO task suites. We evaluate the unsteered OpenVLA model against static steering ($C\in\{5,10,20\}$) and CTRL-STEER, our closed-loop steering approach with PID and RL controllers. Height (m) measures the distance of the end-effector from the ground, AAT is the Area Above Threshold, and SR (%) denotes the task success rate. CTRL-STEER demonstrates a trade-off between concept adherence and success rate.

| Task Suite | Method | Height (m) | 95th percentile | AAT | SR (%) |
|---|---|---|---|---|---|
| LIBERO-Long | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 0.832 ± 0.249 | 0.940 ± 0.259 | 181.655 ± 107.034 | 58.00 |
| LIBERO-Long | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.816 ± 0.252 | 0.923 ± 0.260 | 169.030 ± 113.953 | 52.50 |
| LIBERO-Long | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.817 ± 0.250 | 0.924 ± 0.258 | 192.421 ± 128.940 | 51.00 |
| LIBERO-Long | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.723 ± 0.213 | 0.841 ± 0.221 | 201.075 ± 136.052 | 10.00 |
| LIBERO-Long | CTRL-STEER (PID) | 0.828 ± 0.249 | 0.937 ± 0.262 | 184.498 ± 130.825 | 54.00 |
| LIBERO-Long | CTRL-STEER (RL) | 0.804 ± 0.252 | 0.914 ± 0.260 | 181.858 ± 122.344 | 57.00 |
| LIBERO-Goal | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 1.056 ± 0.053 | 1.194 ± 0.052 | 107.764 ± 89.061 | 77.50 |
| LIBERO-Goal | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.060 ± 0.052 | 1.196 ± 0.046 | 104.280 ± 74.879 | 79.00 |
| LIBERO-Goal | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.057 ± 0.051 | 1.197 ± 0.046 | 98.358 ± 71.684 | 77.00 |
| LIBERO-Goal | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.041 ± 0.053 | 1.187 ± 0.059 | 95.922 ± 91.429 | 41.00 |
| LIBERO-Goal | CTRL-STEER (PID) | 1.058 ± 0.049 | 1.194 ± 0.045 | 108.788 ± 98.250 | 79.00 |
| LIBERO-Goal | CTRL-STEER (RL) | 1.062 ± 0.052 | 1.200 ± 0.046 | 99.779 ± 61.452 | 82.00 |
| LIBERO-Object | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 0.212 ± 0.032 | 0.293 ± 0.027 | 4.440 ± 5.107 | 72.00 |
| LIBERO-Object | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.217 ± 0.026 | 0.293 ± 0.025 | 4.769 ± 5.364 | 72.50 |
| LIBERO-Object | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.211 ± 0.029 | 0.293 ± 0.028 | 4.661 ± 6.693 | 73.00 |
| LIBERO-Object | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 0.203 ± 0.034 | 0.300 ± 0.031 | 2.596 ± 2.390 | 33.50 |
| LIBERO-Object | CTRL-STEER (PID) | 0.210 ± 0.033 | 0.295 ± 0.029 | 4.496 ± 6.324 | 76.00 |
| LIBERO-Object | CTRL-STEER (RL) | 0.211 ± 0.030 | 0.292 ± 0.026 | 4.671 ± 6.225 | 77.00 |
| LIBERO-Spatial | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 1.068 ± 0.051 | 1.192 ± 0.041 | 90.940 ± 27.236 | 78.00 |
| LIBERO-Spatial | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.071 ± 0.051 | 1.189 ± 0.040 | 87.748 ± 21.590 | 79.00 |
| LIBERO-Spatial | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.073 ± 0.049 | 1.189 ± 0.040 | 110.937 ± 110.628 | 84.00 |
| LIBERO-Spatial | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 1.058 ± 0.046 | 1.186 ± 0.035 | 127.337 ± 90.296 | 25.00 |
| LIBERO-Spatial | CTRL-STEER (PID) | 1.068 ± 0.047 | 1.187 ± 0.037 | 92.928 ± 58.663 | 75.00 |
| LIBERO-Spatial | CTRL-STEER (RL) | 1.072 ± 0.052 | 1.191 ± 0.041 | 97.970 ± 70.173 | 79.50 |

Table 4: Comparative analysis of temporal steering for end-effector speed across LIBERO task suites. We evaluate the unsteered OpenVLA baseline against static steering ($C\in\{5,10,20\}$) and CTRL-STEER, our closed-loop steering framework with PID and RL controllers. Speed (cm/s) denotes the average speed of the end-effector, SAT is the Speed Above Threshold metric used to isolate active motion, and SR (%) denotes the task success rate. CTRL-STEER provides stable speed regulation while significantly mitigating the success rate degradation observed in high-coefficient static interventions.

| Task Suite | Method | Speed (cm/s) | SAT (cm/s) | SR (%) |
|---|---|---|---|---|
| LIBERO-Long | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 11.255 ± 2.701 | 2.688 ± 1.303 | 58.00 |
| LIBERO-Long | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 11.297 ± 2.673 | 2.677 ± 1.452 | 53.50 |
| LIBERO-Long | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 11.111 ± 2.911 | 2.773 ± 1.317 | 44.00 |
| LIBERO-Long | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 12.700 ± 0.320 | 4.270 ± 0.660 | 1.50 |
| LIBERO-Long | CTRL-STEER (PID) | 10.877 ± 3.060 | 2.603 ± 1.340 | 59.50 |
| LIBERO-Long | CTRL-STEER (RL) | 11.037 ± 2.843 | 2.535 ± 1.210 | 66.50 |
| LIBERO-Goal | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 14.022 ± 4.553 | 3.651 ± 1.235 | 77.50 |
| LIBERO-Goal | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 14.147 ± 4.114 | 3.579 ± 1.418 | 80.50 |
| LIBERO-Goal | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 13.229 ± 4.600 | 3.587 ± 1.279 | 59.00 |
| LIBERO-Goal | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 14.516 ± 2.998 | 3.856 ± 1.304 | 2.50 |
| LIBERO-Goal | CTRL-STEER (PID) | 13.862 ± 4.634 | 3.624 ± 1.216 | 76.00 |
| LIBERO-Goal | CTRL-STEER (RL) | 14.040 ± 4.396 | 3.488 ± 1.388 | 83.00 |
| LIBERO-Object | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 14.156 ± 3.132 | 2.126 ± 0.380 | 72.00 |
| LIBERO-Object | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 14.712 ± 2.819 | 2.250 ± 0.412 | 76.00 |
| LIBERO-Object | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 15.180 ± 2.502 | 2.301 ± 0.381 | 63.00 |
| LIBERO-Object | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 10.160 ± 3.652 | 2.468 ± 0.139 | 2.50 |
| LIBERO-Object | CTRL-STEER (PID) | 14.665 ± 2.698 | 2.189 ± 0.443 | 76.50 |
| LIBERO-Object | CTRL-STEER (RL) | 14.111 ± 3.149 | 2.171 ± 0.466 | 76.50 |
| LIBERO-Spatial | OpenVLA [[18](https://arxiv.org/html/2606.00269#bib.bib17)] | 17.519 ± 2.695 | 3.569 ± 0.772 | 78.00 |
| LIBERO-Spatial | C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 17.461 ± 3.022 | 3.603 ± 0.785 | 76.50 |
| LIBERO-Spatial | C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 17.042 ± 2.979 | 3.338 ± 0.834 | 65.50 |
| LIBERO-Spatial | C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)] | 11.855 ± 0.895 | 6.150 ± 3.840 | 1.00 |
| LIBERO-Spatial | CTRL-STEER (PID) | 17.636 ± 2.631 | 3.579 ± 0.767 | 78.00 |
| LIBERO-Spatial | CTRL-STEER (RL) | 17.642 ± 2.845 | 3.606 ± 0.820 | 78.50 |

### 4.1 Results

Height steering. Let $h^{t}$ denote the end-effector height at timestep $t$ in the LIBERO simulator coordinate system. Following [Eq. 7](https://arxiv.org/html/2606.00269#S3.E7), we set $c^{t}=h^{t}$ and $e^{t}=c^{*}-h^{t}$. For the PID controller, $c^{*}$ is set to twice the initial end-effector height, $2h_{0}$, ensuring a meaningful upward shift while remaining within the robot workspace. In unsteered trajectories, the end-effector never exceeded this bound in any task suite. We verified that varying the target within $[1.8h_{0}, 2.5h_{0}]$ does not qualitatively affect the steering–success trade-off.

For the RL controller, the instantaneous height $h^{t}$ is used as the steering reward at each timestep. The overall reward follows [Eq. 12](https://arxiv.org/html/2606.00269#S3.E12), combining the steering reward with the binary task success signal, and we explore trade-off coefficients $\lambda\in\{100,200,500,1000\}$. This allows the policy to learn state-dependent interventions that increase trajectory height while still completing the task.

We evaluated steering using mean end-effector height, but found that it is sensitive to low-height phases that occur during grasping or manipulation near the table. Prior work [[12](https://arxiv.org/html/2606.00269#bib.bib3)] instead uses maximum height, which is sensitive to initial configurations where the end-effector may already start near its highest position. To better capture sustained elevation, we therefore report two complementary metrics: the 95th-percentile height and the area above threshold (AAT). The 95th percentile reflects the highest operational region of the trajectory while mitigating initial-position effects. AAT integrates the end-effector height above a predefined threshold across the trajectory, capturing how long the arm remains elevated during execution. The threshold $h_{th}$ is defined as the midpoint between the minimum and maximum reachable end-effector heights, providing a task-independent measure of consistent elevation. [Tab. 3](https://arxiv.org/html/2606.00269#S4.T3) compares our performance with the baseline OpenVLA model and the state-of-the-art steering approach [[12](https://arxiv.org/html/2606.00269#bib.bib3)] for VLA models with different static activation coefficients ($C=5,10,20$). We also report task-wise results in the supplementary material.
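The two height metrics can be sketched for a single trajectory as follows. This is one plausible reading rather than the authors' implementation: the function name `height_metrics`, the rectangle-rule integration with step `dt`, the reachable-height bounds, and the sample trajectory are all assumptions for illustration.

```python
import numpy as np


def height_metrics(heights, h_min, h_max, dt=1.0):
    """Return (95th-percentile height, AAT) for one trajectory.

    h_min, h_max: minimum and maximum reachable end-effector heights.
    dt: assumed integration step; the text does not state one.
    """
    h = np.asarray(heights, dtype=float)
    h_th = 0.5 * (h_min + h_max)            # midpoint threshold
    p95 = float(np.percentile(h, 95))
    excess = np.clip(h - h_th, 0.0, None)   # only height above h_th contributes
    aat = float(excess.sum() * dt)          # rectangle-rule integral
    return p95, aat


# Hypothetical trajectory in meters: approach, lift, lower.
traj = [0.2, 0.3, 0.7, 0.9, 0.8, 0.4]
p95, aat = height_metrics(traj, h_min=0.0, h_max=1.0)
print(f"95th percentile = {p95:.3f} m, AAT = {aat:.3f}")
```

Because the excess is clipped at zero, time spent near the table during grasping adds nothing to AAT, which is the property that makes it less sensitive to low-height phases than the mean.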

Speed steering. Here, the aim is to increase the end-effector's motion speed during task execution. Let $s^{t}$ denote the instantaneous end-effector speed at timestep $t$, representing the concept of interest: $c^{t}=s^{t}$. For the PID controller, we set the target speed to $c^{*}=30$ cm/s, which exceeds the maximum speed observed in unsteered trajectories while remaining within the robot's feasible operating range. For the RL controller, the instantaneous speed $s^{t}$ is used as the concept reward.

Mean end-effector speed does not reliably capture steering effects because manipulation phases in the task execution (e.g., grasping) involve extended stationary periods that reduce the average speed. We therefore introduce the speed-above-threshold (SAT) metric, defined as $\mathrm{mean}(s^{t}-s_{thr}\mid s^{t}>s_{thr})$, where $s_{thr}$ is the speed threshold. This metric focuses on active motion segments of the trajectory while filtering out slow manipulation phases. We set $s_{thr}=20$ cm/s, which corresponds to the highest speeds observed during stationary manipulation actions such as grasping. [Tab. 4](https://arxiv.org/html/2606.00269#S4.T4) compares our performance with the baselines. We also report task-wise results in the supplementary material.
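The SAT definition translates directly into a conditional mean. The sketch below assumes speeds are given per timestep in cm/s; the function name `speed_above_threshold`, the sample speeds, and the convention of returning 0 when no timestep exceeds the threshold are illustrative assumptions.

```python
import numpy as np


def speed_above_threshold(speeds, s_thr=20.0):
    """SAT = mean(s^t - s_thr | s^t > s_thr), in the same units as speeds."""
    s = np.asarray(speeds, dtype=float)
    active = s[s > s_thr]          # keep only active-motion timesteps
    if active.size == 0:
        return 0.0                 # assumed convention: no timestep above threshold
    return float(np.mean(active - s_thr))


speeds = [5.0, 12.0, 22.0, 26.0, 18.0, 24.0]  # hypothetical, cm/s
print(speed_above_threshold(speeds))
```

Only the three samples above 20 cm/s enter the mean, so slow grasping phases neither raise nor lower the reported value.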

To evaluate whether CTRL-STEER transfers beyond OpenVLA, we additionally apply the same steering framework to X-VLA [[33](https://arxiv.org/html/2606.00269#bib.bib48)] on the LIBERO-Goal task suite. As shown in [Tab. 1](https://arxiv.org/html/2606.00269#S4.T1), CTRL-STEER preserves task success (60% for both controllers versus 59% unsteered) while raising AAT relative to the unsteered X-VLA baseline (183.35 with PID and 400.33 with RL versus 161.58). Static steering achieves a much larger AAT (626.74) but reduces task success to 25%, showing the same steering–success trade-off observed for OpenVLA.

Analysis of the Results. The results in [Tab. 3](https://arxiv.org/html/2606.00269#S4.T3) and [Tab. 4](https://arxiv.org/html/2606.00269#S4.T4) reveal a fundamental limitation of static activation steering: increasing the steering strength to enforce a target concept often degrades the task success rate. Across the tasks, static steering produces its strongest concept shifts only when larger intervention strengths are applied, and this improvement comes at the cost of severe reductions in task success. For example, with a strong static intervention ($C=20$), the average task success rate drops from 71.37% for the unsteered model to 27.37% in height steering and to 1.88% in speed steering. These results highlight the limitations of open-loop steering. In contrast, our closed-loop framework maintains task performance close to the unsteered baseline while still enabling effective concept steering. The PID-based controller achieves an average success rate of 71% for height steering and 72.5% for speed steering. The RL-based controller improves on this, achieving an average success rate of 73.88% for height steering and 76.12% for speed steering, surpassing the unsteered policy.
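The suite-averaged success rates quoted in this paragraph can be checked with a few lines of arithmetic. The per-suite values below are copied from Tab. 3 and Tab. 4 in the order Long, Goal, Object, Spatial; the dictionary layout and labels are only a convenience for this sketch, and the printed values are unrounded means.

```python
# Per-suite success rates (%) in the order Long, Goal, Object, Spatial,
# copied from Tab. 3 (height steering) and Tab. 4 (speed steering).
success_rates = {
    "Unsteered":     [58.00, 77.50, 72.00, 78.00],
    "C=20 (height)": [10.00, 41.00, 33.50, 25.00],
    "C=20 (speed)":  [1.50, 2.50, 2.50, 1.00],
    "PID (height)":  [54.00, 79.00, 76.00, 75.00],
    "PID (speed)":   [59.50, 76.00, 76.50, 78.00],
    "RL (height)":   [57.00, 82.00, 77.00, 79.50],
    "RL (speed)":    [66.50, 83.00, 76.50, 78.50],
}

# Unweighted mean over the four suites for each method.
averages = {name: sum(v) / len(v) for name, v in success_rates.items()}
for name, avg in averages.items():
    print(f"{name:14s} {avg:.3f}")
```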

Trade-off between steering and task success. A central challenge in activation steering is balancing concept enforcement with task completion. Overly aggressive steering of a concept can interfere with successful execution. Our controlled steering framework addresses this by dynamically adjusting the steering coefficient, reducing intervention strength when strict concept adherence conflicts with task success.

This effect is evident in height steering on the LIBERO-Goal and LIBERO-Long suites with the RL-based controller in CTRL-STEER, where the height metrics stay close to or below those of the unsteered OpenVLA model (for example, an AAT of 99.78 versus 107.76 on LIBERO-Goal) while the success rate is maintained. For instance, as shown in Fig. [2](https://arxiv.org/html/2606.00269#S1.F2), when placing a cup inside a microwave, static steering causes the robot to lift the cup excessively, leading to collisions and object drops. CTRL-STEER mitigates this by decreasing the steering strength during insertion, allowing successful placement. Although the achieved height is slightly reduced, task completion is preserved.

A similar trend appears in speed steering, and we provide qualitative results in the supplementary material. In cluttered scenes, excessive speed destabilizes object interactions and reduces success rates. Controlled steering attenuates the intervention when high speed becomes detrimental, whereas static steering with large values (e.g., $C=20$) maintains high speeds but significantly degrades performance. These results demonstrate that effective steering requires adaptive regulation of intervention strength rather than constant perturbation, enabling a better trade-off between concept modulation and task success.

Reinforcement Learning without PID initialization. We analyze the impact of PID-based initialization on RL training. In our default setting, the RL policy is warm-started using steering coefficients from the PID controller before further optimization. When trained from random initialization instead, task success drops from 75% to 65%, and steering quality (SAT) decreases from 2.48 to 2.28, underperforming both the unsteered baseline and other steering methods. These results suggest that PID provides a stable and meaningful initialization, guiding the RL policy toward effective steering dynamics. Without this warm start, the policy struggles to jointly optimize task completion and concept regulation, resulting in longer training time and degraded performance. We present detailed results in the supplementary material.

## 5 Conclusion

We introduce a closed\-loop activation steering framework for Vision\-Language\-Action models that enables controllable steering while preserving task performance\. Unlike prior static interventions that degrade execution, we formulate steering as a feedback control problem and dynamically regulate neuron activations using PID and RL\-based controllers\. This adaptive mechanism improves the trade\-off between concept enforcement and task success across LIBERO benchmarks, with the RL controller maintaining performance above the baseline while achieving effective steering\. Our approach has limitations: the RL controller is trained per task, neuron identification still involves partial manual filtering, and evaluation is limited to a small set of steering concepts\. Future work includes developing task\-agnostic controllers, automating neuron disentanglement, and extending to multi\-concept control\. Overall, our results highlight the potential of combining feedback control with activation\-level interventions for improving the reliability and controllability of embodied models\.

## Acknowledgment

The authors acknowledge partial support from NSF \(IIS\-2331908 and OAC\-2530965\), DARPA \(HR00112420004, HR00112490420, and HR00112490424\), DoE \(DE\-SC0024576\), and NAIRR Pilot \(NAIRR250261\)\.

## References

- [1] K. J. Astrom (1995) PID controllers: theory, design, and tuning. The International Society of Measurement and Control.
- [2] K. J. Åström and R. Murray (2021) Feedback systems: an introduction for scientists and engineers. Princeton University Press.
- [3] K. Åström and R. Murray (2008) Analysis and design of feedback systems. Princeton, NJ: Princeton University Press.
- [4] J. Barreiros, A. Beaulieu, A. Bhat, R. Cory, E. Cousineau, H. Dai, C. Fang, K. Hashimoto, M. Z. Irshad, M. Itkina, et al. (2025) A careful examination of large behavior models for multitask dexterous manipulation. arXiv preprint arXiv:2507.05331.
- [5] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. (2024) $\pi_{0}$: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164.
- [6] T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, A. Tamkin, K. Nguyen, B. McLean, J. E. Burke, T. Hume, S. Carter, T. Henighan, and C. Olah (2023) Towards monosemanticity: decomposing language models with dictionary learning. Transformer Circuits Thread. Note: https://transformer-circuits.pub/2023/monosemantic-features/index.html
- [7] J. Cao, Y. Huang, H. Guo, R. Zhang, M. Nan, W. Mai, J. Wang, H. Cheng, J. Sun, G. Han, et al. (2025) Compose your policies! improving diffusion-based or flow-based robot policies via test-time distribution-level composition. arXiv preprint arXiv:2510.01068.
- [8] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song (2025) Diffusion policy: visuomotor policy learning via action diffusion. The International Journal of Robotics Research 44 (10-11), pp. 1684–1704.
- [9] H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey (2023) Sparse autoencoders find highly interpretable features in language models. arXiv e-prints, pp. arXiv–2309.
- [10] M. Geva, A. Caciularu, K. Wang, and Y. Goldberg (2022) Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 30–45.
- [11] M. Geva, R. Schuster, J. Berant, and O. Levy (2021) Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 5484–5495.
- [12] B. Häon, K. C. Stocking, I. Chuang, and C. Tomlin (2025) Mechanistic interpretability for steering vision-language-action models. In Conference on Robot Learning, pp. 2743–2762.
- [13] Y. Jiang, A. Gupta, Z. Zhang, G. Wang, Y. Dou, Y. Chen, L. Fei-Fei, A. Anandkumar, Y. Zhu, and L. Fan (2023) Vima: robot manipulation with multimodal prompts.
- [14] S. Karamcheti, S. Nair, A. Balakrishna, P. Liang, T. Kollar, and D. Sadigh (2024) Prismatic vlms: investigating the design space of visually-conditioned language models. In Forty-first International Conference on Machine Learning.
- [15] K. Kawaharazuka, J. Oh, J. Yamada, I. Posner, and Y. Zhu (2025) Vision-language-action models for robotics: a review towards real-world applications. IEEE Access.
- [16] M. A. Khan, N. Boskov, F. M. Anwar, and M. A. Khan. Controlling vision–language–action policies through sparse latent directions. In Mechanistic Interpretability Workshop at NeurIPS 2025.
- [17] A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, et al. (2024) Droid: a large-scale in-the-wild robot manipulation dataset. arXiv preprint arXiv:2403.12945.
- \[18\]M\. J\. Kim, K\. Pertsch, S\. Karamcheti, T\. Xiao, A\. Balakrishna, S\. Nair, R\. Rafailov, E\. Foster, G\. Lam, P\. Sanketi,et al\.\(2024\)Openvla: an open\-source vision\-language\-action model\.arXiv preprint arXiv:2406\.09246\.Cited by:[Appendix D](https://arxiv.org/html/2606.00269#A4.p1.2),[§1](https://arxiv.org/html/2606.00269#S1.p1.1),[§2](https://arxiv.org/html/2606.00269#S2.p1.1),[§3\.2\.3](https://arxiv.org/html/2606.00269#S3.SS2.SSS3.p1.8),[§4](https://arxiv.org/html/2606.00269#S4.p2.1),Tables 3, 4, and 13–20\.
- \[19\]O\. Kroemer, S\. Niekum, and G\. Konidaris\(2021\)A review of robot learning for manipulation: challenges, representations, and algorithms\.Journal of Machine Learning Research22\(30\),pp\. 1–82\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p4.1)\.
- \[20\]X\. Li, K\. Hsu, J\. Gu, K\. Pertsch, O\. Mees, H\. R\. Walke, C\. Fu, I\. Lunawat, I\. Sieh, S\. Kirmani,et al\.\(2024\)Evaluating real\-world robot manipulation policies in simulation\.arXiv preprint arXiv:2405\.05941\.Cited by:[§4](https://arxiv.org/html/2606.00269#S4.p1.1)\.
- \[21\]B\. Liu, Y\. Zhu, C\. Gao, Y\. Feng, Q\. Liu, Y\. Zhu, and P\. Stone\(2023\)Libero: benchmarking knowledge transfer for lifelong robot learning\.Advances in Neural Information Processing Systems36,pp\. 44776–44791\.Cited by:[§4](https://arxiv.org/html/2606.00269#S4.p1.1)\.
- \[22\]K\. Meng, D\. Bau, A\. Andonian, and Y\. Belinkov\(2022\)Locating and editing factual associations in gpt\.Advances in neural information processing systems35,pp\. 17359–17372\.Cited by:[§2](https://arxiv.org/html/2606.00269#S2.p3.1)\.
- \[23\]C\. Olah, A\. Mordvintsev, and L\. Schubert\(2017\)Feature visualization\.Distill2\(11\),pp\. e7\.Cited by:[§2](https://arxiv.org/html/2606.00269#S2.p2.1)\.
- \[24\]Y\. Park\(2025\)Towards scalable robot learning without physical robots\.Master’s Thesis,Massachusetts Institute of Technology\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p2.1)\.
- \[25\]S\. Reed, K\. Zolna, E\. Parisotto, S\. G\. Colmenarejo, A\. Novikov, G\. Barth\-Maron, M\. Gimenez, Y\. Sulsky, J\. Kay, J\. T\. Springenberg,et al\.\(2022\)A generalist agent\.arXiv preprint arXiv:2205\.06175\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p1.1)\.
- \[26\]J\. Schulman, F\. Wolski, P\. Dhariwal, A\. Radford, and O\. Klimov\(2017\)Proximal policy optimization algorithms\.arXiv preprint arXiv:1707\.06347\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p4.1)\.
- \[27\]G\. Team, A\. Ye, B\. Wang, C\. Ni, G\. Huang, G\. Zhao, H\. Li, J\. Li, J\. Zhu, L\. Feng,et al\.\(2025\)Gigabrain\-0: a world model\-powered vision\-language\-action model\.arXiv preprint arXiv:2510\.19430\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p2.1)\.
- \[28\]H\. Touvron, L\. Martin, K\. Stone, P\. Albert, A\. Almahairi, Y\. Babaei, N\. Bashlykov, S\. Batra, P\. Bhargava, S\. Bhosale,et al\.\(2023\)Llama 2: open foundation and fine\-tuned chat models\.arXiv preprint arXiv:2307\.09288\.Cited by:[§2](https://arxiv.org/html/2606.00269#S2.p1.1),[§3\.2\.3](https://arxiv.org/html/2606.00269#S3.SS2.SSS3.p1.8)\.
- \[29\]A\. Wagenmaker, M\. Nakamoto, Y\. Zhang, S\. Park, W\. Yagoub, A\. Nagabandi, A\. Gupta, and S\. Levine\(2025\)Steering your diffusion policy with latent space reinforcement learning\.arXiv preprint arXiv:2506\.15799\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p2.1)\.
- \[30\]H\. R\. Walke, K\. Black, T\. Z\. Zhao, Q\. Vuong, C\. Zheng, P\. Hansen\-Estruch, A\. W\. He, V\. Myers, M\. J\. Kim, M\. Du,et al\.\(2023\)Bridgedata v2: a dataset for robot learning at scale\.InConference on Robot Learning,pp\. 1723–1736\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p1.1),[§4](https://arxiv.org/html/2606.00269#S4.p1.1)\.
- \[31\]X\. Yuan, T\. Mu, S\. Tao, Y\. Fang, M\. Zhang, and H\. Su\(2024\)Policy decorator: model\-agnostic online refinement for large policy model\.arXiv preprint arXiv:2412\.13630\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p2.1)\.
- \[32\]J\. Zhang, Y\. Guo, Y\. Hu, X\. Chen, X\. Zhu, and J\. Chen\(2025\)UP\-vla: a unified understanding and prediction model for embodied agent\.InInternational Conference on Machine Learning \(ICML\),Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p1.1)\.
- \[33\]J\. Zheng, J\. Li, Z\. Wang, D\. Liu, X\. Kang, Y\. Feng, Y\. Zheng, J\. Zou, Y\. Chen, J\. Zeng,et al\.\(2025\)X\-vla: soft\-prompted transformer as scalable cross\-embodiment vision\-language\-action model\.arXiv preprint arXiv:2510\.10274\.Cited by:[§4\.1](https://arxiv.org/html/2606.00269#S4.SS1.p6.1),[Table 1](https://arxiv.org/html/2606.00269#S4.T1),[Table 1](https://arxiv.org/html/2606.00269#S4.T1.19.2),[§4](https://arxiv.org/html/2606.00269#S4.p2.1)\.
- \[34\]Z\. Zhu, Z\. Li, X\. Yuan, H\. Zhang, Y\. Liu, C\. Zhang, Y\. Yu, W\. Zhang, and M\. Liu\(2025\)Unified latent steering and residual refinement for online improvement of diffusion policy models\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p2.1)\.
- \[35\]B\. Zitkovich, T\. Yu, S\. Xu, P\. Xu, T\. Xiao, F\. Xia, J\. Wu, P\. Wohlhart, S\. Welker, A\. Wahid,et al\.\(2023\)Rt\-2: vision\-language\-action models transfer web knowledge to robotic control\.InConference on Robot Learning,pp\. 2165–2183\.Cited by:[§1](https://arxiv.org/html/2606.00269#S1.p1.1),[§2](https://arxiv.org/html/2606.00269#S2.p1.1)\.
- \[36\]A\. Zou, L\. Phan, S\. Chen, J\. Campbell, P\. Guo, R\. Ren, A\. Pan, X\. Yin, M\. Mazeika, A\. Dombrowski,et al\.\(2023\)Representation engineering: a top\-down approach to ai transparency\.arXiv preprint arXiv:2310\.01405\.Cited by:[§2](https://arxiv.org/html/2606.00269#S2.p3.1)\.

## Appendix A RL Training without PID Initialization

Table 5: Performance of RL controllers trained without PID initialization for speed steering across all four LIBERO task suites. Compared with the RL+PID controllers used in the main paper, training RL without PID initialization yields lower task success rates (SR), which indicates that PID initialization matters for stable RL training.

| Task suite | Method | Speed (cm/s) | SAT (cm/s) | SR (%) |
|---|---|---|---|---|
| LIBERO LONG | RL + PID | 11.037 ± 2.843 | 2.535 ± 1.210 | 66.50 |
| | RL (no PID) | 11.267 ± 2.902 | 2.669 ± 1.292 | 57.00 |
| LIBERO GOAL | RL + PID | 14.040 ± 4.396 | 3.488 ± 1.388 | 83.00 |
| | RL (no PID) | 14.127 ± 4.372 | 3.657 ± 1.306 | 78.00 |
| LIBERO OBJECT | RL + PID | 14.111 ± 3.149 | 2.171 ± 0.466 | 76.50 |
| | RL (no PID) | 14.080 ± 2.927 | 2.124 ± 0.434 | 72.00 |
| LIBERO SPATIAL | RL + PID | 17.642 ± 2.845 | 3.606 ± 0.820 | 78.50 |
| | RL (no PID) | 17.600 ± 2.910 | 3.606 ± 0.759 | 77.00 |

In this section, we provide detailed results across all LIBERO task suites to support our claim in the main paper on “degraded task success and steering performance for RL training initialized without PID data”\.

In our default training setup, the RL controller is initialized using steering coefficients generated by the PID controller, which provides a stable starting point for learning\. To evaluate the importance of this initialization, we also trained RL policies from random initialization, without using PID\-generated trajectories\.
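The text above does not spell out how the PID-generated coefficients are used to initialize the RL controller. One plausible reading is a supervised warm start, in which a policy is first fit to (error, coefficient) pairs produced by the PID controller and then refined with RL. The sketch below illustrates only that idea. The proportional-only teacher `pid_teacher`, the linear policy form, and the gain value are all assumptions made for illustration, not the authors' implementation.

```python
def pid_teacher(error: float, kp: float = 1.5) -> float:
    """Assumed stand-in for the PID controller (proportional term only)."""
    return kp * error


def fit_linear_policy(errors, coeffs):
    """Closed-form 1-D least squares for coeff ~= a * error + b."""
    n = len(errors)
    mean_e = sum(errors) / n
    mean_c = sum(coeffs) / n
    var_e = sum((e - mean_e) ** 2 for e in errors)
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(errors, coeffs))
    a = cov / var_e
    b = mean_c - a * mean_e
    return a, b


# Hypothetical concept errors seen during PID-controlled rollouts.
errors = [0.5 * i for i in range(-8, 9)]
coeffs = [pid_teacher(e) for e in errors]

# These parameters would serve as the starting point for RL fine-tuning.
a, b = fit_linear_policy(errors, coeffs)
print(f"warm-start policy: coeff = {a:.3f} * error + {b:.3f}")
```

Under this reading, a randomly initialized policy would instead have to discover a sensible error-to-coefficient mapping through exploration, which may help explain the lower success rates in Table 5.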

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/tradeoff-tasks.png) Figure 4: Qualitative examples of the steering-success trade-off: tasks that fail under static steering (left) [[12](https://arxiv.org/html/2606.00269#bib.bib3)] but succeed with CTRL-STEER (ours, right). In (a) and (b), the model was steered for speed, i.e., to move faster; static steering produced abrupt motion that knocked over nearby objects. In (c) and (d), the model was steered to move higher; static steering produced abrupt upward motion that made the robot drop the object. CTRL-STEER adjusts the steering strength online and completes these tasks successfully.

[Tab. 5](https://arxiv.org/html/2606.00269#A1.T5) reports the detailed results for speed steering on all four LIBERO task suites: Long, Goal, Object, and Spatial. Across all suites, removing PID initialization lowers the task success rate, by 1.5 to 9.5 percentage points, while leaving the steering metrics essentially unchanged (the differences are well within one standard deviation). These results support the conclusion in the main paper that PID initialization stabilizes RL training and improves the trade-off between concept steering and task completion.

## Appendix B Steering Success Tradeoff

In this section, we illustrate qualitative examples of the steering-success trade-off. In [Fig. 4](https://arxiv.org/html/2606.00269#A1.F4), static steering with large intervention strengths (left) often produces abrupt motions that destabilize task execution. In contrast, CTRL-STEER (right) dynamically adjusts the steering coefficient during execution, allowing the robot to maintain the desired concept behavior while preserving task success. [Fig. 5](https://arxiv.org/html/2606.00269#A2.F5) shows the effect of steering on end-effector height for a task where static height steering led to poor task completion.
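The contrast between a fixed coefficient and one that is adjusted during execution can be illustrated with a toy simulation. The following is a minimal sketch, not the paper's controller: the gains, the coefficient bounds, and especially `toy_plant` (a first-order response of end-effector speed to the steering coefficient) are assumptions chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDSteering:
    """Hypothetical PID controller that outputs a steering coefficient."""
    kp: float = 1.0
    ki: float = 0.8
    kd: float = 0.2
    c_min: float = 0.0    # assumed lower bound on the coefficient
    c_max: float = 20.0   # assumed upper bound on the coefficient
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float = 1.0) -> float:
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.c_min, min(self.c_max, raw))  # clip to the allowed range


def toy_plant(speed: float, coeff: float, base: float = 10.0,
              gain: float = 0.5, lag: float = 0.5) -> float:
    """Assumed dynamics: speed relaxes toward base + gain * coeff."""
    return speed + lag * ((base + gain * coeff) - speed)


def rollout(controller: Optional[PIDSteering], target: float = 14.0,
            steps: int = 60, static_coeff: Optional[float] = None):
    speed, history = 10.0, []
    for _ in range(steps):
        if static_coeff is not None:
            coeff = static_coeff                      # open loop: fixed strength
        else:
            coeff = controller.step(target, speed)    # closed loop: feedback
        speed = toy_plant(speed, coeff)
        history.append((coeff, speed))
    return history


closed = rollout(PIDSteering())
static = rollout(None, static_coeff=20.0)
print(f"closed-loop final speed: {closed[-1][1]:.2f} (target 14.00)")
print(f"static C=20 final speed: {static[-1][1]:.2f} (target 14.00)")
```

In this toy setting the fixed coefficient drives the speed well past the target, while the feedback controller reduces the coefficient as the error shrinks. Real VLA dynamics are far more complex, so the example conveys the qualitative idea only.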

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/heightplot2.png) Figure 5: Effect of steering on the task ‘put both the alphabet soup and the cream cheese box in the basket’. The unsteered model fails in some episodes. Static steering often prevents the robot from successfully grasping the first object, resulting in lower task success and reduced end-effector height. In contrast, closed-loop steering by CTRL-STEER enables successful task completion while maintaining a higher end-effector trajectory whenever possible.

## Appendix C Time Complexity Analysis

[Tab. 6](https://arxiv.org/html/2606.00269#A3.T6) reports the average per-step inference time and peak GPU memory usage for each approach. CTRL-STEER adds little overhead in either, which makes it a lightweight alternative to retraining or fine-tuning the model. [Tab. 7](https://arxiv.org/html/2606.00269#A3.T7) to [Tab. 12](https://arxiv.org/html/2606.00269#A3.T12) give the detailed per-task computational costs for each approach.

[Fig. 6](https://arxiv.org/html/2606.00269#A3.F6) and [Fig. 7](https://arxiv.org/html/2606.00269#A3.F7) show the breakdown of the time overhead in controlled steering. The VLA forward pass accounts for most of the computational cost, and the overhead added by the controller is negligible. [Fig. 8](https://arxiv.org/html/2606.00269#A3.F8) shows that the improvement in task success rate outweighs the modest increase in computational cost.
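To put the overhead in relative terms, the mean per-step times reported in Table 6 can be compared against the unsteered OpenVLA baseline. The short script below does that arithmetic. The dictionary values are copied from Table 6, while the helper name `relative_overhead` is introduced here purely for illustration.

```python
# Mean time per step in seconds, copied from Table 6.
TIME_PER_STEP = {
    "OpenVLA": 0.1869,
    "C=5": 0.1961,
    "C=10": 0.1967,
    "C=20": 0.1974,
    "CTRL-STEER (PID)": 0.2021,
    "CTRL-STEER (RL)": 0.2094,
}


def relative_overhead(times, baseline="OpenVLA"):
    """Percentage increase in time per step relative to the baseline."""
    base = times[baseline]
    return {
        name: 100.0 * (t - base) / base
        for name, t in times.items()
        if name != baseline
    }


overhead = relative_overhead(TIME_PER_STEP)
for name, pct in overhead.items():
    print(f"{name}: +{pct:.1f}% time per step")
```

By this measure, static steering costs roughly 5 to 6% more time per step than the unsteered model, the PID controller about 8%, and the RL controller about 12%, while peak GPU memory is effectively unchanged.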

Table 6: Average inference cost for all approaches (no steering, static steering with $C = 5, 10, 20$, and CTRL-STEER with PID and RL).

| Approach | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| OpenVLA | 0.1869 ± 0.0039 | 14.2587 ± 0.0003 |
| C=5 | 0.1961 ± 0.0124 | 14.2587 ± 0.0003 |
| C=10 | 0.1967 ± 0.0079 | 14.2587 ± 0.0003 |
| C=20 | 0.1974 ± 0.0060 | 14.2587 ± 0.0003 |
| CTRL-STEER (PID) | 0.2021 ± 0.0076 | 14.2587 ± 0.0003 |
| CTRL-STEER (RL) | 0.2094 ± 0.0077 | 14.2588 ± 0.0003 |

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/task_computation.png) Figure 6: Breakdown of the per-timestep computational cost for CTRL-STEER with only PID (left) and RL+PID (right). The VLA forward pass dominates the runtime in both cases, while the additional computations required for controlled steering contribute a very low overhead. The y-axis is shown on a log scale; the breakdown is nearly identical across tasks, indicating consistent per-timestep overhead.

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/avg_computation.png) Figure 7: Breakdown of the average per-timestep computational cost for CTRL-STEER with only PID (left) and RL+PID (right). The VLA forward pass dominates the runtime in both cases, while the additional computations required for controlled steering contribute a very low overhead.

![Refer to caption](https://arxiv.org/html/2606.00269v1/images/success_cost_tradeoff.png) Figure 8: Trade-off between task success and computational cost across steering methods. Bars indicate task success rate, while the red line shows the per-timestep inference time (left) and peak GPU memory usage (right). CTRL-STEER (PID and RL+PID) achieves the highest success rates with only a modest increase in computational overhead compared to static steering baselines.

Table 7: Inference cost per timestep for CTRL-STEER (RL+PID).

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.2205 | 14.2587 |
| put the bowl on the stove | 0.2217 | 14.2587 |
| put the wine bottle on top of the cabinet | 0.2166 | 14.2590 |
| open the top drawer and put the bowl inside | 0.2083 | 14.2592 |
| put the bowl on top of the cabinet | 0.2098 | 14.2589 |
| push the plate to the front of the stove | 0.2070 | 14.2590 |
| put the cream cheese in the bowl | 0.2073 | 14.2590 |
| turn on the stove | 0.2043 | 14.2582 |
| put the bowl on the plate | 0.1956 | 14.2586 |
| put the wine bottle on the rack | 0.2031 | 14.2589 |
| Average | 0.2094 ± 0.0077 | 14.2588 ± 0.0003 |

Table 8: Inference cost per timestep for CTRL-STEER with only PID steering.

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.2158 | 14.2586 |
| put the bowl on the stove | 0.2155 | 14.2586 |
| put the wine bottle on top of the cabinet | 0.2047 | 14.2589 |
| open the top drawer and put the bowl inside | 0.2036 | 14.2591 |
| put the bowl on top of the cabinet | 0.1967 | 14.2587 |
| push the plate to the front of the stove | 0.1961 | 14.2589 |
| put the cream cheese in the bowl | 0.2015 | 14.2589 |
| turn on the stove | 0.1977 | 14.2581 |
| put the bowl on the plate | 0.1956 | 14.2584 |
| put the wine bottle on the rack | 0.1933 | 14.2587 |
| Average | 0.2021 ± 0.0076 | 14.2587 ± 0.0003 |

Table 9: Inference cost per timestep for static steering with C=5 [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.2243 | 14.2586 |
| put the bowl on the stove | 0.2135 | 14.2586 |
| put the wine bottle on top of the cabinet | 0.1928 | 14.2589 |
| open the top drawer and put the bowl inside | 0.1903 | 14.2591 |
| put the bowl on top of the cabinet | 0.1947 | 14.2587 |
| push the plate to the front of the stove | 0.1876 | 14.2589 |
| put the cream cheese in the bowl | 0.1994 | 14.2589 |
| turn on the stove | 0.1871 | 14.2581 |
| put the bowl on the plate | 0.1882 | 14.2584 |
| put the wine bottle on the rack | 0.1833 | 14.2587 |
| Average | 0.1961 ± 0.0124 | 14.2587 ± 0.0003 |

Table 10: Inference cost per timestep for static steering with C=10 [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.2147 | 14.2586 |
| put the bowl on the stove | 0.2081 | 14.2586 |
| put the wine bottle on top of the cabinet | 0.1964 | 14.2589 |
| open the top drawer and put the bowl inside | 0.1936 | 14.2591 |
| put the bowl on top of the cabinet | 0.1927 | 14.2587 |
| push the plate to the front of the stove | 0.1891 | 14.2589 |
| put the cream cheese in the bowl | 0.1976 | 14.2589 |
| turn on the stove | 0.1898 | 14.2581 |
| put the bowl on the plate | 0.1943 | 14.2584 |
| put the wine bottle on the rack | 0.1909 | 14.2587 |
| Average | 0.1967 ± 0.0079 | 14.2587 ± 0.0003 |

Table 11: Inference cost per timestep for static steering with C=20 [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.2125 | 14.2586 |
| put the bowl on the stove | 0.2033 | 14.2586 |
| put the wine bottle on top of the cabinet | 0.1995 | 14.2589 |
| open the top drawer and put the bowl inside | 0.1939 | 14.2591 |
| put the bowl on top of the cabinet | 0.1935 | 14.2587 |
| push the plate to the front of the stove | 0.1935 | 14.2589 |
| put the cream cheese in the bowl | 0.1937 | 14.2589 |
| turn on the stove | 0.1921 | 14.2581 |
| put the bowl on the plate | 0.1973 | 14.2584 |
| put the wine bottle on the rack | 0.1950 | 14.2587 |
| Average | 0.1974 ± 0.0060 | 14.2587 ± 0.0003 |

Table 12: Inference cost per timestep for the unsteered OpenVLA model.

| Task | Time / step (s) | Peak GPU (GB) |
|---|---|---|
| open the middle drawer of the cabinet | 0.1954 | 14.2586 |
| put the bowl on the stove | 0.1900 | 14.2586 |
| put the wine bottle on top of the cabinet | 0.1835 | 14.2589 |
| open the top drawer and put the bowl inside | 0.1869 | 14.2591 |
| put the bowl on top of the cabinet | 0.1842 | 14.2587 |
| push the plate to the front of the stove | 0.1844 | 14.2589 |
| put the cream cheese in the bowl | 0.1915 | 14.2589 |
| turn on the stove | 0.1865 | 14.2581 |
| put the bowl on the plate | 0.1840 | 14.2584 |
| put the wine bottle on the rack | 0.1825 | 14.2587 |
| Average | 0.1869 ± 0.0039 | 14.2587 ± 0.0003 |

## Appendix D Task-wise results

[Tab. 13](https://arxiv.org/html/2606.00269#A4.T13) to [Tab. 20](https://arxiv.org/html/2606.00269#A4.T20) show the task-wise results for our experiments on each task suite in LIBERO. We compare the unsteered OpenVLA model [[18](https://arxiv.org/html/2606.00269#bib.bib17)], the statically steered model [[12](https://arxiv.org/html/2606.00269#bib.bib3)], and CTRL-STEER with a PID-only controller and with an RL+PID controller (RL training initialized with PID data). For static steering, we chose for each task the $\alpha$ value that gave the best result. For each task, 20 individual rollouts were run deterministically by setting seed values.
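The evaluation protocol described above, a fixed number of seeded rollouts per task, can be sketched as follows. This is only an illustration of the bookkeeping: `run_episode` is a hypothetical stand-in that draws a seeded random outcome rather than running a LIBERO simulation, and the success probability is an arbitrary assumption.

```python
import random


def run_episode(seed: int, success_prob: float = 0.7) -> bool:
    """Hypothetical stand-in for one simulated rollout; seeded for determinism."""
    rng = random.Random(seed)
    return rng.random() < success_prob


def success_rate(seeds) -> float:
    """Success rate in percent over the given seeds."""
    outcomes = [run_episode(s) for s in seeds]
    return 100.0 * sum(outcomes) / len(outcomes)


SEEDS = range(20)  # 20 rollouts per task, as stated in the text
sr_first = success_rate(SEEDS)
sr_second = success_rate(SEEDS)

print(f"SR = {sr_first:.2f}%")
print("reproducible:", sr_first == sr_second)
```

With 20 rollouts per task, each success or failure changes the success rate by 5 percentage points, which is why every per-task SR value in the tables is a multiple of 5. Differences of one or two rollouts between methods should therefore be read with caution.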

Table 13: Detailed task-wise results on Height intervention for the LIBERO LONG task suite. OpenVLA is the unsteered model [[18](https://arxiv.org/html/2606.00269#bib.bib17)]; STATIC is static steering [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Method | Height (m) | 95th percentile | AAT | SR (%) |
|---|---|---|---|---|---|
| put both the alphabet soup and the tomato sauce in the basket | OpenVLA | 0.60 ± 0.02 | 0.70 ± 0.01 | 127.22 ± 30.18 | 65.00 |
| | STATIC | 0.62 ± 0.01 | 0.71 ± 0.02 | 137.82 ± 36.92 | 65.00 |
| | Ours (PID) | 0.62 ± 0.01 | 0.70 ± 0.02 | 125.25 ± 30.22 | 55.00 |
| | Ours (RL) | 0.61 ± 0.01 | 0.71 ± 0.01 | 130.40 ± 28.37 | 65.00 |
| put the cheese box and the butter in the basket | OpenVLA | 0.60 ± 0.01 | 0.70 ± 0.01 | 94.51 ± 9.98 | 75.00 |
| | STATIC | 0.60 ± 0.03 | 0.70 ± 0.01 | 108.97 ± 41.47 | 80.00 |
| | Ours (PID) | 0.60 ± 0.02 | 0.70 ± 0.01 | 88.59 ± 9.81 | 85.00 |
| | Ours (RL) | 0.59 ± 0.02 | 0.70 ± 0.02 | 92.74 ± 18.01 | 65.00 |
| Turn on the stove and put the mokapot on it | OpenVLA | 1.04 ± 0.02 | 1.17 ± 0.01 | 233.30 ± 32.17 | 70.00 |
| | STATIC | 1.04 ± 0.01 | 1.17 ± 0.01 | 227.19 ± 28.99 | 50.00 |
| | Ours (PID) | 1.04 ± 0.02 | 1.17 ± 0.00 | 268.83 ± 153.20 | 60.00 |
| | Ours (RL) | 1.02 ± 0.03 | 1.16 ± 0.02 | 297.11 ± 127.66 | 40.00 |
| put the black bowl in the bottom drawer of the cabinet and close it | OpenVLA | 1.03 ± 0.03 | 1.16 ± 0.02 | 233.57 ± 82.29 | 50.00 |
| | STATIC | 1.04 ± 0.04 | 1.17 ± 0.01 | 264.11 ± 149.17 | 50.00 |
| | Ours (PID) | 1.01 ± 0.02 | 1.16 ± 0.01 | 292.53 ± 131.56 | 45.00 |
| | Ours (RL) | 1.02 ± 0.03 | 1.16 ± 0.02 | 268.07 ± 96.26 | 45.00 |
| put the white mug on the left plate and put the yellow and white mug on the right plate | OpenVLA | 0.59 ± 0.01 | 0.67 ± 0.01 | 106.93 ± 13.39 | 40.00 |
| | STATIC | 0.59 ± 0.02 | 0.66 ± 0.02 | 117.07 ± 38.99 | 55.00 |
| | Ours (PID) | 0.59 ± 0.02 | 0.66 ± 0.02 | 114.30 ± 36.05 | 35.00 |
| | Ours (RL) | 0.59 ± 0.02 | 0.67 ± 0.01 | 124.15 ± 57.65 | 65.00 |
| pick up the book and place it in the back compartment of the caddy | OpenVLA | 1.16 ± 0.01 | 1.28 ± 0.03 | 172.27 ± 43.00 | 75.00 |
| | STATIC | 1.16 ± 0.01 | 1.27 ± 0.02 | 185.08 ± 44.59 | 80.00 |
| | Ours (PID) | 1.16 ± 0.01 | 1.28 ± 0.03 | 179.58 ± 38.63 | 80.00 |
| | Ours (RL) | 1.16 ± 0.02 | 1.28 ± 0.03 | 177.97 ± 27.15 | 80.00 |
| put the white mug on the plate and put the chocolate pudding to the right of the plate | OpenVLA | 0.55 ± 0.03 | 0.67 ± 0.03 | 88.62 ± 20.16 | 50.00 |
| | STATIC | 0.55 ± 0.01 | 0.67 ± 0.02 | 90.82 ± 34.13 | 55.00 |
| | Ours (PID) | 0.55 ± 0.03 | 0.67 ± 0.02 | 94.31 ± 42.52 | 50.00 |
| | Ours (RL) | 0.55 ± 0.02 | 0.67 ± 0.01 | 85.34 ± 23.64 | 75.00 |
| put both the alphabet soup and the cream cheese box in the basket | OpenVLA | 0.60 ± 0.01 | 0.69 ± 0.01 | 113.02 ± 32.90 | 65.00 |
| | STATIC | 0.59 ± 0.02 | 0.69 ± 0.01 | 100.24 ± 15.66 | 55.00 |
| | Ours (PID) | 0.59 ± 0.02 | 0.69 ± 0.01 | 104.71 ± 26.79 | 55.00 |
| | Ours (RL) | 0.59 ± 0.03 | 0.69 ± 0.01 | 108.56 ± 29.67 | 50.00 |
| put both moka pots on the stove | OpenVLA | 1.10 ± 0.01 | 1.19 ± 0.01 | 406.09 ± 79.07 | 40.00 |
| | STATIC | 1.09 ± 0.01 | 1.18 ± 0.01 | 433.88 ± 86.87 | 45.00 |
| | Ours (PID) | 1.09 ± 0.02 | 1.18 ± 0.02 | 417.96 ± 81.50 | 25.00 |
| | Ours (RL) | 1.09 ± 0.01 | 1.18 ± 0.01 | 451.37 ± 155.28 | 35.00 |
| put the yellow and white mug in the microwave and close it | OpenVLA | 1.06 ± 0.03 | 1.17 ± 0.01 | 335.49 ± 92.70 | 50.00 |
| | STATIC | 1.07 ± 0.02 | 1.18 ± 0.01 | 306.57 ± 92.76 | 50.00 |
| | Ours (PID) | 1.07 ± 0.02 | 1.18 ± 0.01 | 332.52 ± 156.19 | 50.00 |
| | Ours (RL) | 1.07 ± 0.02 | 1.18 ± 0.02 | 305.47 ± 52.77 | 50.00 |

Table 14: Detailed task-wise results on Height intervention for the LIBERO GOAL task suite. OpenVLA is the unsteered model [[18](https://arxiv.org/html/2606.00269#bib.bib17)]; STATIC is static steering [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Method | Height (m) | 95th percentile | AAT | SR (%) |
|---|---|---|---|---|---|
| Open the middle layer of the drawer | OpenVLA | 1.08 ± 0.03 | 1.16 ± 0.02 | 183.97 ± 137.46 | 60.00 |
| | STATIC | 1.08 ± 0.02 | 1.16 ± 0.01 | 175.86 ± 119.38 | 70.00 |
| | Ours (PID) | 1.08 ± 0.02 | 1.16 ± 0.02 | 201.36 ± 163.33 | 70.00 |
| | Ours (RL) | 1.09 ± 0.02 | 1.17 ± 0.01 | 137.33 ± 58.08 | 55.00 |
| Put the bowl on the stove | OpenVLA | 1.04 ± 0.03 | 1.17 ± 0.01 | 88.07 ± 82.83 | 100.00 |
| | STATIC | 1.04 ± 0.02 | 1.17 ± 0.01 | 70.46 ± 13.55 | 100.00 |
| | Ours (PID) | 1.04 ± 0.02 | 1.17 ± 0.01 | 72.83 ± 15.09 | 95.00 |
| | Ours (RL) | 1.04 ± 0.03 | 1.17 ± 0.01 | 82.37 ± 57.13 | 95.00 |
| Put the wine bottle on the top of the drawer | OpenVLA | 1.15 ± 0.02 | 1.30 ± 0.03 | 92.33 ± 44.63 | 80.00 |
| | STATIC | 1.14 ± 0.02 | 1.30 ± 0.02 | 87.87 ± 18.78 | 75.00 |
| | Ours (PID) | 1.14 ± 0.02 | 1.29 ± 0.04 | 140.67 ± 169.85 | 70.00 |
| | Ours (RL) | 1.14 ± 0.03 | 1.30 ± 0.03 | 107.73 ± 68.19 | 85.00 |
| Open the top layer of the drawer and put the bowl inside | OpenVLA | 1.09 ± 0.02 | 1.21 ± 0.03 | 162.55 ± 23.14 | 50.00 |
| | STATIC | 1.10 ± 0.02 | 1.22 ± 0.01 | 153.40 ± 14.61 | 60.00 |
| | Ours (PID) | 1.10 ± 0.02 | 1.22 ± 0.03 | 210.72 ± 130.48 | 40.00 |
| | Ours (RL) | 1.10 ± 0.01 | 1.22 ± 0.02 | 147.53 ± 14.90 | 65.00 |
| Put the bowl on the top of the drawer | OpenVLA | 1.08 ± 0.02 | 1.23 ± 0.02 | 74.13 ± 13.33 | 75.00 |
| | STATIC | 1.08 ± 0.02 | 1.22 ± 0.02 | 74.34 ± 14.23 | 95.00 |
| | Ours (PID) | 1.08 ± 0.02 | 1.22 ± 0.02 | 76.09 ± 18.18 | 100.00 |
| | Ours (RL) | 1.08 ± 0.02 | 1.22 ± 0.02 | 82.13 ± 40.05 | 95.00 |
| Push the plate to the front of the stove | OpenVLA | 0.98 ± 0.03 | 1.16 ± 0.02 | 109.82 ± 36.82 | 80.00 |
| | STATIC | 0.97 ± 0.01 | 1.16 ± 0.02 | 111.36 ± 39.18 | 70.00 |
| | Ours (PID) | 0.98 ± 0.02 | 1.17 ± 0.02 | 105.14 ± 16.00 | 85.00 |
| | Ours (RL) | 0.98 ± 0.03 | 1.16 ± 0.02 | 107.90 ± 27.97 | 90.00 |
| Put the cream cheese on the bowl | OpenVLA | 1.04 ± 0.02 | 1.17 ± 0.02 | 97.46 ± 42.33 | 75.00 |
| | STATIC | 1.04 ± 0.03 | 1.17 ± 0.01 | 107.59 ± 121.07 | 90.00 |
| | Ours (PID) | 1.05 ± 0.03 | 1.17 ± 0.01 | 94.57 ± 39.71 | 85.00 |
| | Ours (RL) | 1.04 ± 0.03 | 1.17 ± 0.01 | 82.33 ± 25.66 | 70.00 |
| Turn on the stove | OpenVLA | 1.03 ± 0.03 | 1.16 ± 0.05 | 89.25 ± 100.89 | 95.00 |
| | STATIC | 1.04 ± 0.03 | 1.18 ± 0.01 | 82.72 ± 82.80 | 95.00 |
| | Ours (PID) | 1.03 ± 0.03 | 1.18 ± 0.01 | 76.68 ± 40.95 | 90.00 |
| | Ours (RL) | 1.04 ± 0.03 | 1.18 ± 0.01 | 82.20 ± 79.16 | 100.00 |
| Put the bowl on the plate | OpenVLA | 1.02 ± 0.01 | 1.17 ± 0.01 | 57.11 ± 4.56 | 95.00 |
| | STATIC | 1.02 ± 0.01 | 1.16 ± 0.02 | 75.93 ± 67.10 | 100.00 |
| | Ours (PID) | 1.03 ± 0.01 | 1.16 ± 0.01 | 62.36 ± 10.42 | 95.00 |
| | Ours (RL) | 1.03 ± 0.02 | 1.17 ± 0.01 | 66.07 ± 26.61 | 90.00 |
| Put the wine bottle on the rack | OpenVLA | 1.09 ± 0.05 | 1.23 ± 0.03 | 193.83 ± 152.36 | 65.00 |
| | STATIC | 1.11 ± 0.02 | 1.25 ± 0.02 | 135.47 ± 62.08 | 65.00 |
| | Ours (PID) | 1.11 ± 0.02 | 1.24 ± 0.03 | 154.05 ± 124.58 | 60.00 |
| | Ours (RL) | 1.11 ± 0.02 | 1.24 ± 0.02 | 136.68 ± 95.22 | 75.00 |

Table 15: Detailed task-wise results on Height intervention for the LIBERO OBJECT task suite. OpenVLA is the unsteered model [[18](https://arxiv.org/html/2606.00269#bib.bib17)]; STATIC is static steering [[12](https://arxiv.org/html/2606.00269#bib.bib3)].

| Task | Method | Height (m) | 95th percentile | AAT | SR (%) |
|---|---|---|---|---|---|
| Pick the alphabet soup and place it in the basket | OpenVLA | 0.22 ± 0.01 | 0.30 ± 0.02 | 2.48 ± 1.03 | 90.00 |
| | STATIC | 0.22 ± 0.02 | 0.30 ± 0.02 | 2.85 ± 1.77 | 85.00 |
| | Ours (PID) | 0.21 ± 0.01 | 0.30 ± 0.02 | 2.42 ± 0.95 | 80.00 |
| | Ours (RL) | 0.21 ± 0.02 | 0.29 ± 0.01 | 2.76 ± 0.86 | 85.00 |
| Pick up the cream cheese and place it in the basket | OpenVLA | 0.17 ± 0.01 | 0.26 ± 0.01 | 0.18 ± 0.13 | 70.00 |
| | STATIC | 0.17 ± 0.04 | 0.26 ± 0.01 | 0.17 ± 0.20 | 55.00 |
| | Ours (PID) | 0.17 ± 0.01 | 0.26 ± 0.01 | 0.19 ± 0.11 | 85.00 |
| | Ours (RL) | 0.16 ± 0.02 | 0.26 ± 0.01 | 0.15 ± 0.09 | 75.00 |
| Pick up the salad dressing and place it in the basket | OpenVLA | 0.23 ± 0.02 | 0.31 ± 0.02 | 6.14 ± 2.28 | 75.00 |
| | STATIC | 0.23 ± 0.01 | 0.31 ± 0.01 | 6.38 ± 2.05 | 90.00 |
| | Ours (PID) | 0.23 ± 0.01 | 0.32 ± 0.01 | 6.21 ± 2.30 | 80.00 |
| | Ours (RL) | 0.23 ± 0.01 | 0.32 ± 0.01 | 5.52 ± 0.98 | 80.00 |
| Pick up the bbq sauce and place it in the basket | OpenVLA | 0.19 ± 0.04 | 0.28 ± 0.02 | 2.40 ± 1.42 | 40.00 |
| | STATIC | 0.22 ± 0.01 | 0.28 ± 0.01 | 5.38 ± 1.56 | 45.00 |
| | Ours (PID) | 0.21 ± 0.03 | 0.29 ± 0.02 | 10.01 ± 17.24 | 50.00 |
| | Ours (RL) | 0.21 ± 0.03 | 0.29 ± 0.03 | 7.04 ± 8.63 | 50.00 |
| Pick up the ketchup and place it in the basket | OpenVLA | 0.24 ± 0.01 | 0.31 ± 0.02 | 8.19 ± 1.86 | 90.00 |
| | STATIC | 0.24 ± 0.01 | 0.31 ± 0.02 | 13.69 ± 12.11 | 80.00 |
| | Ours (PID) | 0.24 ± 0.01 | 0.32 ± 0.03 | 7.60 ± 1.12 | 85.00 |
| | Ours (RL) | 0.24 ± 0.02 | 0.31 ± 0.02 | 13.33 ± 12.71 | 80.00 |
| Pick up the tomato sauce and place it in the basket | OpenVLA | 0.19 ± 0.02 | 0.27 ± 0.01 | 3.56 ± 1.50 | 75.00 |
| | STATIC | 0.20 ± 0.01 | 0.27 ± 0.01 | 3.15 ± 0.91 | 75.00 |
| | Ours (PID) | 0.17 ± 0.04 | 0.26 ± 0.01 | 3.52 ± 1.04 | 80.00 |
| | Ours (RL) | 0.19 ± 0.02 | 0.26 ± 0.01 | 3.87 ± 1.42 | 80.00 |
| Pick up the butter and place it in the basket | OpenVLA | 0.20 ± 0.01 | 0.30 ± 0.02 | 0.15 ± 0.04 | 60.00 |
| | STATIC | 0.19 ± 0.01 | 0.27 ± 0.02 | 0.36 ± 0.26 | 75.00 |
| | Ours (PID) | 0.20 ± 0.01 | 0.29 ± 0.02 | 0.14 ± 0.05 | 60.00 |
| | Ours (RL) | 0.20 ± 0.01 | 0.28 ± 0.02 | 0.20 ± 0.10 | 60.00 |
| Pick up the milk and place it in the basket | OpenVLA | 0.24 ± 0.01 | 0.32 ± 0.02 | 7.20 ± 5.11 | 70.00 |
| | STATIC | 0.24 ± 0.01 | 0.32 ± 0.02 | 6.48 ± 2.11 | 90.00 |
| | Ours (PID) | 0.24 ± 0.01 | 0.32 ± 0.02 | 6.35 ± 4.42 | 90.00 |
| | Ours (RL) | 0.24 ± 0.01 | 0.32 ± 0.02 | 6.59 ± 3.29 | 85.00 |
| Pick up the pudding and place it in the basket | OpenVLA | 0.18 ± 0.04 | 0.28 ± 0.03 | 0.27 ± 0.38 | 60.00 |
| | STATIC | 0.19 ± 0.01 | 0.28 ± 0.02 | 0.22 ± 0.11 | 85.00 |
| | Ours (PID) | 0.19 ± 0.03 | 0.28 ± 0.02 | 0.20 ± 0.06 | 65.00 |
| | Ours (RL) | 0.19 ± 0.02 | 0.28 ± 0.01 | 0.33 ± 0.22 | 85.00 |
| Pick up the orange juice and place it in the basket | OpenVLA | 0.23 ± 0.02 | 0.29 ± 0.02 | 9.68 ± 9.08 | 90.00 |
| | STATIC | 0.23 ± 0.01 | 0.31 ± 0.02 | 5.40 ± 2.56 | 95.00 |
| | Ours (PID) | 0.23 ± 0.01 | 0.30 ± 0.02 | 8.11 ± 7.48 | 85.00 |
| | Ours (RL) | 0.23 ± 0.01 | 0.30 ± 0.01 | 6.46 ± 1.91 | 90.00 |

Table 16: Detailed task-wise results on Height intervention for the LIBERO SPATIAL task suite.

| Task | Method | Height (m) | 95th percentile | AA | TSR (%) |
| --- | --- | --- | --- | --- | --- |
| Pick the akita black bowl between the plate and the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.04 ± 0.02 | 1.18 ± 0.01 | 71.44 ± 16.47 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.05 ± 0.03 | 1.19 ± 0.01 | 74.07 ± 33.02 | 90.00 |
| | Ours (PID) | 1.04 ± 0.01 | 1.18 ± 0.01 | 77.09 ± 48.70 | 80.00 |
| | Ours (RL) | 1.05 ± 0.03 | 1.18 ± 0.01 | 85.69 ± 67.64 | 90.00 |
| Pick the akita black bowl next to the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.05 ± 0.01 | 1.18 ± 0.01 | 94.74 ± 12.98 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.06 ± 0.01 | 1.17 ± 0.01 | 88.47 ± 5.23 | 100.00 |
| | Ours (PID) | 1.06 ± 0.01 | 1.18 ± 0.01 | 87.40 ± 5.92 | 90.00 |
| | Ours (RL) | 1.05 ± 0.02 | 1.17 ± 0.02 | 119.13 ± 133.02 | 100.00 |
| Pick the akita black bowl from table center and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.06 ± 0.03 | 1.18 ± 0.01 | 104.24 ± 46.49 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.06 ± 0.05 | 1.17 ± 0.01 | 156.20 ± 201.48 | 85.00 |
| | Ours (PID) | 1.05 ± 0.02 | 1.17 ± 0.01 | 86.48 ± 12.31 | 75.00 |
| | Ours (RL) | 1.05 ± 0.04 | 1.18 ± 0.01 | 112.69 ± 94.86 | 80.00 |
| Pick the akita black bowl next to the cookies box and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.05 ± 0.01 | 1.18 ± 0.01 | 70.37 ± 7.73 | 85.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.05 ± 0.01 | 1.18 ± 0.01 | 67.69 ± 4.97 | 100.00 |
| | Ours (PID) | 1.05 ± 0.01 | 1.18 ± 0.01 | 68.78 ± 6.31 | 95.00 |
| | Ours (RL) | 1.06 ± 0.01 | 1.18 ± 0.01 | 69.95 ± 6.78 | 95.00 |
| Pick the akita black bowl in the top layer of the wooden cabinet and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.15 ± 0.01 | 1.26 ± 0.02 | 109.62 ± 8.53 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.15 ± 0.01 | 1.26 ± 0.01 | 107.59 ± 7.21 | 75.00 |
| | Ours (PID) | 1.14 ± 0.03 | 1.25 ± 0.02 | 137.42 ± 89.71 | 65.00 |
| | Ours (RL) | 1.16 ± 0.01 | 1.26 ± 0.01 | 110.86 ± 10.59 | 75.00 |
| Pick the akita black bowl on the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.05 ± 0.02 | 1.17 ± 0.01 | 81.43 ± 22.47 | 40.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.07 ± 0.01 | 1.17 ± 0.01 | 81.18 ± 10.50 | 65.00 |
| | Ours (PID) | 1.06 ± 0.01 | 1.17 ± 0.01 | 74.63 ± 9.66 | 45.00 |
| | Ours (RL) | 1.06 ± 0.01 | 1.17 ± 0.01 | 78.89 ± 14.08 | 50.00 |
| Pick the akita black bowl on the cookies box and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.04 ± 0.01 | 1.17 ± 0.01 | 85.38 ± 9.90 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.05 ± 0.02 | 1.17 ± 0.01 | 96.71 ± 32.52 | 95.00 |
| | Ours (PID) | 1.05 ± 0.01 | 1.17 ± 0.01 | 82.47 ± 7.35 | 85.00 |
| | Ours (RL) | 1.04 ± 0.01 | 1.17 ± 0.01 | 87.63 ± 13.66 | 80.00 |
| Pick the akita black bowl on the stove and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.04 ± 0.02 | 1.16 ± 0.01 | 106.71 ± 37.09 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.05 ± 0.01 | 1.16 ± 0.01 | 97.01 ± 6.68 | 75.00 |
| | Ours (PID) | 1.04 ± 0.02 | 1.16 ± 0.02 | 100.66 ± 14.54 | 80.00 |
| | Ours (RL) | 1.05 ± 0.01 | 1.16 ± 0.01 | 95.56 ± 6.37 | 80.00 |
| Pick the akita black bowl next to the plate and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.04 ± 0.01 | 1.17 ± 0.01 | 78.04 ± 7.39 | 90.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.05 ± 0.03 | 1.17 ± 0.01 | 182.60 ± 209.87 | 85.00 |
| | Ours (PID) | 1.05 ± 0.03 | 1.17 ± 0.01 | 121.24 ± 137.60 | 80.00 |
| | Ours (RL) | 1.04 ± 0.02 | 1.17 ± 0.01 | 109.99 ± 97.70 | 80.00 |
| Pick the akita black bowl on the wooden cabinet and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 1.19 ± 0.02 | 1.29 ± 0.02 | 110.09 ± 20.07 | 65.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 1.18 ± 0.02 | 1.28 ± 0.02 | 152.80 ± 126.99 | 75.00 |
| | Ours (PID) | 1.19 ± 0.01 | 1.28 ± 0.02 | 101.64 ± 6.08 | 55.00 |
| | Ours (RL) | 1.19 ± 0.01 | 1.29 ± 0.01 | 105.96 ± 7.40 | 65.00 |

Table 17: Detailed task-wise results on Speed intervention for the LIBERO LONG task suite.

| Task | Method | Speed (cm/s) | SA | TSR (%) |
| --- | --- | --- | --- | --- |
| put both the alphabet soup and the tomato sauce in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.79 ± 2.30 | 2.80 ± 0.33 | 65.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 11.69 ± 2.77 | 2.60 ± 0.49 | 60.00 |
| | Ours (PID) | 11.88 ± 1.95 | 2.61 ± 0.45 | 60.00 |
| | Ours (RL) | 11.04 ± 2.14 | 2.77 ± 0.53 | 75.00 |
| put the cheese box and the butter in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 15.17 ± 1.38 | 3.26 ± 0.26 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 13.47 ± 3.07 | 3.09 ± 0.27 | 75.00 |
| | Ours (PID) | 14.30 ± 3.35 | 3.08 ± 0.29 | 80.00 |
| | Ours (RL) | 14.26 ± 2.66 | 2.99 ± 0.43 | 75.00 |
| turn on the stove and put the moka pot on it | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 9.56 ± 1.01 | 2.99 ± 0.58 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 10.30 ± 0.58 | 2.93 ± 0.84 | 55.00 |
| | Ours (PID) | 10.37 ± 1.10 | 3.27 ± 0.57 | 75.00 |
| | Ours (RL) | 10.01 ± 1.17 | 2.90 ± 0.41 | 60.00 |
| put the black bowl in the bottom drawer of the cabinet and close it | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 10.11 ± 2.15 | 1.77 ± 0.97 | 50.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 10.81 ± 1.65 | 1.46 ± 0.59 | 65.00 |
| | Ours (PID) | 10.37 ± 2.42 | 1.70 ± 0.90 | 60.00 |
| | Ours (RL) | 9.14 ± 3.59 | 1.29 ± 0.85 | 70.00 |
| put the white mug on the left plate and put the yellow and white mug on the right plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.91 ± 1.31 | 0.99 ± 0.28 | 40.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 10.27 ± 3.15 | 1.09 ± 0.27 | 50.00 |
| | Ours (PID) | 10.80 ± 2.65 | 1.04 ± 0.21 | 45.00 |
| | Ours (RL) | 11.62 ± 2.94 | 1.12 ± 0.24 | 45.00 |
| pick up the book and place it in the back compartment of the caddy | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.15 ± 1.46 | 4.81 ± 0.98 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 10.81 ± 2.11 | 5.35 ± 0.77 | 70.00 |
| | Ours (PID) | 10.15 ± 2.68 | 4.88 ± 1.09 | 60.00 |
| | Ours (RL) | 10.76 ± 1.64 | 4.53 ± 0.89 | 80.00 |
| put the white mug on the plate and put the chocolate pudding to the right of the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 10.21 ± 2.62 | 2.07 ± 0.38 | 50.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 9.99 ± 3.31 | 2.06 ± 0.64 | 60.00 |
| | Ours (PID) | 9.53 ± 2.82 | 2.03 ± 0.40 | 55.00 |
| | Ours (RL) | 10.84 ± 1.99 | 2.35 ± 0.54 | 30.00 |
| put both the alphabet soup and the cream cheese box in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 12.45 ± 2.38 | 3.45 ± 0.55 | 65.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 12.68 ± 1.76 | 3.22 ± 0.48 | 60.00 |
| | Ours (PID) | 11.54 ± 3.62 | 3.44 ± 0.60 | 65.00 |
| | Ours (RL) | 12.18 ± 1.90 | 3.37 ± 0.35 | 60.00 |
| put both moka pots on the stove | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 9.86 ± 1.80 | 1.08 ± 0.40 | 40.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 9.32 ± 1.91 | 0.91 ± 0.35 | 40.00 |
| | Ours (PID) | 9.45 ± 2.55 | 1.18 ± 0.31 | 30.00 |
| | Ours (RL) | 9.07 ± 2.24 | 0.94 ± 0.38 | 40.00 |
| put the yellow and white mug in the microwave and close it | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 8.45 ± 2.45 | 1.27 ± 0.34 | 50.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 8.00 ± 3.01 | 1.66 ± 1.03 | 20.00 |
| | Ours (PID) | 8.65 ± 1.74 | 1.36 ± 1.37 | 65.00 |
| | Ours (RL) | 9.75 ± 1.18 | 1.41 ± 1.40 | 45.00 |

Table 18: Detailed task-wise results on Speed intervention for the LIBERO GOAL task suite.

| Task | Method | Speed (cm/s) | SA | TSR (%) |
| --- | --- | --- | --- | --- |
| Open the middle layer of the drawer | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 7.59 ± 2.44 | 1.68 ± 2.18 | 60.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 8.50 ± 2.13 | 0.79 ± 1.73 | 80.00 |
| | Ours (PID) | 7.12 ± 1.86 | 1.54 ± 1.94 | 60.00 |
| | Ours (RL) | 8.20 ± 2.28 | 0.20 ± 0.57 | 75.00 |
| Put the bowl on the stove | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 17.47 ± 3.66 | 3.40 ± 0.39 | 100.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 17.95 ± 1.71 | 3.45 ± 0.39 | 95.00 |
| | Ours (PID) | 18.47 ± 1.60 | 3.45 ± 0.35 | 95.00 |
| | Ours (RL) | 18.03 ± 1.97 | 3.50 ± 0.58 | 90.00 |
| Put the wine bottle on the top of the drawer | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 14.64 ± 3.40 | 5.82 ± 0.62 | 80.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 14.30 ± 2.48 | 5.79 ± 0.79 | 80.00 |
| | Ours (PID) | 12.84 ± 3.71 | 5.52 ± 0.56 | 75.00 |
| | Ours (RL) | 13.87 ± 4.99 | 5.59 ± 0.45 | 85.00 |
| Open the top layer of the drawer and put the bowl inside | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 14.60 ± 1.64 | 2.79 ± 0.41 | 50.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 15.64 ± 1.23 | 2.90 ± 0.28 | 50.00 |
| | Ours (PID) | 12.97 ± 4.71 | 2.94 ± 0.36 | 45.00 |
| | Ours (RL) | 14.26 ± 1.52 | 2.89 ± 0.46 | 55.00 |
| Put the bowl on the top of the drawer | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 18.63 ± 1.91 | 4.59 ± 0.22 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 18.60 ± 1.97 | 4.79 ± 0.31 | 100.00 |
| | Ours (PID) | 18.25 ± 4.20 | 4.75 ± 0.35 | 95.00 |
| | Ours (RL) | 17.71 ± 3.96 | 4.66 ± 0.31 | 95.00 |
| Push the plate to the front of the stove | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.43 ± 1.78 | 3.35 ± 0.39 | 80.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 12.19 ± 1.01 | 3.12 ± 0.46 | 85.00 |
| | Ours (PID) | 12.13 ± 1.11 | 3.21 ± 0.41 | 85.00 |
| | Ours (RL) | 11.96 ± 1.47 | 3.23 ± 0.39 | 100.00 |
| Put the cream cheese on the bowl | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 14.45 ± 3.71 | 3.44 ± 0.54 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 14.60 ± 4.63 | 3.75 ± 0.36 | 50.00 |
| | Ours (PID) | 12.82 ± 4.80 | 3.70 ± 0.52 | 60.00 |
| | Ours (RL) | 15.18 ± 3.52 | 3.74 ± 0.39 | 75.00 |
| Turn on the stove | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.24 ± 2.89 | 3.60 ± 0.55 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 11.37 ± 1.41 | 3.64 ± 0.57 | 100.00 |
| | Ours (PID) | 11.46 ± 1.63 | 3.53 ± 0.79 | 100.00 |
| | Ours (RL) | 11.75 ± 2.65 | 3.49 ± 0.74 | 100.00 |
| Put the bowl on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 18.16 ± 1.05 | 3.55 ± 0.51 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 17.16 ± 3.54 | 3.45 ± 0.37 | 95.00 |
| | Ours (PID) | 17.92 ± 0.96 | 3.35 ± 0.47 | 90.00 |
| | Ours (RL) | 18.23 ± 1.04 | 3.69 ± 0.43 | 90.00 |
| Put the wine bottle on the rack | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 8.85 ± 3.70 | 3.60 ± 0.55 | 65.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 9.77 ± 1.45 | 3.69 ± 0.59 | 70.00 |
| | Ours (PID) | 9.34 ± 1.58 | 3.41 ± 0.78 | 55.00 |
| | Ours (RL) | 9.53 ± 2.46 | 3.13 ± 0.86 | 65.00 |

Table 19: Detailed task-wise results on Speed intervention for the LIBERO OBJECT task suite.

| Task | Method | Speed (cm/s) | SA | TSR (%) |
| --- | --- | --- | --- | --- |
| Pick the alphabet soup and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 14.05 ± 2.54 | 2.16 ± 0.31 | 90.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 16.61 ± 1.48 | 2.23 ± 0.24 | 85.00 |
| | Ours (PID) | 15.87 ± 1.42 | 2.20 ± 0.29 | 85.00 |
| | Ours (RL) | 14.55 ± 2.70 | 2.22 ± 0.32 | 70.00 |
| Pick up the cream cheese and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 13.93 ± 2.06 | 2.30 ± 0.31 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 13.08 ± 1.85 | 2.39 ± 0.26 | 60.00 |
| | Ours (PID) | 13.11 ± 3.21 | 2.40 ± 0.21 | 60.00 |
| | Ours (RL) | 13.72 ± 2.83 | 2.48 ± 0.26 | 75.00 |
| Pick up the salad dressing and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 13.87 ± 2.96 | 1.94 ± 0.18 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 14.68 ± 1.19 | 2.11 ± 0.24 | 75.00 |
| | Ours (PID) | 13.77 ± 2.04 | 2.00 ± 0.22 | 85.00 |
| | Ours (RL) | 14.66 ± 1.25 | 1.92 ± 0.34 | 75.00 |
| Pick up the bbq sauce and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 13.02 ± 1.15 | 1.86 ± 0.34 | 40.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 11.37 ± 3.31 | 1.88 ± 0.32 | 60.00 |
| | Ours (PID) | 11.26 ± 2.92 | 1.75 ± 0.40 | 40.00 |
| | Ours (RL) | 11.82 ± 3.22 | 1.63 ± 0.35 | 65.00 |
| Pick up the ketchup and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 15.01 ± 2.20 | 1.92 ± 0.28 | 90.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 15.20 ± 2.30 | 2.10 ± 0.17 | 65.00 |
| | Ours (PID) | 14.91 ± 2.18 | 2.04 ± 0.35 | 90.00 |
| | Ours (RL) | 13.62 ± 3.61 | 2.07 ± 0.26 | 95.00 |
| Pick up the tomato sauce and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 12.17 ± 2.32 | 2.07 ± 0.30 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 13.28 ± 2.75 | 2.11 ± 0.29 | 100.00 |
| | Ours (PID) | 11.90 ± 2.53 | 2.20 ± 0.48 | 85.00 |
| | Ours (RL) | 12.08 ± 2.47 | 1.89 ± 0.48 | 70.00 |
| Pick up the butter and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 16.24 ± 1.39 | 2.57 ± 0.33 | 60.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 16.52 ± 1.95 | 3.02 ± 0.17 | 70.00 |
| | Ours (PID) | 16.84 ± 1.34 | 2.82 ± 0.38 | 55.00 |
| | Ours (RL) | 16.55 ± 0.93 | 2.79 ± 0.20 | 80.00 |
| Pick up the milk and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 15.79 ± 3.06 | 2.45 ± 0.31 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 16.78 ± 0.81 | 2.52 ± 0.28 | 85.00 |
| | Ours (PID) | 16.72 ± 1.52 | 2.52 ± 0.25 | 100.00 |
| | Ours (RL) | 15.40 ± 3.28 | 2.46 ± 0.23 | 95.00 |
| Pick up the pudding and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 16.23 ± 1.75 | 2.42 ± 0.17 | 60.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 15.98 ± 2.04 | 2.50 ± 0.19 | 90.00 |
| | Ours (PID) | 16.60 ± 1.26 | 2.40 ± 0.19 | 65.00 |
| | Ours (RL) | 15.10 ± 2.06 | 2.49 ± 0.23 | 50.00 |
| Pick up the orange juice and place it in the basket | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 11.86 ± 4.75 | 1.73 ± 0.25 | 90.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 14.36 ± 2.57 | 1.80 ± 0.21 | 95.00 |
| | Ours (PID) | 14.32 ± 1.69 | 1.70 ± 0.28 | 100.00 |
| | Ours (RL) | 13.31 ± 3.76 | 1.77 ± 0.30 | 90.00 |

Table 20: Detailed task-wise results on Speed intervention for the LIBERO SPATIAL task suite.

| Task | Method | Speed (cm/s) | SA | TSR (%) |
| --- | --- | --- | --- | --- |
| Pick the akita black bowl between the plate and the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 17.73 ± 2.30 | 4.33 ± 0.33 | 75.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 17.96 ± 2.59 | 4.32 ± 0.32 | 90.00 |
| | Ours (PID) | 18.60 ± 1.13 | 4.30 ± 0.35 | 95.00 |
| | Ours (RL) | 18.10 ± 3.69 | 4.44 ± 0.36 | 80.00 |
| Pick the akita black bowl next to the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 18.57 ± 1.28 | 3.46 ± 0.61 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 18.88 ± 2.02 | 3.63 ± 0.40 | 90.00 |
| | Ours (PID) | 18.03 ± 3.03 | 3.47 ± 0.34 | 95.00 |
| | Ours (RL) | 18.93 ± 1.27 | 3.44 ± 0.32 | 90.00 |
| Pick the akita black bowl from table center and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 14.08 ± 3.45 | 2.55 ± 0.30 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 14.71 ± 2.30 | 2.45 ± 0.38 | 65.00 |
| | Ours (PID) | 14.66 ± 3.37 | 2.61 ± 0.33 | 75.00 |
| | Ours (RL) | 14.55 ± 3.35 | 2.50 ± 0.36 | 85.00 |
| Pick the akita black bowl next to the cookies box and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 19.49 ± 1.24 | 3.89 ± 0.34 | 85.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 18.49 ± 3.24 | 3.88 ± 0.27 | 95.00 |
| | Ours (PID) | 19.22 ± 1.97 | 4.02 ± 0.34 | 100.00 |
| | Ours (RL) | 19.38 ± 1.37 | 3.94 ± 0.29 | 100.00 |
| Pick the akita black bowl in the top layer of the wooden cabinet and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 17.24 ± 0.55 | 3.39 ± 0.50 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 15.63 ± 4.66 | 3.22 ± 0.28 | 75.00 |
| | Ours (PID) | 16.37 ± 2.67 | 3.44 ± 0.46 | 75.00 |
| | Ours (RL) | 17.32 ± 0.59 | 3.25 ± 0.41 | 65.00 |
| Pick the akita black bowl on the ramekin and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 18.29 ± 2.91 | 3.48 ± 0.38 | 40.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 17.59 ± 2.23 | 3.45 ± 0.36 | 45.00 |
| | Ours (PID) | 18.32 ± 2.09 | 3.29 ± 0.50 | 50.00 |
| | Ours (RL) | 17.18 ± 3.21 | 3.30 ± 0.53 | 50.00 |
| Pick the akita black bowl on the cookies box and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 17.87 ± 1.17 | 3.30 ± 0.35 | 95.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 18.19 ± 0.85 | 3.30 ± 0.28 | 95.00 |
| | Ours (PID) | 17.99 ± 1.12 | 3.30 ± 0.26 | 85.00 |
| | Ours (RL) | 18.04 ± 1.49 | 3.31 ± 0.34 | 100.00 |
| Pick the akita black bowl on the stove and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 15.35 ± 2.98 | 3.44 ± 0.60 | 70.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 15.55 ± 0.93 | 2.90 ± 0.89 | 65.00 |
| | Ours (PID) | 16.01 ± 0.82 | 2.92 ± 0.90 | 70.00 |
| | Ours (RL) | 15.54 ± 1.98 | 3.28 ± 0.82 | 65.00 |
| Pick the akita black bowl next to the plate and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 17.64 ± 1.55 | 3.37 ± 0.68 | 90.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 16.30 ± 3.30 | 3.48 ± 0.58 | 70.00 |
| | Ours (PID) | 16.75 ± 2.48 | 3.37 ± 0.60 | 75.00 |
| | Ours (RL) | 17.49 ± 2.60 | 3.48 ± 0.74 | 75.00 |
| Pick the akita black bowl on the wooden cabinet and place it on the plate | OpenVLA \[[18](https://arxiv.org/html/2606.00269#bib.bib17)\] | 19.67 ± 2.11 | 4.97 ± 0.33 | 65.00 |
| | STATIC \[[12](https://arxiv.org/html/2606.00269#bib.bib3)\] | 19.81 ± 1.11 | 4.94 ± 0.35 | 75.00 |
| | Ours (PID) | 20.08 ± 0.61 | 4.93 ± 0.23 | 60.00 |
| | Ours (RL) | 18.83 ± 3.16 | 5.04 ± 0.36 | 75.00 |