steering

#steering

A Coin Flip Per Token: Bernoulli Sparse Steering of Large Language Models

arXiv cs.LG · 17h ago

Introduces Stochastic Token Steering (STS) and Stochastic Block Steering (SBS) for LLM activation steering, which probabilistically gate steering signals per token or per sequence. Shows that steering only 50% of tokens recovers most of the dense-steering effect while preserving fluency, and that the behavioral outcome is rate-limited by cumulative signal dosage.
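
The gating idea in this summary can be illustrated with a minimal sketch. It assumes hidden states are plain Python lists, that steering means adding a scaled direction vector, and that the per-sequence variant flips a single coin for the whole sequence. The function names, the `alpha` scale, and the absence of any rescaling are illustrative assumptions, not the paper's implementation.

```python
import random

def stochastic_token_steer(hidden, direction, alpha=1.0, p=0.5, rng=None):
    """Add alpha * direction to each token's hidden state with probability p.

    hidden: list of per-token vectors (lists of floats); direction: one vector.
    Returns (steered hidden states, per-token gate list).
    """
    rng = rng or random.Random()
    steered, gates = [], []
    for h in hidden:
        gate = 1 if rng.random() < p else 0  # independent coin flip per token
        gates.append(gate)
        steered.append([x + gate * alpha * d for x, d in zip(h, direction)])
    return steered, gates

def stochastic_block_steer(hidden, direction, alpha=1.0, p=0.5, rng=None):
    """Flip one coin for the whole sequence: steer every token or none."""
    rng = rng or random.Random()
    gate = 1 if rng.random() < p else 0  # single coin flip per sequence
    steered = [[x + gate * alpha * d for x, d in zip(h, direction)] for h in hidden]
    return steered, gate
```

With `p=0.5`, roughly half the tokens receive the steering vector in expectation, which is the regime the summary says recovers most of the dense-steering effect. In a real model the addition would happen on a chosen layer's activations rather than on Python lists.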

Can Dialects Be Steered Like Languages? Sparse Neurons and Distributed Directions in Arabic LLMs

arXiv cs.CL · yesterday

This paper investigates methods to steer Arabic LLMs toward dialect-specific generation by identifying sparse neuron populations and extracting dialect activation directions, enabling dialect control at inference time without fine-tuning.

They Infer What You Meant: Models Represent Communicative Intent More Reliably Than They Act On It

arXiv cs.CL · yesterday

This paper studies language models' failure to act on communicative intent despite robust internal representations. Using linear probes, the authors show intent is decodable from hidden states but often not reflected in outputs, and steering a late-layer direction can recover the intended behavior.

Mechanistic Personality Analysis of LLMs: Steering Personality via Latent Feature Interventions

arXiv cs.AI · 2026-06-30

This paper introduces a mechanistic interpretability approach to steer LLM personality traits by identifying and intervening on latent features using sparse autoencoders, achieving controllable personality modulation while maintaining language performance.

Getting an LLM agent to actually stay in character, the steering bullseye nobody writes down

Reddit r/AI_Agents · 2026-06-28

A discussion on techniques for keeping LLM agents consistently in character, highlighting an often overlooked aspect of steering.

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Hugging Face Daily Papers · 2026-06-08

This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model, discovering interpretable features that can be steered to control attributes like laughter, speaker gender, and speech rate while preserving content.

Closed-Loop Neural Activation Control in Vision-Language-Action Models

arXiv cs.AI · 2026-06-02

Proposes CTRL-STEER, a closed-loop framework for adaptive steering of vision-language-action models using time-varying control signals, achieving a better trade-off between concept regulation and task success without retraining.

An Interactive Paradigm for Deep Research

arXiv cs.CL · 2026-05-26

SteER is a framework for steerable deep research that introduces mid-process, interpretable control via adaptive pause decisions and live persona modeling. It outperforms baselines by up to 22.80% on alignment and is preferred by human readers in over 85% of pairwise alignment judgments.

Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection

arXiv cs.CL · 2026-05-25

This paper introduces a principled approach to multilingual language steering using sparse autoencoders (SAEs) trained on multilingual data and a novel layer selection rule based on the intersection of multilingual alignment and language separability, evaluated on LLaMA-3.1-8B and Gemma-2-9B for machine translation and cross-lingual summarization.

@jxnlco: https://x.com/jxnlco/status/2057153744630890620

X AI KOLs Following · 2026-05-20

This tweet thread discusses best practices for using the Codex coding agent, focusing on durable threads, voice input, steering, queuing, and its expanding capabilities beyond code generation to full computer workflow automation.

@NousResearch: To check that CNA isolates only the intended behavior, we evaluate steered models on MMLU across a range of steering st…

X AI KOLs Following · 2026-05-19

Nous Research released Contrastive Neuron Attribution (CNA), a method to steer LLM behavior by identifying and ablating sparse circuits in MLP neurons without training sparse autoencoders or degrading general benchmarks, validated on multiple large language models.

Codex-maxxing

Hacker News Top · 2026-05-19

Jason Liu shares how he uses OpenAI's Codex for knowledge work beyond coding, leveraging durable threads, voice input, and steering to integrate coding agents into his broader workflow.

@jxnlco: jason from the codex team here, heres a draft on codex maxxing and the primatives i use on a daily basis https://jxnl.g…

X AI KOLs Following · 2026-05-17

Jason Liu shares his workflow primitives for using Codex effectively, including durable threads, voice input, and steering to extend AI agents beyond coding into knowledge work.

DeepSeek-V4-Flash means LLM steering is interesting again

Hacker News Top · 2026-05-16

The article explores how DeepSeek-V4-Flash, a powerful local model, makes LLM steering practical again, discussing the concept and its implementation in the DwarfStar 4 project by antirez.

Non-linear Interventions on Large Language Models

arXiv cs.CL · 2026-05-15

This paper introduces a general formulation of non-linear interventions for large language models, extending beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and validates the approach on refusal-bypass steering.

Negative Before Positive: Asymmetric Valence Processing in Large Language Models

arXiv cs.CL · 2026-05-08

This paper investigates how large language models process emotional valence through mechanistic interpretability. Using activation patching and steering on three open-source LLMs, the authors find that negative valence is localized to early layers while positive valence peaks in mid-to-late layers, and they validate this through topic-controlled flip tests.
