Tag: #steering

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Hugging Face Daily Papers · 2026-06-08

This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model, discovering interpretable features that can be steered to control attributes like laughter, speaker gender, and speech rate while preserving content.

Closed-Loop Neural Activation Control in Vision-Language-Action Models

arXiv cs.AI · 2026-06-02

This paper proposes CTRL-STEER, a closed-loop framework for adaptive steering of vision-language-action models using time-varying control signals, achieving a better trade-off between concept regulation and task success without retraining.

An Interactive Paradigm for Deep Research

arXiv cs.CL · 2026-05-26

SteER is a framework for steerable deep research that introduces mid-process, interpretable control via adaptive pause decisions and live persona modeling. It outperforms baselines by up to 22.80% on alignment and is preferred by human readers in over 85% of pairwise alignment judgments.

Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection

arXiv cs.CL · 2026-05-25

This paper introduces a principled approach to multilingual language steering that combines sparse autoencoders (SAEs) trained on multilingual data with a novel layer selection rule based on the intersection of multilingual alignment and language separability. The approach is evaluated on LLaMA-3.1-8B and Gemma-2-9B for machine translation and cross-lingual summarization.

@jxnlco: https://x.com/jxnlco/status/2057153744630890620

X AI KOLs Following · 2026-05-20

This tweet thread discusses best practices for using the Codex coding agent, focusing on durable threads, voice input, steering, and queuing, as well as the agent's expansion beyond code generation into full computer workflow automation.

@NousResearch: To check that CNA isolates only the intended behavior, we evaluate steered models on MMLU across a range of steering st…

X AI KOLs Following · 2026-05-19

Nous Research released Contrastive Neuron Attribution (CNA), a method that steers LLM behavior by identifying and ablating sparse circuits in MLP neurons, without training sparse autoencoders or degrading performance on general benchmarks. The method is validated on multiple large language models.

Codex-maxxing

Hacker News Top · 2026-05-19

Jason Liu shares how he uses OpenAI's Codex for knowledge work beyond coding, leveraging durable threads, voice input, and steering to integrate coding agents into his broader workflow.

@jxnlco: jason from the codex team here, heres a draft on codex maxxing and the primatives i use on a daily basis https://jxnl.g…

X AI KOLs Following · 2026-05-17

Jason Liu shares his workflow primitives for using Codex effectively, including durable threads, voice input, and steering to extend AI agents beyond coding into knowledge work.

DeepSeek-V4-Flash means LLM steering is interesting again

Hacker News Top · 2026-05-16

The article explores how DeepSeek-V4-Flash, a powerful local model, makes LLM steering practical again, discussing the concept and its implementation in the DwarfStar 4 project by antirez.

Non-linear Interventions on Large Language Models

arXiv cs.CL · 2026-05-15

This paper introduces a general formulation of non-linear intervention for large language models, extending beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and validates the approach on refusal bypass steering.

Negative Before Positive: Asymmetric Valence Processing in Large Language Models

arXiv cs.CL · 2026-05-08

This paper investigates how large language models process emotional valence through mechanistic interpretability. Using activation patching and steering on three open-source LLMs, the authors find that negative valence is localized to early layers while positive valence peaks in mid-to-late layers, and they validate this through topic-controlled flip tests.
