Tag
This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model, discovering interpretable features that can be steered to control attributes like laughter, speaker gender, and speech rate while preserving content.
CTRL-STEER is a closed-loop framework for adaptive steering of vision-language-action models using time-varying control signals; it achieves a better trade-off between concept regulation and task success without retraining.
SteER is a framework for steerable deep research that introduces mid-process, interpretable control via adaptive pause decisions and live persona modeling. It outperforms baselines by up to 22.80% on alignment and is preferred by human readers in over 85% of pairwise alignment judgments.
This paper introduces a principled approach to multilingual language steering using sparse autoencoders (SAEs) trained on multilingual data and a novel layer selection rule based on the intersection of multilingual alignment and language separability. The approach is evaluated on LLaMA-3.1-8B and Gemma-2-9B for machine translation and cross-lingual summarization.
This tweet thread discusses best practices for using the Codex coding agent, focusing on durable threads, voice input, steering, queuing, and the agent's expansion beyond code generation into full computer workflow automation.
Nous Research released Contrastive Neuron Attribution (CNA), a method to steer LLM behavior by identifying and ablating sparse circuits in MLP neurons without training sparse autoencoders or degrading performance on general benchmarks, validated on multiple large language models.
Jason Liu shares how he uses OpenAI's Codex for knowledge work beyond coding, leveraging durable threads, voice input, and steering to integrate coding agents into his broader workflow.
Jason Liu shares his workflow primitives for using Codex effectively, including durable threads, voice input, and steering to extend AI agents beyond coding into knowledge work.
The article explores how DeepSeek-V4-Flash, a powerful local model, makes LLM steering practical again, and discusses the concept and its implementation in the DwarfStar 4 project by antirez.
This paper introduces a general formulation of non-linear intervention for large language models, extending beyond the Linear Representation Hypothesis to manipulate features encoded along non-linear manifolds, and validates the approach on refusal-bypass steering.
This paper investigates how large language models process emotional valence through mechanistic interpretability. Using activation patching and steering on three open-source LLMs, the authors find that negative valence is localized to early layers while positive valence peaks in mid-to-late layers, and they validate this through topic-controlled flip tests.