Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders
Summary
This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model, discovering interpretable features that can be steered to control attributes like laughter, speaker gender, and speech rate while preserving content.
Cached at: 06/10/26, 09:43 AM
Source: https://huggingface.co/papers/2606.10029
Abstract
Sparse autoencoders trained on language model representations reveal interpretable features for speech synthesis that can be manipulated to control linguistic and prosodic attributes.
Language models increasingly serve as the backbone of text-to-speech (TTS) systems, yet we understand little about the representations they build when text and generated speech tokens share a single residual stream. We train BatchTopK sparse autoencoders on the LM backbone of CosyVoice3 and introduce a modality-aware auto-interp pipeline that labels each feature from where it fires: text-prefix context, 1-second speech clips, or both. The recovered features are interpretable, spanning phonemes, laughter, accent prompts and speaker gender. Steering through the SAE latent space shows these features are causal rather than merely descriptive: targeted interventions raise laughter probability from 0.02 to 0.79, flip perceived speaker gender, and control speech rate while preserving spoken content. SAE features thus serve both as interpretability objects and as control directions for TTS synthesis.
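To make the two mechanisms named in the abstract concrete, here is a minimal pure-Python sketch, not the authors' code. It assumes the usual BatchTopK definition (keep the k × batch_size largest activations across the whole batch rather than k per sample) and one common steering recipe (add a scaled decoder direction to the residual vector). The function names `batch_topk` and `steer`, and the parameter `alpha`, are hypothetical; the paper may steer differently.

```python
def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest positive activations across the batch.

    pre_acts is a list of rows (one row of feature pre-activations per token).
    Assumption: ranking is global over the batch, so one row may keep more
    than k features while another keeps fewer.
    """
    budget = k * len(pre_acts)
    # Flatten to (value, row, col) so the ranking is batch-wide, not per-row.
    flat = [
        (value, r, c)
        for r, row in enumerate(pre_acts)
        for c, value in enumerate(row)
        if value > 0.0
    ]
    flat.sort(key=lambda item: item[0], reverse=True)
    sparse = [[0.0] * len(row) for row in pre_acts]
    for value, r, c in flat[:budget]:
        sparse[r][c] = value
    return sparse


def steer(residual, decoder_direction, alpha):
    """Hypothetical steering step: shift a residual vector along one
    SAE feature's decoder direction by a strength alpha."""
    return [x + alpha * d for x, d in zip(residual, decoder_direction)]


if __name__ == "__main__":
    acts = [[0.9, 0.1, 0.5], [0.2, 0.05, 0.0]]
    print(batch_topk(acts, k=1))   # first row keeps two features, second keeps none
    print(steer([1.0, 2.0], [0.5, -1.0], alpha=2.0))
```

With k=1 and two rows the budget is two activations, and both survivors come from the first row, which is the behaviour that distinguishes BatchTopK from a per-sample TopK.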
Community
Paper submitter
Bringing SAEs to text-to-speech models!
Currently, control over TTS models such as CosyVoice3 is limited to prompts or predefined tags. We found that model generations can be precisely edited by steering SAE features.
We also analyze these features: some are audio-only, others activate only on text, and some activate on both text and audio. Additionally, we introduce an autointerp pipeline for all of them.
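One simple way to sort features into these three groups is to measure how much of a feature's total activation falls on speech tokens. The sketch below is an illustration under stated assumptions, not the paper's pipeline: the function name `modality_label`, the 0.95 threshold, and the "dead" label for features that never fire are all hypothetical.

```python
def modality_label(activations, is_speech, threshold=0.95):
    """Label a feature by where its activation mass sits.

    activations: non-negative activation of one feature at each token.
    is_speech:   parallel list of booleans, True for speech tokens and
                 False for text tokens.
    threshold:   assumed cut-off for calling a feature single-modality.
    """
    total = sum(activations)
    if total <= 0.0:
        return "dead"  # the feature never fired on this sample
    speech_mass = sum(a for a, s in zip(activations, is_speech) if s)
    share = speech_mass / total
    if share >= threshold:
        return "audio-only"
    if share <= 1.0 - threshold:
        return "text-only"
    return "both"


if __name__ == "__main__":
    mask = [False, False, True, True]  # two text tokens, then two speech tokens
    print(modality_label([0.0, 0.0, 1.2, 0.8], mask))
    print(modality_label([1.0, 0.5, 0.0, 0.0], mask))
    print(modality_label([1.0, 0.0, 1.0, 0.0], mask))
```

The label could then decide what evidence an auto-interp step inspects for that feature: text context, speech clips, or both.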
We plan to publish the SAE weights and code soon!
Get this paper in your agent:
hf papers read 2606.10029
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Interpreting Brain Responses to Language with Sparse Features from Language Models
This paper introduces Augmented Sparse Encoding Models to interpret brain responses to language using sparse features from language models, validated on high-field 7T fMRI data. It recovers known neural tuning properties and discovers a new voxel population tuned to people-related content.
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
This paper demonstrates that sparse autoencoders can extract interpretable features from Claude 3 Sonnet, a production-scale language model, addressing scalability concerns for dictionary learning. The features are multilingual, multimodal, and include safety-relevant concepts like deception and sycophancy, with causal influence on model outputs.
Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
This paper applies TopK Sparse Autoencoders to three EEG foundation models (SleepFM, REVE, LaBraM) to extract interpretable feature dictionaries and introduces a framework for concept steering, revealing representational failures and clinical entanglements.
Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection
This paper introduces a principled approach to multilingual language steering using sparse autoencoders (SAEs) trained on multilingual data and a novel layer selection rule based on the intersection of multilingual alignment and language separability, evaluated on LLaMA-3.1-8B and Gemma-2-9B for machine translation and cross-lingual summarization.
How Quantization Changes Interpretable Features: A Sparse Autoencoder Analysis of Language Models
This paper investigates whether interpretable features identified by sparse autoencoders in full-precision language models remain faithful after quantization, finding systematic degradation that behavioral metrics like perplexity can miss.