Tag
This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model and discovers interpretable features that can be steered to control attributes such as laughter, speaker gender, and speech rate while preserving content.