cosyvoice3

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Hugging Face Daily Papers · 2026-06-08

This paper applies sparse autoencoders to the CosyVoice3 text-to-speech language model, discovering interpretable features that can be steered to control attributes like laughter, speaker gender, and speech rate while preserving content.
