@billtheinvestor: Shanghai Jiao Tong University open-sources F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, total-duration-based speed control, emotion expression control, and long text synthesis. Commercial use is allowed.
Summary
Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
Cached at: 05/08/26, 09:53 AM
Shanghai Jiao Tong University open-sources the F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, duration-based speed control, emotion expression control, and long text synthesis. Commercial use is allowed. https://t.co/G8rSolPdVh
Similar Articles
@Chenzeze777: Found an open-source voice synthesis model that I just had to share. 2 billion parameters, trained on 2 million hours of data, supports 30 languages plus 9 Chinese dialects—just input text and it synthesizes speech, including Sichuanese, Cantonese, and Northeastern dialects. The craziest part? Use natural language to describe a voice—like "young female, gentle and sweet"—and it creates a brand-new voice from scratch without needing any reference audio.
Introducing an open-source voice synthesis model with 2 billion parameters and 2 million hours of training. It supports 30 languages and 9 Chinese dialects, allows voice description via natural language, can clone voices from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.
@Honcia13: Open-source TTS is going crazy! New weapons for industrial park scams? Tsinghua OpenBMB just released VoxCPM2: 2 billion parameters + 2 million hours of multilingual training data, 48kHz studio-quality sound! The most intense part: no tokenizer needed at all. It performs diffusion autoregression directly in a continuous latent space, maximizing detail retention!
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It performs diffusion autoregressive generation in a continuous latent space without a tokenizer, offering 48kHz studio-quality audio and strong voice cloning and voice design capabilities.
@Gorden_Sun: NetEase Youdao open-sources Confucius4-TTS, a 1.3B TTS model that supports multiple languages and voice cloning, gives good results, and is very fast. GitHub: https://github.com/netease-youdao/Confucius4-TTS… Online demo: …
NetEase Youdao open-sourced the 1.3B parameter Confucius4-TTS model, supporting zero-shot voice cloning and cross-lingual speech synthesis in 14 languages, fast and with excellent results.
@LinearUncle: Recommending an open-source voice cloning repository from a Chinese company called Mosi: MOSS-TTS. You read a passage, it clones your voice, then you can use your voice to read any text. Check the post details to see how I used it in practice—it works great and can be indistinguishable from the real thing. https://github.com/OpenMOS…
MOSS-TTS is an open-source voice cloning model introduced by Mosi Company. Users can clone a voice by reading a small amount of text, and then use the cloned voice to generate any speech with realistic results.
@Gorden_Sun: ZONOS2: Open-source MoE TTS model. 8B total parameters, 0.9B activated parameters. Supports multiple languages, including Chinese, and voice cloning; results in Chinese are good. Model:
Zyphra released ZONOS2, an open-source MoE text-to-speech model trained on over 6 million hours of multilingual speech, supporting voice cloning and high-quality synthesis across many languages.