@billtheinvestor: Shanghai Jiao Tong University open-sources F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, total-duration-based speed control, emotion expression control, and long text synthesis. Commercial use is allowed.

X AI KOLs Timeline Models

Summary

Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.

Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, speed control based on total duration, emotional expression control, and long-text synthesis. Commercial use is allowed. https://t.co/G8rSolPdVh

Cached at: 05/08/26, 09:53 AM

Similar Articles

@Chenzeze777: Found an open-source voice synthesis model that I just had to share. 2 billion parameters, trained on 2 million hours of data, supports 30 languages plus 9 Chinese dialects—just input text and it synthesizes speech, including Sichuanese, Cantonese, and Northeastern dialects. The craziest part? Use natural language to describe a voice—like "young female, gentle and sweet"—and it creates a brand-new voice from scratch without needing any reference audio.

An open-source voice synthesis model with 2 billion parameters, trained on 2 million hours of data. It supports 30 languages and 9 Chinese dialects, can design a voice from a natural-language description, can clone a voice from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.

@Honcia13: Open-source TTS is going crazy! A new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 2 billion parameters, trained on 2 million hours of multilingual data, with 48kHz studio-quality sound! The most striking part: no tokenizer is needed at all. It performs diffusion autoregressive generation directly in a continuous latent space, maximizing detail retention!

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It needs no tokenizer, generating speech by diffusion autoregression directly in a continuous latent space, and offers 48kHz studio-quality audio along with strong voice cloning and voice design capabilities.

@LinearUncle: Recommending an open-source voice cloning repository from a Chinese company called Mosi: MOSS-TTS. You read a passage, it clones your voice, then you can use your voice to read any text. Check the post details to see how I used it in practice—it works great and can be indistinguishable from the real thing. https://github.com/OpenMOS…

MOSS-TTS is an open-source voice cloning model released by the Chinese company Mosi. Users clone their voice by reading a short passage aloud, then use the cloned voice to read any text, with realistic results.