@Honcia13: Open-source TTS is going wild! A new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 2 billion parameters, trained on 2 million hours of multilingual data, with 48kHz studio-quality sound! The most striking part: no tokenizer at all. It runs diffusion autoregression directly in a continuous latent space, preserving as much detail as possible!

X AI KOLs Timeline 05/12/26, 01:25 AM Models

text-to-speech open-source voice-cloning multilingual diffusion-model ai-voice

Summary

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It generates speech by diffusion autoregression in a continuous latent space, with no tokenizer, and offers 48kHz studio-quality audio along with voice cloning and voice design.

Open-source TTS is going wild! Are scam compounds getting a new weapon? Tsinghua's OpenBMB just released VoxCPM2: 2 billion parameters, trained on 2 million hours of multilingual data, with 48kHz studio-quality sound! The most striking part: no tokenizer at all. It runs diffusion autoregression directly in a continuous latent space, preserving as much detail as possible!
30 languages + 9 Chinese dialects
Create voices from scratch using natural-language descriptions
Ultimate cloning mode: replicates breaths, verbal tics, and emotions
Real-time factor of 0.13 on an RTX 4090, almost zero latency
GitHub stars have exceeded 10,000; the Apache 2.0 license is friendly to commercial use!
Podcasters, audiobook narrators, and short-video creators can get started immediately
https://github.com/OpenBMB/VoxCPM
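To put the 0.13 real-time figure in concrete terms, here is a minimal sketch. It assumes the figure is a real-time factor in the usual sense (synthesis time divided by the duration of the audio produced); the post does not define it, and the function names below are hypothetical helpers, not part of VoxCPM.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: time spent synthesizing / duration of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time needed to produce audio_seconds of speech."""
    return audio_seconds * rtf


RTF = 0.13  # figure quoted in the post for an RTX 4090

one_minute = synthesis_time(60.0, RTF)  # seconds of compute for 60 s of audio
speedup = 1.0 / RTF                     # how many times faster than playback

print(f"60 s of audio takes about {one_minute:.1f} s to generate")
print(f"That is roughly {speedup:.1f}x faster than real time")
```

Under that assumption, any value below 1.0 means audio is generated faster than it plays back, which is what makes streaming use plausible.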

Cached at: 05/12/26, 07:35 AM

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

👋 Join our community for discussion and support! Feishu | Discord

ModelBest THUHCSI

Similar Articles

@FakeMaidenMaker: Explosive! This open-source project converts text to human-like speech for free, can clone anyone's voice, and adjusts timbre from a text description! It has garnered 30K GitHub stars and comes from ModelBest and OpenBMB; VoxCPM previously topped both the GitHub and HuggingFace charts. Do...

X AI KOLs Timeline

VoxCPM2 is an open-source speech synthesis model from OpenBMB, using a tokenizer-free diffusion autoregressive architecture, supporting 30 languages, voice design, and controllable voice cloning. It can clone a voice with just one sentence, or create a brand new voice using text, outputting 48kHz high-quality audio, and is commercially usable.

@QT9277: "No way, AI voice synthesis has gotten this insane???" I was browsing GitHub today and was completely stunned. VoxCPM2 is trending at #1 with over 20k stars and is blowing up overseas. I assumed it was another slideware open-source project, but after carefully checking the demo, my ears really couldn't tell which one was real. …

X AI KOLs Timeline

Introducing VoxCPM2, an open-source multilingual voice synthesis model that is free for commercial use, supports voice design, cloning, and 48kHz high-quality output, and ranked #1 on GitHub trending.

OpenBMB/VoxCPM

GitHub Trending (daily)

OpenBMB releases VoxCPM2, a 2B-parameter tokenizer-free TTS model trained on 2M+ hours of multilingual speech data, supporting 30 languages, voice design, controllable cloning, and 48kHz output.

@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…

X AI KOLs Timeline

The open-source GitHub project VoxCPM2 produces a target voice from just one sentence, without any reference audio, and has gained 20K stars.

@billtheinvestor: Shanghai Jiao Tong University open-sources F5-TTS speech generation model. The model is trained on 100,000 hours of data and supports bilingual synthesis in Chinese and English. Technical features include zero-shot voice cloning, total-duration-based speed control, emotion expression control, and long text synthesis. Commercial use is allowed.

X AI KOLs Timeline

Shanghai Jiao Tong University has open-sourced the F5-TTS speech generation model, trained on 100,000 hours of data, supporting bilingual synthesis in Chinese and English and zero-shot voice cloning, and allowing commercial use.
