@FakeMaidenMaker: Explosive! This open-source project converts text to human-like speech for free, can clone anyone's voice, and lets you adjust timbre with text! It has garnered 30K stars on GitHub. From ModelBest's OpenBMB, VoxCPM previously topped both the GitHub and HuggingFace charts. Do...
Summary
VoxCPM2 is an open-source speech synthesis model from OpenBMB. It uses a tokenizer-free diffusion autoregressive architecture and supports 30 languages, voice design, and controllable voice cloning. It can clone a voice from just one sentence or create a brand-new voice from a text description, outputs 48kHz high-quality audio, and is licensed for commercial use.
Cached at: 06/18/26, 02:06 AM
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
👋 Join our community for discussion and support! Feishu | Discord
Similar Articles
@QT9277: "No way, AI voice synthesis has gotten this insane???" I was browsing GitHub today and was completely stunned. VoxCPM2, trending #1, over 20k stars, blowing up overseas. I thought it was another PPT open-source project, but after carefully checking the demo—my ears really couldn't tell which one was real. …
Introducing VoxCPM2, an open-source multilingual voice synthesis model that is free for commercial use, supports voice design, voice cloning, and 48kHz high-quality output, and ranked #1 on GitHub trending.
@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…
The open-source GitHub project VoxCPM2 generates AI voices without reference audio, producing the target voice precisely from just one sentence. It has gained 20K stars.
@Honcia13: Open-source TTS is going crazy! A new weapon for scam compounds? Tsinghua OpenBMB just released VoxCPM2: 2 billion parameters + 2 million hours of multilingual training data, 48kHz studio-quality sound! The most intense part: no tokenizer needed at all. It performs diffusion autoregression directly in continuous latent space, maximizing detail retention!
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It performs tokenizer-free diffusion autoregressive generation in continuous latent space, offering 48kHz studio-quality audio and powerful voice cloning and voice design capabilities.
@Chenzeze777: Found an open-source voice synthesis model that I just had to share. 2 billion parameters, trained on 2 million hours of data, supports 30 languages plus 9 Chinese dialects—just input text and it synthesizes speech, including Sichuanese, Cantonese, and Northeastern dialects. The craziest part? Use natural language to describe a voice—like "young female, gentle and sweet"—and it creates a brand-new voice from scratch without needing any reference audio.
Introducing an open-source voice synthesis model with 2 billion parameters, trained on 2 million hours of data. It supports 30 languages and 9 Chinese dialects, lets you describe a voice in natural language, can clone voices from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.
@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...
GPT-SoVITS is an open-source AI voice cloning tool that supports high-fidelity zero-shot cloning (from a 5-second voice sample) and few-shot cloning (with 1 minute of training audio), as well as cross-lingual inference, and comes with a complete WebUI toolchain. It has garnered 57.8k stars on GitHub, becoming the leading open-source project in the voice cloning field.