@QT9277: "No way, AI voice synthesis has gotten this insane???" I was browsing GitHub today and was completely stunned. VoxCPM2 is #1 on trending with over 20k stars and is blowing up overseas. I thought it was another slides-only open-source project, but after carefully checking the demo, my ears really couldn't tell which one was real. …
Summary
Introducing VoxCPM2, an open-source multilingual voice synthesis model that is free for commercial use. It supports voice design, voice cloning, and 48kHz high-quality output, and ranked #1 on GitHub trending.
Cached at: 06/05/26, 05:17 PM
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
👋 Join our community for discussion and support! Feishu | Discord
ModelBest THUHCSI
Similar Articles
@hisevenih: The AI voice community is blown away. This GitHub open-source black tech takes AI voice to an insane level, truly achieving: one sentence, one voice. Remember this project name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need a reference audio…
The open-source GitHub project VoxCPM2 achieves AI voice cloning without reference audio, generating the target voice precisely from a single sentence. It has gained 20K stars.
@Honcia13: Open-source TTS is going crazy! A new weapon for scam compounds? Tsinghua OpenBMB just released VoxCPM2: 2 billion parameters + 2 million hours of multilingual training data, 48kHz studio-quality sound! The most intense part: no tokenizer needed at all. It performs diffusion autoregressive generation directly in continuous latent space, maximizing detail retention!
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It is tokenizer-free, generating speech with diffusion autoregressive modeling directly in a continuous latent space, and offers 48kHz studio-quality audio along with powerful voice cloning and voice design capabilities.
@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...
GPT-SoVITS is an open-source AI voice cloning tool that supports high-fidelity zero-shot cloning (from a 5-second voice sample) and few-shot cloning (with 1 minute of training audio), as well as cross-lingual inference. It comes with a complete WebUI toolchain and has garnered 57.8k stars on GitHub, making it the leading open-source project in the voice cloning field.
@Chenzeze777: Found an open-source voice synthesis model that I just had to share. 2 billion parameters, trained on 2 million hours of data, supports 30 languages plus 9 Chinese dialects—just input text and it synthesizes speech, including Sichuanese, Cantonese, and Northeastern dialects. The craziest part? Use natural language to describe a voice—like "young female, gentle and sweet"—and it creates a brand-new voice from scratch without needing any reference audio.
Introducing an open-source voice synthesis model with 2 billion parameters, trained on 2 million hours of data. It supports 30 languages and 9 Chinese dialects, lets you describe a voice in natural language, can clone voices from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.
@laowangbabababa: Shocked! Dr. Qi on Douyin sells a 500k digital human agent per day, and I built it in 2 minutes. Using the Pixelle-Video project, which already has 22k stars. It supports digital human lip-syncing, motion transfer, and image-to-video. Supports ComfyUI, input a topic, from script writing to adding...
Introducing the open-source project Pixelle-Video: a fully automated AI short video engine. Input a topic and it automatically generates a video with script, images, voiceover, and background music. It supports local and cloud models, and its modular design allows each component model to be swapped out flexibly.