@QT9277: "No way, AI voice synthesis has gotten this insane???" I was browsing GitHub today and was completely stunned. VoxCPM2 is trending #1 with over 20k stars and is blowing up overseas. I thought it was another all-slides, no-substance open-source project, but after carefully checking the demo, my ears really couldn't tell which one was real. …

Summary

Introducing VoxCPM2, an open-source multilingual voice synthesis model that is free for commercial use and supports voice design, voice cloning, and 48kHz high-quality output. It ranked #1 on GitHub trending.

No way, AI voice synthesis has gotten this insane??? I was browsing GitHub today and was completely stunned. VoxCPM2 is trending #1 with over 20k stars and is blowing up overseas. I thought it was another all-slides, no-substance open-source project, but after carefully checking the demo, my ears really couldn't tell which one was real.

Let me tell you how insane this thing is:

* Type and get voice output. You type "calm female voice in her 30s" and it generates it instantly. No recording, no tuning, one sentence and done.
* Drop in a recording and it even replicates speech quirks. Not that stiff robotic sound: it learns your tone, phrasing, even catchphrases. This isn't synthesis, it's cloning!
* 48kHz studio-grade quality. It sounds like it came straight from a professional studio. I listened with headphones three times and still couldn't find any flaws.
* The craziest part: completely free for commercial use. Apache 2.0 license: use it, modify it, make money with it, all at zero cost. For someone like me climbing out of debt, this is a godsend!

For those making short videos who don't want to show their face, those doing podcasts without equipment, those needing voiceovers for projects: zero cost, just hop on board, what more could you want?

I've already starred it. Later I'll figure out how to integrate it into my content workflow. Anyone already playing with it? Comment and discuss!

Pure personal sharing, not an ad, I just saw it.

The craziest part: it's all free https://github.com/OpenBMB/VoxCPM
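For readers wondering what "48kHz" means in practice, here is a minimal standard-library sketch. It is not VoxCPM2 code and assumes nothing about that project's API: the helper names `sine_wav_bytes` and `describe` are hypothetical, and the test tone merely stands in for synthesized speech. It only illustrates that a 48 kHz file stores 48,000 samples per second of audio, three times as many as a 16 kHz file of the same length.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 48_000  # output rate quoted in the post


def sine_wav_bytes(duration_s, freq_hz=440.0, sample_rate=SAMPLE_RATE_HZ):
    """Return a mono 16-bit PCM WAV file (as bytes) holding a sine tone."""
    n_frames = int(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half-scale amplitude keeps the tone well inside the int16 range.
        value = 0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(value))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


def describe(wav_bytes):
    """Return (sample_rate, frame_count, duration_in_seconds) for a WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        n_frames = wav.getnframes()
    return rate, n_frames, n_frames / rate


if __name__ == "__main__":
    for rate in (16_000, 48_000):
        data = sine_wav_bytes(1.0, sample_rate=rate)
        print(rate, describe(data), len(data), "bytes")
```

A 48 kHz sample rate can represent frequencies up to 24 kHz (the Nyquist limit), which covers the full range of human hearing.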

Cached at: 06/05/26, 05:17 PM

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

👋 Join our community for discussion and support! Feishu | Discord

ModelBest THUHCSI

Similar Articles

@Honcia13: Open-source TTS is going crazy! New weapons for industrial park scams? Tsinghua OpenBMB just released VoxCPM2: 2 billion parameters + 2 million hours of multilingual training data, 48kHz studio-quality sound! The most intense part: no tokenizer needed at all. It performs diffusion autoregression directly in a continuous latent space, maximizing detail retention!

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It performs diffusion autoregressive generation in a continuous latent space without a tokenizer, offering 48kHz studio-quality audio and powerful voice cloning and voice design capabilities.

@denziideng: Another AI voice cloning 'dimensional reduction attack'... The CosyVoice I shared before can clone in 3 seconds, which I thought was already scary enough. But today's tool is even more lethal — after casually recording 1 minute of my own voice for training, it directly replicates tone, mannerisms, emotions, breathing, and pauses. It's almost like the soul of the original person possessed it! C...

GPT-SoVITS is an open-source AI voice cloning tool that supports high-fidelity voice cloning in zero-shot mode (from a 5-second voice sample) and few-shot mode (with 1 minute of training data), along with cross-lingual inference and a complete WebUI toolchain. It has garnered 57.8k stars on GitHub, becoming the leading open-source project in the voice cloning field.

@Chenzeze777: Found an open-source voice synthesis model that I just had to share. 2 billion parameters, trained on 2 million hours of data, supports 30 languages plus 9 Chinese dialects—just input text and it synthesizes speech, including Sichuanese, Cantonese, and Northeastern dialects. The craziest part? Use natural language to describe a voice—like "young female, gentle and sweet"—and it creates a brand-new voice from scratch without needing any reference audio.

Introducing an open-source voice synthesis model with 2 billion parameters and 2 million hours of training. It supports 30 languages and 9 Chinese dialects, allows voice description via natural language, can clone voices from a 3-second recording, delivers 48kHz studio-quality audio, and is free for commercial use under the Apache-2.0 license.

@laowangbabababa: Shocked! Dr. Qi on Douyin sells a 500k digital human agent per day, and I built it in 2 minutes. Using the Pixelle-Video project, which already has 22k stars. It supports digital human lip-syncing, motion transfer, and image-to-video. Supports ComfyUI, input a topic, from script writing to adding...

Introducing the open-source project Pixelle-Video: a fully automated AI short video engine. Input a topic and it automatically generates a video with script, images, voiceover, and background music. It supports local and cloud models, and its modular design allows each component model to be swapped out flexibly.