voice-cloning

#voice-cloning

this new Moss tts 1.5 is damn good with voice cloning

Reddit r/LocalLLaMA · 2026-05-30

MOSS TTS 1.5 is a new text-to-speech model with voice cloning capabilities, available through a Hugging Face Space; the poster prefers it to Fish Audio S2 Pro, citing its open licensing.

seshat-tts: A local real-time narrator for games that supports voice cloning

Reddit r/ArtificialInteligence · 2026-05-29

seshat-tts is an open-source tool for real-time game narration with voice cloning: it extracts on-screen text with OCR or an LLM and synthesizes speech locally with pocket-tts. Cloning a voice takes about 10 seconds on an RTX 2070 Super, and after caching it runs on the CPU.

@hisevenih: The AI voice community is blown away. This open-source GitHub project takes AI voice to an insane level, truly achieving "one sentence, one voice." Remember the name: VoxCPM2. It has already gained 20K stars on GitHub. Most incredibly, it doesn't even need reference audio…

X AI KOLs Timeline · 2026-05-28

The open-source GitHub project VoxCPM2, which has gained 20K stars, generates a precise target voice from just one sentence, with no reference audio required.

Woman loses thousands to scammer using suspected AI to mimic daughter's voice

Reddit r/ArtificialInteligence · 2026-05-26

A woman lost $5,400 after scammers used AI voice cloning to mimic her daughter's voice in a fake kidnapping scheme, highlighting the growing threat of AI-powered scams.

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

Reddit r/LocalLLaMA · 2026-05-26

MOSS-TTS v1.5 is an updated open-source text-to-speech model with improved multilingual synthesis (supporting 31 languages), more stable zero-shot voice cloning, and explicit inline pause control.

@denziideng: Another crushing blow from AI voice cloning... The CosyVoice I shared before can clone a voice in 3 seconds, which I thought was scary enough. But today's tool is even more lethal: after I casually recorded 1 minute of my own voice for training, it directly replicated my tone, mannerisms, emotions, breathing, and pauses. It's almost as if the original speaker's soul possessed it! C...

X AI KOLs Timeline · 2026-05-26

GPT-SoVITS is an open-source AI voice cloning tool that supports zero-shot cloning from a 5-second sample, few-shot cloning after training on 1 minute of audio, and cross-lingual inference, and it ships with a complete WebUI toolchain. With 57.8k stars on GitHub, it has become a leading open-source project in voice cloning.

@tom_doerr: Zero-shot voice cloning for 30 languages https://github.com/sunnyxrxrx/X-Voice…

X AI KOLs Timeline · 2026-05-26

X-Voice is a flow-matching-based multilingual text-to-speech system that enables zero-shot voice cloning across 30 languages, with open-source code, model, and demo available.

@Fluyeporlaweb: ElevenLabs costs $700 a year. HeyGen, another $700. Someone just posted the local dubbing studio that eliminates both sub…

X AI KOLs Timeline · 2026-05-24

OmniVoice Studio is a free, open-source tool that locally dubs MP4 videos into 600 languages using Whisper for transcription, voice cloning from 3 seconds of audio, and Demucs for background separation, eliminating the need for paid subscriptions like ElevenLabs and HeyGen.

@noahduck283: A tool that can download any YouTube video, cleanly remove vocals, transcribe, translate into 100+ languages, clone the original voice, and perform fully automatic dubbing. It takes less than 2 minutes. Runs 100% locally. Free. It stitches six top open-source models into one web page for "one-click download, vocal removal, transcription, translation, dubbing"...

X AI KOLs Timeline · 2026-05-22

Voice-Pro is a web tool that integrates six top open-source models (Whisper, Demucs, CosyVoice, F5-TTS, etc.) to download YouTube videos, remove vocals, transcribe, translate, clone voices, and dub fully automatically. According to the post, the whole process takes under 2 minutes, runs 100% locally, and is free.

@lxfater: NetEase Youdao has open-sourced the ZiYue 4 model, SOTA in math and science at the 27B-parameter scale. But what really interests me is its voice feature!! Cloning a voice is nothing new; ElevenLabs could do it long ago. But all such tools share a common flaw: cross-language accent. Take your Chinese voice and make it speak Japanese, and it has a Chinese accent; you can tell it's a foreigner struggling...

X AI KOLs Timeline · 2026-05-22

NetEase Youdao open-sourced ZiYue 4, a 27B-parameter model that achieves SOTA results in math and science. Its voice feature clones a voice from 3 seconds of audio and speaks it in 14 languages without carrying over an accent. Youdao also open-sourced 'Longxia' (Lobster), an all-scenario intelligent agent.

@gkxspace: I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding? Finally found a godsend—StepFun's S...

X AI KOLs Timeline · 2026-05-20

StepFun has launched Step Plan, a $6.99/month subscription that bundles LLM, TTS, ASR, image generation, and other AI models. It can be called directly through the OpenAI SDK and covers use cases such as voice cloning, meeting transcription, and AI podcast generation.

21 GPU's benchmarked running a small TTS model (vram peak: 5GB)

Reddit r/LocalLLaMA · 2026-05-18

A user benchmarks 21 consumer GPUs on vast.ai running a small TTS model (OmniVoice) with peak VRAM of 5GB, comparing performance relative to real-time and to an RTX 3090.
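To make the two comparisons concrete (speed relative to real time, and relative to an RTX 3090), here is a minimal Python sketch. It assumes speed is measured as seconds of audio generated per second of wall-clock time; some benchmarks report the inverse (compute time divided by audio duration) as the "real-time factor". The function names and all timings are hypothetical placeholders, not figures from the benchmark.

```python
def realtime_speed(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute; above 1.0 is faster than real time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def relative_to_baseline(speed: float, baseline_speed: float) -> float:
    """Speed of one GPU as a multiple of a baseline GPU's speed (e.g. an RTX 3090)."""
    return speed / baseline_speed


# Hypothetical wall-clock timings (seconds) to synthesize a 30-second clip.
# These are illustrative placeholders, NOT results from the benchmark.
audio_len = 30.0
timings = {"RTX 3090": 6.0, "GPU A": 4.0, "GPU B": 12.0}

baseline = realtime_speed(audio_len, timings["RTX 3090"])
for gpu, secs in timings.items():
    speed = realtime_speed(audio_len, secs)
    print(f"{gpu}: {speed:.1f}x real time, {relative_to_baseline(speed, baseline):.2f}x RTX 3090")
```

With these placeholder timings, "GPU A" generates audio 7.5x faster than playback and 1.5x as fast as the baseline card.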

OpenAI Quietly Bought Voice-Cloning Startup Weights.gg, Then Folded the Team (3 minute read)

TLDR AI · 2026-05-18

OpenAI quietly acquired voice-cloning startup Weights.gg and absorbed its six-person team, likely in order to take down Weights.gg's public catalog of unauthorized celebrity voices, while OpenAI keeps its own Voice Engine restricted on safety grounds.

@HowToAI_: ElevenLabs just lost its moat Someone has open-sourced a single app that replaces ElevenLabs AND WisprFlow and runs 100…

X AI KOLs Timeline · 2026-05-17

Voicebox, an open-source, MIT-licensed app, pitches itself as a replacement for both ElevenLabs and WisprFlow, offering local voice cloning, multiple TTS engines, and MCP server support on a range of hardware.

DramaBox: An Open-Weight TTS Model Built Around Stage Directions

Reddit r/ArtificialInteligence · 2026-05-14

DramaBox is an open-weight TTS model fine-tuned from LTX-2.3 that uses stage directions as prompts to generate expressive speech, with optional voice cloning from a 10-second sample.

Scenema Audio: Zero-shot expressive voice cloning and speech generation [N]

Reddit r/MachineLearning · 2026-05-13

Scenema AI releases Scenema Audio, an open-source diffusion-based model for zero-shot expressive voice cloning and speech generation, separating emotional performance from voice identity to allow any voice to perform any emotion.

@GitTrend0x: Holy cow, guys! Run voice cloning and cinematic video dubbing locally, supporting 646 languages, fully offline, no API key, no internet needed. ElevenLabs is crushed! https://github.com/debpalash/OmniVoice-Studio… This open-source marvel is insane...

X AI KOLs Timeline · 2026-05-13

OmniVoice Studio is an open-source desktop app that enables local voice cloning and cinematic video dubbing across 646 languages, fully offline with no API keys, positioning itself as a privacy-focused alternative to ElevenLabs.

Aratako/Irodori-TTS-500M-v3

Hugging Face Models Trending · 2026-05-12

Irodori-TTS-500M-v3 is a Japanese TTS model based on a Rectified Flow Diffusion Transformer, supporting zero-shot voice cloning and a distinctive emoji-based control for speaking style and sound effects.

@Honcia13: Open-source TTS is going crazy! A new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 2 billion parameters + training on 2 million hours of multilingual data, 48kHz studio-quality sound! The most intense part: it needs no tokenizer at all, performing diffusion autoregression directly in a continuous latent space to retain as much detail as possible!

X AI KOLs Timeline · 2026-05-12

Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 2 billion parameters. It is tokenizer-free, generating audio by diffusion autoregression directly in a continuous latent space, and offers 48kHz studio-quality audio along with powerful voice cloning and voice design capabilities.

@Prince_Canuma: mlx-audio v0.4.3 is here A massive release across models, server, and DX → 6 new TTS models: Higgs Audio v2 (voice clon…

X AI KOLs Timeline · 2026-05-09

mlx-audio v0.4.3 ships 6 new TTS models, including Higgs Audio v2 and OmniVoice (646+ languages), plus server improvements such as concurrent requests and continuous batching, roughly 3x faster Voxtral Realtime at 4-bit, and slimmer dependencies for Apple Silicon.
