@GitTrend0x: Holy cow, guys! Voice cloning and cinematic video dubbing that run locally in 646 languages: fully offline, no API key, no internet needed. ElevenLabs just got crushed! https://github.com/debpalash/OmniVoice-Studio… This open-source marvel is insane...
Summary
OmniVoice Studio is an open-source desktop app that enables local voice cloning and cinematic video dubbing across 646 languages, fully offline with no API keys, positioning itself as a privacy-focused alternative to ElevenLabs.
Cached at: 05/14/26, 02:29 AM
OmniVoice Studio
The open-source ElevenLabs alternative.
Real-time dictation, zero-shot voice cloning, and cinematic video dubbing — all on your desktop. Open-source, no API keys, fully local. 646 languages.
🎙️ Voice Cloning
3-second clip → mirror any voice. 646 languages, zero-shot.
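Zero-shot cloning conditions the model on a short reference recording instead of training on the target speaker. The sketch below checks that a reference clip is long enough. It assumes the reference is an uncompressed WAV file and that about 3 seconds is the minimum the app wants. The threshold constant and the helper names are hypothetical and are not OmniVoice Studio's API. The duration check uses only Python's standard `wave` module:

```python
import io
import wave

# Hypothetical minimum, taken from the "3-second clip" claim above.
MIN_REFERENCE_SECONDS = 3.0


def reference_duration(wav_bytes: bytes) -> float:
    """Return the duration in seconds of an uncompressed WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_bytes: bytes) -> bool:
    """Hypothetical check: is the clip long enough to serve as a cloning reference?"""
    return reference_duration(wav_bytes) >= MIN_REFERENCE_SECONDS


def make_silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(is_usable_reference(make_silent_wav(3.5)))  # True
print(is_usable_reference(make_silent_wav(1.0)))  # False
```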
🎨 Voice Design
Gender, age, accent, pitch, speed, emotion, dialect — dial it in.
🎬 Video Dubbing
YouTube URL or file → transcribe → translate → re-voice → MP4.
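The dubbing flow is a chain of stages. Each timed segment keeps its original start and end, so the re-voiced audio can be laid back onto the video. The sketch below assumes hypothetical stage callables: `transcribe`, `translate`, and `synthesize` are placeholders, not OmniVoice Studio functions. It omits the download and final MP4 muxing steps:

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds into the source video
    end: float
    text: str
    audio: bytes = b""  # re-voiced audio for this segment, empty until synthesized


def dub(
    source: str,
    transcribe: Callable[[str], List[Segment]],
    translate: Callable[[str], str],
    synthesize: Callable[[str, float], bytes],
) -> List[Segment]:
    """Run transcribe -> translate -> re-voice, preserving each segment's timing."""
    dubbed = []
    for seg in transcribe(source):
        translated = translate(seg.text)
        # Pass the original segment length so the synthesizer can aim to fit it.
        audio = synthesize(translated, seg.end - seg.start)
        dubbed.append(replace(seg, text=translated, audio=audio))
    return dubbed


# Usage with stub stages standing in for real ASR, MT and TTS models:
segments = dub(
    "talk.mp4",
    transcribe=lambda path: [Segment(0.0, 2.5, "hello"), Segment(2.5, 4.0, "world")],
    translate=str.upper,
    synthesize=lambda text, seconds: text.encode(),
)
print([(s.start, s.text) for s in segments])  # [(0.0, 'HELLO'), (2.5, 'WORLD')]
```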
⌨️ Dictation Widget
⌘+⇧+Space from any app. Transcribes, auto-pastes, disappears.
🔊 Vocal Isolation
Demucs-powered. Splits speech from music, keeps the background.
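Demucs includes a command-line separator. Its `--two-stems=vocals` option splits a track into a vocals stem and an accompaniment stem. The sketch below builds that command. The helper names are hypothetical, and this is not necessarily how OmniVoice Studio invokes Demucs. With Demucs v4 and its default `htdemucs` model, the stems usually land under `separated/htdemucs/<track>/` as `vocals.wav` and `no_vocals.wav`:

```python
import shlex
import shutil
import subprocess
from pathlib import Path
from typing import List


def demucs_vocals_command(track: str, out_dir: str = "separated") -> List[str]:
    """Build a Demucs CLI call that splits `track` into vocals and everything else."""
    return ["demucs", "--two-stems=vocals", "-o", out_dir, track]


def separate(track: str, out_dir: str = "separated") -> Path:
    """Run Demucs if it is installed; return the folder its stems are expected in."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs is not on PATH (pip install demucs)")
    subprocess.run(demucs_vocals_command(track, out_dir), check=True)
    # Assumes the default v4 model name "htdemucs".
    return Path(out_dir) / "htdemucs" / Path(track).stem


print(shlex.join(demucs_vocals_command("interview.mp3")))
# demucs --two-stems=vocals -o separated interview.mp3
```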
👥 Speaker Diarization
Pyannote + WhisperX. Auto-identifies who said what.
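Working out "who said what" means aligning two timelines: speaker turns from the diarizer and timed text segments from the transcriber. WhisperX has its own `assign_word_speakers` step for this. The sketch below is a simplified stand-in, not WhisperX's implementation. It labels each transcript segment with the speaker whose turns overlap it longest. The tuple layouts are assumptions:

```python
from typing import Dict, List, Optional, Tuple

Turn = Tuple[float, float, str]     # (start, end, speaker label) from the diarizer
Segment = Tuple[float, float, str]  # (start, end, text) from the transcriber


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each segment with the speaker whose turns overlap it longest in total."""
    labeled = []
    for seg_start, seg_end, text in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            totals[speaker] = totals.get(speaker, 0.0) + overlap(seg_start, seg_end, t_start, t_end)
        # No overlapping turn at all (e.g. music) leaves the segment unlabeled.
        best = max(totals, key=totals.get) if totals and max(totals.values()) > 0 else None
        labeled.append((best, text))
    return labeled


turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
segments = [(0.2, 3.8, "Welcome back."), (3.5, 8.0, "Thanks for having me."), (10.0, 11.0, "(music)")]
print(assign_speakers(segments, turns))
# [('SPEAKER_00', 'Welcome back.'), ('SPEAKER_01', 'Thanks for having me.'), (None, '(music)')]
```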
📦 Batch Queue
Drop 50 videos, walk away. Progress bars per job.
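A batch queue with per-job progress can be as simple as a FIFO of jobs plus a progress map that a UI polls. The sketch below uses a single worker and only the standard library. The `BatchQueue` class and its job-as-list-of-steps model are hypothetical, not OmniVoice Studio's code. A failing job is recorded and skipped so the rest of the batch keeps running:

```python
import queue
import threading
from typing import Callable, Dict, List, Set

Step = Callable[[], None]  # one unit of work within a job, e.g. "transcribe"


class BatchQueue:
    """Hypothetical single-worker FIFO queue that tracks progress per job (0.0 to 1.0)."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self._lock = threading.Lock()
        self.progress: Dict[str, float] = {}
        self.failed: Set[str] = set()

    def submit(self, name: str, steps: List[Step]) -> None:
        with self._lock:
            self.progress[name] = 0.0
        self._jobs.put((name, steps))

    def start(self) -> None:
        # Daemon thread so an idle worker does not keep the process alive.
        threading.Thread(target=self._worker, daemon=True).start()

    def wait(self) -> None:
        """Block until every submitted job has finished or failed."""
        self._jobs.join()

    def _worker(self) -> None:
        while True:
            name, steps = self._jobs.get()
            try:
                for i, step in enumerate(steps, start=1):
                    step()
                    with self._lock:
                        self.progress[name] = i / len(steps)
            except Exception:
                with self._lock:
                    self.failed.add(name)  # record the failure, then move on
            finally:
                self._jobs.task_done()


bq = BatchQueue()
for video in ["a.mp4", "b.mp4"]:
    bq.submit(video, [lambda: None] * 4)  # four no-op steps stand in for real work
bq.start()
bq.wait()
print(bq.progress)  # {'a.mp4': 1.0, 'b.mp4': 1.0}
```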
🤖 MCP Server
Use OmniVoice from Claude, Cursor, or any MCP client.
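MCP (Model Context Protocol) clients talk to servers using JSON-RPC 2.0 messages. To invoke a server tool, the client sends a `tools/call` request with the tool `name` and its `arguments`. The sketch below builds such a request with the standard library. The tool name `synthesize_speech` and its argument keys are hypothetical. The tools OmniVoice's server actually exposes would come back from a `tools/list` request. In a real session, client SDKs also perform the `initialize` handshake first:

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# Hypothetical tool and arguments, for illustration only.
print(tools_call_request(1, "synthesize_speech", {"text": "Hello", "language": "en"}))
```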
🛡️ AI Watermark
AudioSeal (Meta). Invisible, survives compression.
🔐 100% Local
No keys, no cloud, no accounts. Your machine only.
⚡ GPU Auto-Detect
CUDA · MPS · ROCm · CPU. ≤8 GB of VRAM? Auto-offloads.
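In a PyTorch-based stack, this kind of detection usually relies on a few availability checks. `torch.cuda.is_available()` covers NVIDIA GPUs, and ROCm builds of PyTorch also report through the `torch.cuda` API with `torch.version.hip` set. On Apple silicon, `torch.backends.mps.is_available()` covers MPS. The sketch below assumes such a stack. The 8 GB cutoff comes from the feature note, but the function names and decision logic are hypothetical, not OmniVoice Studio's code:

```python
from dataclasses import dataclass
from typing import Optional

OFFLOAD_THRESHOLD_GB = 8.0  # cutoff taken from the feature note above


@dataclass(frozen=True)
class DevicePlan:
    backend: str   # "cuda", "rocm", "mps" or "cpu"
    offload: bool  # whether to offload part of the model to save accelerator memory


def plan_device(cuda: bool, hip: bool, mps: bool, vram_gb: Optional[float]) -> DevicePlan:
    """Map probe results to a backend, preferring a CUDA/ROCm GPU, then MPS, then CPU."""
    if cuda:
        backend = "rocm" if hip else "cuda"
        return DevicePlan(backend, vram_gb is not None and vram_gb <= OFFLOAD_THRESHOLD_GB)
    if mps:
        return DevicePlan("mps", False)  # this sketch never offloads on MPS
    return DevicePlan("cpu", False)


def detect() -> DevicePlan:
    """Probe the local machine with PyTorch, falling back to CPU if it is missing."""
    try:
        import torch
    except ImportError:
        return DevicePlan("cpu", False)
    if torch.cuda.is_available():
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        return plan_device(True, torch.version.hip is not None, False, total_gb)
    return plan_device(False, False, torch.backends.mps.is_available(), None)


print(detect())
```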
🧩 Extensible
Subclass TTSBackend, add any engine in ~50 lines.
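The page describes the extension point only as subclassing `TTSBackend`. The sketch below shows one possible shape for such a plugin interface. It assumes an abstract base class with a single `synthesize` method that returns 16-bit PCM samples. The method name and signature, the `REGISTRY`/`register` helpers, and the toy `SineBackend` are all hypothetical. Check the repository's actual `TTSBackend` before writing a real engine:

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class TTSBackend(ABC):
    """Hypothetical stand-in for OmniVoice Studio's TTSBackend base class."""

    name: str = "base"
    sample_rate: int = 24_000

    @abstractmethod
    def synthesize(self, text: str) -> List[int]:
        """Return mono 16-bit PCM samples for `text`."""


REGISTRY: Dict[str, Type[TTSBackend]] = {}  # hypothetical engine registry


def register(cls: Type[TTSBackend]) -> Type[TTSBackend]:
    """Class decorator that makes a backend selectable by its `name`."""
    REGISTRY[cls.name] = cls
    return cls


@register
class SineBackend(TTSBackend):
    """Toy engine: emits a 440 Hz beep of 0.1 s per character instead of speech."""

    name = "sine"
    sample_rate = 16_000

    def synthesize(self, text: str) -> List[int]:
        n = int(0.1 * self.sample_rate) * len(text)
        return [int(8000 * math.sin(2 * math.pi * 440 * i / self.sample_rate)) for i in range(n)]


engine = REGISTRY["sine"]()
print(len(engine.synthesize("hi")))  # 3200
```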
Quickstart: 🖥️ Desktop App · 🐳 Docker · ⚡ From Source
Similar Articles
@GoJun315: Open-source TTS that runs locally and beats ElevenLabs. Supertonic, a speech synthesis model that runs entirely on-device, no internet required, zero API costs. - Only 99M parameters, 167x faster than real-time on M4 Pro, runs on Raspberry Pi - Supports 31 languages, covering…
Supertonic is a lightning-fast, on-device TTS model with 99M parameters, supporting 31 languages. It runs locally with no API costs, outperforms cloud TTS on accuracy for numbers, phone numbers, and technical terms, and can be installed via Python, Node.js, Rust, Go, and more.
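"167x faster than real-time" corresponds to a real-time factor (RTF = synthesis time / audio duration) of 1/167, about 0.006. The sketch below works through that arithmetic. It takes the tweet's number at face value: the figure is the project's claim for an M4 Pro, not a measurement made here:

```python
def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time to generate `audio_seconds` of speech at `speedup`x real time."""
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """RTF = synthesis time / audio duration; below 1.0 means faster than real time."""
    return 1.0 / speedup


print(f"RTF ≈ {real_time_factor(167):.4f}")                       # RTF ≈ 0.0060
print(f"1 hour of audio ≈ {synthesis_seconds(3600, 167):.1f} s")  # 1 hour of audio ≈ 21.6 s
```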
@Honcia13: Open-source TTS is going crazy! A new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 20 billion parameters + 2 million hours of multilingual training data, 48kHz studio-quality sound! The most intense part: no tokenizer needed at all. It performs diffusion autoregression directly in a continuous latent space, maximizing detail retention!
Tsinghua University's OpenBMB has released VoxCPM2, an open-source multilingual TTS model with 20 billion parameters. It generates speech without a tokenizer, using diffusion autoregression in a continuous latent space. It offers 48kHz studio-quality audio along with strong voice cloning and voice design capabilities.
k2-fsa/OmniVoice
OmniVoice is a massively multilingual zero-shot text-to-speech model supporting over 600 languages, built on a diffusion language model architecture with fast inference and voice cloning capabilities.
@GitHub_Daily: MacParakeet is an open-source tool on GitHub designed specifically for Macs that performs purely local speech-to-text transcription with high accuracy. It supports dragging and dropping audio/video files or pasting YouTube links to quickly generate transcripts with timestamps and speaker labels. It can also simultaneously record system audio and microphone input...
MacParakeet is a new open-source Mac application that provides fast, fully local voice transcription using Apple's Neural Engine and NVIDIA's Parakeet model, ensuring privacy by keeping audio data on-device.
@taiyo_ai_gakuse: Dude, I seriously made something amazing lol. I built my own CLI that incorporates the newly released GPT-Realtime-2,…
A developer shares a custom CLI tool that leverages the newly released GPT-Realtime-2 API to enable real-time Japanese-to-English voice translation directly within video conferencing platforms.