seshat-tts: A local real-time narrator for games that supports voice cloning
Summary
seshat-tts is an open-source tool for real-time game narration with voice cloning. It extracts text with OCR or an LLM and synthesizes speech locally with pocket-tts. Cloning a voice takes about 10 seconds on an RTX 2070 Super; once the voice is cached, it runs on CPU.
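The caching step mentioned in the summary can be pictured with a minimal sketch. This is not seshat-tts or pocket-tts code: `VoiceCache` and `fake_clone` are hypothetical names, and the placeholder function only stands in for whatever expensive cloning step the tool actually runs. The idea shown is simply that the costly computation happens once per reference clip and later requests reuse the stored result.

```python
import hashlib
from typing import Callable, Dict


class VoiceCache:
    """Caches an expensive per-voice computation, keyed by the reference audio's hash."""

    def __init__(self, compute: Callable[[bytes], bytes]) -> None:
        self._compute = compute          # hypothetical stand-in for the cloning step
        self._store: Dict[str, bytes] = {}
        self.misses = 0                  # how many times the expensive step actually ran

    def get(self, reference_audio: bytes) -> bytes:
        key = hashlib.sha256(reference_audio).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(reference_audio)
        return self._store[key]


def fake_clone(reference_audio: bytes) -> bytes:
    # Placeholder only (assumption): a real cloning step would run a model here.
    return hashlib.sha256(b"voice:" + reference_audio).digest()


cache = VoiceCache(fake_clone)
first = cache.get(b"reference clip bytes")
second = cache.get(b"reference clip bytes")   # served from the cache, no recomputation
```

Keying on a content hash rather than a file name means a renamed or re-saved copy of the same clip still hits the cache; whether seshat-tts does this is not stated in the source.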
Similar Articles
Tested out VoxCPM2 (Open-Source TTS) locally. The "Ultimate Cloning" mode capturing breathing/accents is getting insane.
A technical breakdown and benchmarks of VoxCPM2, an open-source TTS model whose Ultimate Cloning mode captures breathing and accents. It was tested locally, with a low VRAM footprint and cross-lingual accent retention.
@akshay_pachaar: this TTS model generates speech 167x faster than you can hear it. Supertonic is an on-device TTS engine that runs via O…
Supertonic is a new open-source TTS engine that runs on-device via ONNX, supporting 31 languages and outperforming ElevenLabs in speed, even on a Raspberry Pi without a GPU.
@tom_doerr: Zero-shot voice cloning for 30 languages https://github.com/sunnyxrxrx/X-Voice…
X-Voice is a flow-matching-based multilingual text-to-speech system that enables zero-shot voice cloning across 30 languages, with open-source code, model, and demo available.
this new Moss tts 1.5 is damn good with voice cloning
MOSS TTS 1.5 is a new text-to-speech model with voice cloning capabilities, offered via a Hugging Face Space. It is considered better than Fish Audio S2 Pro because of its open licensing.
Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried
A developer shows how to run Qwen3 TTS locally in real time, using streaming, quantization, word-level alignment, and custom voice fine-tuning to build an expressive open-source TTS pipeline.