seshat-tts: A local real-time narrator for games that supports voice cloning

Reddit r/ArtificialInteligence 05/29/26, 03:15 AM Tools

local-real-time-narrator voice-cloning game-modding tts open-source pocket-tts uvx

Summary

seshat-tts is an open-source tool for real-time game narration with voice cloning. It extracts on-screen text with OCR or a user-supplied LLM and synthesizes speech locally with pocket-tts. Cloning a voice takes about 10 seconds on an RTX 2070 Super; the cloned voice is then cached, so later use is nearly instant and runs on the CPU.

Hello everyone. This program lets you plug in your own LLM, or simply rely on OCR (a text extractor), to perform real-time audio synthesis using pocket-tts. Voice cloning is available through uvx once you link your Hugging Face account, and it takes about 10 seconds to clone a voice on an RTX 2070 Super. After that, the voice is cached within the model as a safetensors file, so it's nearly instant and runs on your CPU. You can easily extend this program to work with games built on Unity, using the voice cloning manager to instantiate NPCs with their own custom voices. You can use it in any game and adapt it to whatever workflow or tool you like, as the licence allows. Source code is available under the MIT licence: https://github.com/scriptriva/seshat-tts
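To illustrate the clone-once-then-cache idea described above, here is a minimal Python sketch. It is not taken from the seshat-tts source: `VoiceCache`, `clone`, `voice_for_sample`, and `register_npc` are hypothetical names, the cloning step is passed in as a plain callable rather than a real pocket-tts call, and the cache stores raw bytes on disk instead of a safetensors file.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


class VoiceCache:
    """Clone each reference sample once, then reuse the stored result."""

    def __init__(self, cache_dir: Path, clone: Callable[[bytes], bytes]) -> None:
        self.cache_dir = cache_dir
        # Hypothetical stand-in for the slow cloning step (about 10 s in the post).
        self.clone = clone
        # NPC name -> cloned voice data (assumed layout, for illustration only).
        self.npc_voices: Dict[str, bytes] = {}

    def voice_for_sample(self, sample: bytes) -> bytes:
        # Key the cache on the audio content, so the same clip is never cloned twice.
        key = hashlib.sha256(sample).hexdigest()
        path = self.cache_dir / f"{key}.voice"
        if path.exists():
            # Cache hit: skip cloning entirely.
            return path.read_bytes()
        # Cache miss: run the slow step once and persist the result.
        state = self.clone(sample)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(state)
        return state

    def register_npc(self, name: str, sample: bytes) -> None:
        # Give an NPC its own voice, cloning only if the sample is new.
        self.npc_voices[name] = self.voice_for_sample(sample)
```

With something like this, two NPCs that share a reference clip trigger only one cloning run, and a later session that points at the same cache directory skips cloning altogether.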

Similar Articles

Tested out VoxCPM2 (Open-Source TTS) locally. The "Ultimate Cloning" mode capturing breathing/accents is getting insane.

Reddit r/ArtificialInteligence

Technical breakdown and benchmarks of VoxCPM2, an open-source TTS model featuring Ultimate Cloning Mode for capturing breathing and accents, tested locally with low VRAM footprint and cross-lingual accent retention.

@akshay_pachaar: this TTS model generates speech 167x faster than you can hear it. Supertonic is an on-device TTS engine that runs via O…

X AI KOLs Following

Supertonic is a new open-source TTS engine that runs on-device via ONNX, supporting 31 languages and outperforming ElevenLabs in speed, even on a Raspberry Pi without a GPU.

@tom_doerr: Zero-shot voice cloning for 30 languages https://github.com/sunnyxrxrx/X-Voice…

X AI KOLs Timeline

X-Voice is a flow-matching-based multilingual text-to-speech system that enables zero-shot voice cloning across 30 languages, with open-source code, model, and demo available.

this new Moss tts 1.5 is damn good with voice cloning

Reddit r/LocalLLaMA

MOSS TTS 1.5 is a new text-to-speech model with voice cloning capabilities, offered via a Hugging Face Space, and is considered better than Fish Audio S2 Pro due to open licensing.

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried