seshat-tts: A local real-time narrator for games that supports voice cloning

Reddit r/ArtificialInteligence Tools

Summary

seshat-tts is an open-source tool for real-time game narration with voice cloning. It extracts text with OCR or a user-supplied LLM and synthesizes speech locally with pocket-tts. Cloning a voice takes about 10 seconds on an RTX 2070 Super; the result is then cached, so later synthesis is nearly instant and runs on the CPU.

Hello everyone. This program lets you plug in your own LLM, or simply rely on OCR (a text extractor), to perform real-time audio synthesis using pocket-tts. Voice cloning is available through uvx once you link your Hugging Face account, and it takes about 10 seconds to clone a voice on an RTX 2070 Super. After that, the cloned voice is cached within the model as a safetensors file, so it's nearly instant and runs on your CPU. You can easily extend this program to work with games that use Unity, using the voice cloning manager to instantiate NPCs with their own custom voices. You'll be able to use this in any game and adapt it to whatever workflow or tool you like, as the licence permits. The source code is available under the MIT licence: https://github.com/scriptriva/seshat-tts
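To illustrate the "clone once, then reuse the cache" idea described above, here is a minimal Python sketch. It is not taken from the seshat-tts source: `cached_voice` and `fake_clone` are hypothetical names, the clone step is a stand-in for the real pocket-tts call, and the cache is written as JSON rather than a safetensors file so that the example needs only the standard library.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_voice(sample: bytes, cache_dir: Path,
                 clone: Callable[[bytes], List[float]]) -> List[float]:
    """Return a voice embedding, running the slow clone step only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the audio content, so the same sample always hits.
    key = hashlib.sha256(sample).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        # Fast path: no cloning needed, just read the stored embedding.
        return json.loads(path.read_text())
    embedding = clone(sample)  # slow path (about 10 s on a GPU, per the post)
    path.write_text(json.dumps(embedding))
    return embedding


calls = []  # records how many times the slow step actually ran


def fake_clone(sample: bytes) -> List[float]:
    """Hypothetical stand-in for the real cloning step."""
    calls.append(1)
    return [b / 255 for b in sample[:4]]


cache = Path(tempfile.mkdtemp())
first = cached_voice(b"npc-guard-sample", cache, fake_clone)
second = cached_voice(b"npc-guard-sample", cache, fake_clone)
```

The second call returns the stored embedding without invoking the clone step again. The same pattern would let each NPC keep its own cached voice, keyed by its reference sample.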