Qwen3-tts.cpp + Compose Desktop GUI

Reddit r/LocalLLaMA Tools

Summary

The developer improved qwen3-tts.cpp to run at about 5x realtime on an RTX 5080 and created a cross-platform desktop GUI (Windows and Linux) with Kotlin Compose Multiplatform, featuring voice cloning, streaming, and speaker embedding management.

I improved my qwen3-tts.cpp implementation to be about 5x realtime on my RTX 5080. It is GGML based, so it should compile and run anywhere - however I only tested it with CPU & CUDA under Windows & Linux: https://github.com/Danmoreng/qwen3-tts.cpp

Additionally I made a Desktop GUI with Kotlin Compose Multiplatform, working under Windows & Linux as well: https://github.com/Danmoreng/qwen-tts-studio

Windows releases exist which you can download and run directly. Linux must be built from source.

Qwen-TTS-Studio Features:

- fastest GGML implementation I know of, 15x faster than Python reference
- 0.6B & 1.7B models
- base model with voice cloning
- customvoice model with instructions
- voicedesign with instructions
- save speaker embeddings
- mix & merge speaker embeddings
- streaming (including semi-accurate text-highlighting)
- included download options for pre-converted GGUF models from huggingface (https://huggingface.co/Serveurperso/Qwen3-TTS-GGUF)
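For readers unsure what "5x realtime" means: it is usually read as five seconds of audio generated per second of wall-clock time. The sketch below shows one way such a figure could be measured. It is not taken from qwen3-tts.cpp; the function names, the `synthesize` callable, and the sample rate passed in are all hypothetical placeholders.

```cpp
#include <chrono>
#include <cstddef>

// Seconds of audio represented by `num_samples` mono samples at `sample_rate` Hz.
inline double audio_seconds(std::size_t num_samples, int sample_rate) {
    return static_cast<double>(num_samples) / static_cast<double>(sample_rate);
}

// Realtime speed factor: seconds of audio produced per wall-clock second.
// A value of 5.0 corresponds to "5x realtime". Returns 0.0 for a
// non-positive wall time to avoid dividing by zero.
inline double realtime_factor(double audio_sec, double wall_sec) {
    return wall_sec > 0.0 ? audio_sec / wall_sec : 0.0;
}

// Times a synthesis call and returns its realtime factor.
// `synthesize` is a hypothetical callable (an assumption, not the project's
// API) that runs the TTS engine and returns the number of samples produced.
template <typename Synthesize>
double measure_realtime_factor(Synthesize&& synthesize, int sample_rate) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t num_samples = synthesize();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return realtime_factor(audio_seconds(num_samples, sample_rate),
                           elapsed.count());
}
```

With these helpers, 10 seconds of audio generated in 2 seconds gives `realtime_factor(10.0, 2.0) == 5.0`. `steady_clock` is used because it is monotonic, so the measurement is not affected by system clock adjustments.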
