qwen3-tts.cpp + Compose Desktop GUI

Reddit r/LocalLLaMA 06/29/26, 06:11 PM Tools

qwen3 tts ggml desktop-gui compose-multiplatform open-source windows linux

Summary

The developer improved qwen3-tts.cpp to run at about 5x realtime on an RTX 5080 and created a cross-platform desktop GUI with Kotlin Compose Multiplatform, featuring voice cloning, streaming, and speaker embedding management.

I improved my qwen3-tts.cpp implementation so that it now runs at about 5x realtime on my RTX 5080. It is GGML-based, so it should compile and run anywhere; however, I have only tested it with CPU and CUDA under Windows and Linux: https://github.com/Danmoreng/qwen3-tts.cpp

Additionally, I made a desktop GUI with Kotlin Compose Multiplatform, which also works under Windows and Linux: https://github.com/Danmoreng/qwen-tts-studio

Windows releases are available that you can download and run directly. On Linux, you must build from source.

Qwen-TTS-Studio features:
- fastest GGML implementation I know of, 15x faster than the Python reference
- 0.6B and 1.7B models
- Base model with voice cloning
- CustomVoice model with instructions
- VoiceDesign model with instructions
- save speaker embeddings
- mix and merge speaker embeddings
- streaming (including semi-accurate text highlighting)
- built-in download options for pre-converted GGUF models from Hugging Face (https://huggingface.co/Serveurperso/Qwen3-TTS-GGUF)
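The post doesn't say how Qwen-TTS-Studio mixes or merges speaker embeddings. As a rough illustration only, assuming each saved embedding is a fixed-length vector of floats, one simple way to blend two or more voices is a normalized weighted average. `mix_embeddings` below is a hypothetical helper written for this sketch, not code from qwen3-tts.cpp or qwen-tts-studio.

```python
from typing import List, Sequence


def mix_embeddings(embeddings: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Blend equal-length speaker embeddings by weighted average.

    Hypothetical helper for illustration; not taken from qwen-tts-studio.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize the weights so they sum to 1 and the blend stays on the inputs' scale.
    norm = [w / total for w in weights]
    # Each output component is the weighted sum of that component across all voices.
    return [sum(w * e[i] for w, e in zip(norm, embeddings)) for i in range(dim)]


voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 0.0]
print(mix_embeddings([voice_a, voice_b], [0.75, 0.25]))  # [0.75, 0.25, 1.5]
```

Normalizing the weights means callers can pass ratios such as `[3, 1]` instead of fractions. Whether the real tool also rescales the result (for example to unit length) is not stated in the post.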


Similar Articles

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried

Reddit r/LocalLLaMA

Developer shows how to run Qwen3 TTS locally in real-time with streaming, quantization, word-level alignment, and custom voice fine-tuning for an expressive open-source TTS pipeline.

Qwen3-TTS Technical Report

Papers with Code Trending

The Qwen3-TTS technical report introduces a series of advanced multilingual text-to-speech models with voice cloning and controllable generation, featuring a dual-track LM architecture and specialized tokenizers for low-latency streaming.

Qwen3.6 35Ba3 has changed my workflows and even how I use my computer

Reddit r/LocalLLaMA

A user describes how Qwen3.6 35B, combined with the 'pi' tool, has transformed their computer workflows, allowing natural language control of the OS and automated task execution. They successfully built a landing page from voice messages entirely locally, demonstrating the model's practical utility.

audio.cpp: 12 audio models (Qwen3-TTS, PocketTTS, VeVo2 etc) in 1 C++/ggml runtime — TTS up to 5x faster than Python on CUDA

Reddit r/LocalLLaMA

audio.cpp is a C++/ggml runtime that integrates 12 audio models including Qwen3-TTS, PocketTTS, and VeVo2, achieving TTS up to 5x faster than Python on CUDA.

Qwen3.7-Plus: Multimodal Agent Intelligence (36 minute read)

TLDR AI

Qwen3.7-Plus is a multimodal agent model that unifies vision and language for seamless GUI and CLI interactions, now available via Alibaba Cloud Model Studio.
