I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

Reddit r/LocalLLaMA 06/11/26, 02:47 AM Tools

offline voice cpu-only open-source silero-vad parakeet-stt supertonic-tts ollama lm-studio

Summary

A fully offline, CPU-only voice loop for local LLMs using Silero VAD, Parakeet STT, and Supertonic TTS, integrated via a one-command installer. Works with Ollama, LM Studio, and various agent frameworks.

I kept wanting to *talk* to my local models instead of typing, but every voice setup wanted a GPU, shipped my audio to the cloud, or was macOS-only. So I built one that's none of those, and I benchmarked it, so these are real measured numbers, not vibes.

**One command installs the whole stack and wires it into your agent. Then you just talk.**

Everything runs on CPU and stays off your GPU (your GPU is busy running the actual LLM):

- **Silero VAD**: knows when you start/stop talking, no push-to-talk. ~0.09 ms/frame.
- **Parakeet TDT 0.6B v3**: local ONNX INT8 STT, 25 languages, OpenAI-compatible on :5093. A 2.5 s clip transcribes in ~280 ms (~9× realtime).
- **Supertonic TTS 3**: local ONNX FP16 synthesis, multilingual, voices F1–F5 / M1–M5. A short reply renders in ~1.7 s (1.6–2.8× realtime), and a TTS→STT round-trip comes back word-for-word.

**Measured on a plain i7-12700KF, CPU only, no GPU touched.** Both my 3090s were full serving the LLM itself in vLLM, which is exactly the point: voice runs on CPU, VRAM stays with your model.

**Works with whatever agent you use. One install drops a `talk` skill into all of them:** Claude Code, Hermes Agent, OpenClaw, OpenCode, and Codex. The same installer also auto-installs and starts the STT + TTS backends for you.

**Data flow (nothing leaves the box):**

you -> Silero VAD (CPU) -> Parakeet STT (CPU) -> your LLM (Ollama / LM Studio / vLLM) -> Supertonic 3 (CPU) -> speakers

**Install (macOS / Linux):**

    git clone https://github.com/groxaxo/opencode-voice-service
    cd opencode-voice-service && ./setup.sh

**Windows (PowerShell):**

    .\setup.ps1

The installer is interactive (pick components + agent integrations) and auto-starts via systemd / launchd / Task Scheduler. Free and MIT-licensed.

**GitHub:** https://github.com/groxaxo/opencode-voice-service

Runs fine on a 4-year-old ThinkPad with no GPU. Happy to answer VAD-tuning or ONNX-performance questions.
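
For anyone who wants to poke the STT backend directly: the post only says it is "OpenAI-compatible on :5093", so the sketch below assumes it exposes the standard OpenAI path `/v1/audio/transcriptions` and returns a JSON object with a `text` field. The model name `parakeet` and the helper names are hypothetical placeholders, not taken from the repo. It also shows how a "× realtime" figure like the one quoted above falls out of clip length divided by wall-clock time.

```python
import json
import urllib.request
import uuid

# Assumption: standard OpenAI audio path; the post only states the port (5093).
STT_URL = "http://localhost:5093/v1/audio/transcriptions"


def build_multipart(wav_bytes: bytes, model: str, boundary: str) -> bytes:
    """Encode a model name and a WAV payload as multipart/form-data."""
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="clip.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return model_part + file_header + wav_bytes + closing


def transcribe(wav_path: str, model: str = "parakeet") -> str:
    """POST a WAV file to the local STT server and return the transcript.

    "parakeet" is a hypothetical model name; check what the server expects.
    """
    boundary = uuid.uuid4().hex
    with open(wav_path, "rb") as f:
        body = build_multipart(f.read(), model, boundary)
    req = urllib.request.Request(
        STT_URL,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Assumption: OpenAI-style response body of the form {"text": "..."}.
        return json.load(resp)["text"]


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / wall_seconds
```

With the backend running, `transcribe("clip.wav")` should return the transcript as a string if those assumptions hold. Plugging in the numbers from the post, `realtime_factor(2.5, 0.280)` is about 8.9, which matches the quoted ~9× realtime.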


