stt

#stt

STT That Can Challenge Dragon Professional on Windows

Reddit r/LocalLLaMA ↗ · 2d ago

A new speech-to-text tool claims to rival Dragon Professional for voice recognition on Windows.

Streaming medical STT running locally on a MacBook

Reddit r/LocalLLaMA ↗ · 2d ago

The post describes a medical speech-to-text system that runs locally on a MacBook and streams transcription without any cloud dependency.

Best STT API for voice agents? I’d test latency before accuracy

Reddit r/AI_Agents ↗ · 3d ago

The author argues that for live voice agents, STT latency and real-time behavior are more critical than raw transcription accuracy, and proposes a different evaluation scorecard.

Your voice agent probably isn't slow because of the LLM.

Reddit r/AI_Agents ↗ · 2026-06-17

A developer debunks the common belief that LLM latency is the primary cause of slow voice agents, explaining that delays often stem from earlier stages like audio capture, VAD, and STT. They recommend logging specific latency metrics and testing various STT/TTS providers and orchestration frameworks to diagnose issues.
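The per-stage latency logging recommended in that post could be sketched as follows. This is a minimal illustration and not the author's code: the `StageTimer` class, its method names, and the stage labels ("stt", "llm", "tts") are hypothetical and chosen only for this example.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations (in ms) per pipeline stage for one turn.

    Hypothetical helper for illustration; stage names are up to the caller.
    """

    def __init__(self, clock=time.perf_counter):
        self.clock = clock          # injectable so the timer can be tested
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and add it to the named stage's total."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed_ms = (self.clock() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def slowest(self):
        """Return the name of the stage with the largest total duration."""
        return max(self.durations_ms, key=self.durations_ms.get)

    def report(self):
        """Return (stage, ms, share_of_total) tuples, slowest first."""
        total = sum(self.durations_ms.values())
        ordered = sorted(self.durations_ms.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, ms, ms / total if total else 0.0) for name, ms in ordered]
```

Wrapping each step of a turn in `with timer.stage("stt"): ...` and logging `timer.report()` shows which stage dominates the turn before any provider is swapped out.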

A structured path for learning to build voice agents, from your first STT call to production

Reddit r/AI_Agents ↗ · 2026-06-17

A curated, open-source learning path for building voice agents, covering everything from the first STT call to production, with 190+ resources and a 5-week plan.

@badlogicgames: pibot is now running fully local, using parakeet for STT, qwen3-tts for TTS, and Qwen 3.6 as the local multi-modal LLM …

X AI KOLs Following ↗ · 2026-05-29

pibot is now fully local, using Parakeet for STT, Qwen3-tts for TTS, and Qwen 3.6 as the local multimodal LLM via llama.cpp, with Rust/mlx-c based inference engines, achieving zero Python dependencies.

@berryxia: Guys, my back isn’t chilling. But, I’m thrilled after seeing this model architecture! While everyone is still frantically stacking parameters and competing with general-purpose large models, Interfaze has introduced a brand-new hybrid architecture. It achieves OCR, vision, STT, and structured output accuracy for deterministic tasks that crushes Gemini-3-Flash…

X AI KOLs Timeline ↗ · 2026-05-13

Interfaze introduces a new hybrid AI model architecture that combines DNN/CNN encoders with transformers to achieve superior accuracy and cost-efficiency for deterministic tasks such as OCR, vision, and STT, compared to generalist models.

@aaron_epstein: New model just released that beats sonnet 4.6, gemini 3 flash, and gpt 5.4 mini on OCR, vision, and STT tasks @interfaz…

X AI KOLs Following ↗ · 2026-05-12

A new AI model from interfaze_ai claims to outperform leading models (sonnet 4.6, gemini 3 flash, gpt 5.4 mini) on OCR, vision, and speech-to-text tasks.
