speech-to-text

#speech-to-text

Is Whisper still the best default for speech-to-text if the app needs to be real time?

Reddit r/AI_Agents · yesterday

Explores whether OpenAI's Whisper remains the top choice for real-time speech-to-text applications, considering alternatives and performance trade-offs.


A fully local voice assistant setup

Lobsters Hottest · 2d ago

A guide to building a fully local voice assistant using Platypush on a Raspberry Pi, covering hotword detection, speech-to-text, text-to-speech, and home automation integration.


@iluciddreaming: Google just killed another startup... Google AI Edge Eloquent now supports Mac, a fully local Wispr Flow alternative. Based on the latest Gemma model, supports real-time voice transcription + voice commands to edit text. Free, no subscription, no...

X AI KOLs Timeline · 3d ago

Google AI Edge Eloquent now supports Mac as a fully local Wispr Flow alternative. It offers real-time voice transcription and voice-command text editing based on the latest Gemma model. It is free, requires no subscription, and keeps data private by running entirely on-device.


Mutter AI Dictation

Product Hunt · 6d ago

Mutter AI Dictation is a private AI dictation tool that operates offline.


I Play Video Games with Spinal Muscular Atrophy

Hacker News Top · 6d ago

Andrei Cebotar, a gamer with Spinal Muscular Atrophy, shares the assistive tools he uses daily to play games and communicate, including PlayAbility for facial gesture control, Handy for local speech-to-text, and the Xbox Adaptive Controller.


Montreal Forced Aligner and the state of speech-to-text alignment in 2026

arXiv cs.CL · 2026-06-18

This paper documents Montreal Forced Aligner 3.0, a widely used open-source forced-alignment tool. It reports state-of-the-art performance on English, Japanese, and Korean, with mean boundary errors below 15 ms.
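The "mean boundary error" figure can be read as the average absolute gap between predicted and reference segment boundaries. Below is a minimal sketch of that computation. It assumes the boundaries are already paired one-to-one and given in seconds. The paper's exact evaluation protocol may differ, and the timings are made-up illustrative values, not data from the paper.

```python
from typing import Sequence

def mean_boundary_error_ms(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute difference, in milliseconds, between paired boundary times given in seconds."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of boundaries")
    if not predicted:
        raise ValueError("need at least one boundary")
    total_s = sum(abs(p - r) for p, r in zip(predicted, reference))
    return 1000.0 * total_s / len(predicted)

# Illustrative (hypothetical) phone-boundary times in seconds.
reference = [0.10, 0.35, 0.62, 0.90]
predicted = [0.11, 0.34, 0.64, 0.90]
print(f"{mean_boundary_error_ms(predicted, reference):.1f} ms")
```

Here the per-boundary errors are 10, 10, 20 and 0 ms, so the mean is 10 ms.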


@svpino: There's no way call centers stay in business after this. Listen to this conversation. You cannot tell I'm speaking to a…

X AI KOLs Following · 2026-06-15

Cartesia released Sonic-3.5 (text-to-speech) and Ink-2 (speech-to-text), claiming they are the #1 streaming models for voice agents, with potential to disrupt call centers.


@Smartpigai: Every time someone asks me 'What tools do you use for content / video / material management?', I can't be bothered to explain again. Here's a one-time comprehensive list — save it yourself: 1. Video editing (make videos with code) https://github.com/remotion-dev/remotion… 2. Speech-to-text / meeting minutes…

X AI KOLs Timeline · 2026-06-07

A post compiling open-source tools for content creation, including video editing, speech-to-text, AI image generation, and media processing. It emphasizes that the tools are free and open source and can be combined into a self-built system.


@kwindla: https://x.com/kwindla/status/2062544580105359686

X AI KOLs Timeline · 2026-06-04

NVIDIA released Nemotron 3.5 ASR, an open-source multilingual speech-to-text model reported to have the lowest latency among the models tested. It comes in multilingual and English-only variants and is aimed at voice agents and self-hosted deployments.


@debugginglife25: Demo of Telugu Thodu built using @SarvamAI Speech-to-text system seamlessly translates Telugu to English, accurately ha…

X AI KOLs Following · 2026-06-02

A demo of Telugu Thodu, an app built using SarvamAI's speech-to-text system that translates Telugu to English with high accuracy, handling pauses and nuances.


I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python

Reddit r/LocalLLaMA · 2026-05-31

NVIDIA's Parakeet speech-to-text models have been ported to pure C++/ggml. The port produces byte-identical output to NeMo and runs up to 5x faster on GPU. It also provides quantized GGUF variants that can be deployed anywhere without Python or PyTorch.


My Accessibility Stack and the future on Wayland

Lobsters Hottest · 2026-05-31

A personal account of how the Linux desktop's upcoming Wayland-only future will break accessibility for users relying on input tools like Talon Voice, highlighting the lack of attention to input accessibility compared to output accessibility.


@Honcia13: Highly recommend an open-source speech-to-subtitle tool! Incredible speed and top-notch quality! Supports multiple languages including Chinese, Japanese, Korean, English, etc., with specially optimized formatting rules for natural and professional subtitles. It's a desktop tool based on PySide6 + ElevenLabs API that can convert audio/video files or JSON…

X AI KOLs Timeline · 2026-05-30

Recommends Scribe2SRT, an open-source speech-to-subtitle tool built on PySide6 and the ElevenLabs API. It supports multiple languages and uses optimized formatting rules to generate high-quality SRT subtitles quickly.


Do You Actually Need to Pay for Transcription Software?

Wired · 2026-05-30

The article evaluates Wispr Flow, an AI-powered transcription tool, comparing it with free alternatives like open-source models (Whisper, Canary) and built-in features (Apple dictation, Google Voice Typing), concluding that paid subscriptions may not be necessary for many users.


Parrot Speech-to-text API

Product Hunt · 2026-05-25

Parrot Speech-to-text API offers fast and accurate transcription for production-grade voice agents.


I fine-tuned Cohere Transcribe to support diarization and timestamps

Reddit r/LocalLLaMA · 2026-05-22

The author fine-tuned Cohere Transcribe, which they describe as the best open-source speech-to-text model, to support diarization and timestamps. The new model is available on Hugging Face.


How AI voice agents actually work

Reddit r/AI_Agents · 2026-05-22

A detailed explainer on the five-layer architecture of AI voice agents: speech-to-text, LLM, text-to-speech, orchestrator, and telephony. All five layers together must fit within a 500 ms latency budget to keep the conversation flowing naturally.
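One way to reason about the 500 ms constraint is as a budget shared across the layers. The sketch below assumes each layer streams and starts work as soon as the previous one emits its first output. Under that assumption, the caller-perceived delay is roughly the sum of the layers' first-output latencies. The per-layer numbers and the `Layer` type are hypothetical illustrations, not figures from the explainer.

```python
from dataclasses import dataclass

BUDGET_MS = 500  # end-to-end latency budget cited in the explainer

@dataclass
class Layer:
    name: str
    first_output_ms: int  # hypothetical: time until this layer hands its first usable output downstream

def time_to_first_audio(layers: list[Layer]) -> int:
    # Streaming assumption: each layer begins as soon as the previous one emits its first chunk,
    # so the perceived delay is approximately the sum of first-output latencies on the critical path.
    return sum(layer.first_output_ms for layer in layers)

# Illustrative numbers only; real figures depend on models, hardware, and network.
pipeline = [
    Layer("telephony (round trip)", 60),
    Layer("speech-to-text (endpoint + final transcript)", 150),
    Layer("orchestrator", 10),
    Layer("LLM (first token)", 180),
    Layer("text-to-speech (first audio)", 80),
]

total = time_to_first_audio(pipeline)
print(f"{total} ms of {BUDGET_MS} ms budget, headroom {BUDGET_MS - total} ms")
for layer in pipeline:
    share = layer.first_output_ms / total
    print(f"  {layer.name:<45} {layer.first_output_ms:>4} ms ({share:.0%})")
```

With these assumed values the pipeline uses 480 ms, leaving 20 ms of headroom. The speech-to-text and LLM layers dominate, so they are the natural places to optimize first.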


@gkxspace: I spend two to three thousand on AI subscriptions every month, some for TTS, ASR, etc. The mainstream ones are expensive and their API protocols differ. I kept thinking: is there a single plan that covers voice cloning, meeting transcription, AI podcast generation, real-time voice Q&A, voice input, and coding? Finally found a godsend—StepFun's S...

X AI KOLs Timeline · 2026-05-20

StepFun has launched Step Plan, a $6.99/month subscription that bundles LLM, TTS, ASR, image-generation, and other AI models. It can be called directly through the OpenAI SDK and covers uses such as voice cloning, meeting transcription, and AI podcast generation.


TongueType for macOS

Product Hunt · 2026-05-19

TongueType is a local dictation app for macOS that does not require a subscription.


Streaming Speech-to-Text Translation with a SpeechLLM

arXiv cs.CL · 2026-05-15

Presents a SpeechLLM architecture for streaming speech-to-text translation that adaptively decides when to output tokens based on audio, achieving 1-2 second latency with quality close to non-streaming baselines.
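The core idea of adaptive streaming is a read/write loop. After each new audio chunk, the system decides whether to emit more target tokens or wait for more audio. The sketch below uses a simple confidence threshold as that decision rule. This is a swapped-in heuristic, not the paper's learned policy. The `Proposer` interface and `toy_propose` are hypothetical stand-ins for a SpeechLLM decoder.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Hypothetical model interface (an assumption, not the paper's API): given the audio chunks
# read so far and the tokens already written, propose the next target token with a
# confidence score, or return None once the translation is complete.
Proposer = Callable[[List[bytes], List[str]], Optional[Tuple[str, float]]]

def stream_translate(chunks: Iterable[bytes], propose: Proposer,
                     threshold: float = 0.8) -> Iterator[Tuple[int, str]]:
    """Yield (chunks_read, token) pairs using a confidence-threshold read/write policy."""
    audio: List[bytes] = []
    written: List[str] = []
    for chunk in chunks:
        audio.append(chunk)  # READ: consume one more audio chunk
        # WRITE: emit tokens while the model is confident enough; otherwise wait for more audio.
        while (cand := propose(audio, written)) is not None and cand[1] >= threshold:
            written.append(cand[0])
            yield len(audio), cand[0]
    # Input finished: flush the remaining tokens regardless of confidence.
    while (cand := propose(audio, written)) is not None:
        written.append(cand[0])
        yield len(audio), cand[0]

TARGET = ["hello", "how", "are", "you"]

def toy_propose(audio: List[bytes], written: List[str]) -> Optional[Tuple[str, float]]:
    # Toy stand-in for a SpeechLLM: token i becomes "confident" after 2 * (i + 1) chunks.
    i = len(written)
    if i >= len(TARGET):
        return None
    return TARGET[i], (0.9 if len(audio) >= 2 * (i + 1) else 0.5)

print(list(stream_translate([b"\x00"] * 6, toy_propose)))
```

The first element of each pair shows how much audio had been read when the token was emitted. Tokens go out as soon as the policy allows instead of after the full utterance, which is how a streaming system trades a small amount of quality for latency.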
