@DataChaz: @NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for loc…

X AI KOLs Timeline Models

Summary

NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B parameter open-source speech recognition model designed for real-time streaming with support for 40+ languages, low latency, and cache-aware architecture.

@NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for local voice pipelines. Nemotron-3.5-ASR is a 0.6B parameter, open-source model built specifically for real-time streaming. What makes it so good: → 40+ supported languages → Cache-aware architecture (eliminates redundant audio computation) → Configurable latency (down to 80ms chunk sizes) → Emits beautifully punctuated, capitalized text automatically Because it’s so remarkably lightweight, it doesn't force you into a massive H100 dependency. It scales brilliantly on CPUs or widely available L40S GPUs. At its lowest latency setting, it handles ~17x more concurrent streams than previous 1.1B buffered models. HUGE win for devs building agent pipelines: you now have local, offline speech processing that is lighter, noticeably faster, and keeps data safely within your own security boundaries. 100% FREE and open-source. Repo and weights in ↓
Original Article
View Cached Full Text

Cached at: 06/23/26, 02:09 PM

@NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for local voice pipelines.

Nemotron-3.5-ASR is a 0.6B parameter, open-source model built specifically for real-time streaming.

What makes it so good: → 40+ supported languages → Cache-aware architecture (eliminates redundant audio computation) → Configurable latency (down to 80ms chunk sizes) → Emits beautifully punctuated, capitalized text automatically

Because it’s so remarkably lightweight, it doesn’t force you into a massive H100 dependency.

It scales brilliantly on CPUs or widely available L40S GPUs.

At its lowest latency setting, it handles ~17x more concurrent streams than previous 1.1B buffered models.

HUGE win for devs building agent pipelines:

you now have local, offline speech processing that is lighter, noticeably faster, and keeps data safely within your own security boundaries.

100% FREE and open-source.

Repo and weights in ↓

Similar Articles

nvidia/nemotron-3.5-asr-streaming-0.6b

Hugging Face Models Trending

NVIDIA releases Nemotron 3.5 ASR, a 600M parameter multilingual streaming speech recognition model supporting 40 language-locales with a Cache-Aware FastConformer-RNNT architecture for low-latency transcription. The model supports configurable chunk sizes and is ready for commercial use under the OpenMDW-1.1 license.

@kwindla: https://x.com/kwindla/status/2062544580105359686

X AI KOLs Timeline

NVIDIA released Nemotron 3.5 ASR, an open-source multilingual speech-to-text model with the lowest latency tested, available in multilingual and English-only variants, ideal for voice agents and self-hosted deployments.