@DataChaz: @NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for loc…
Summary
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight, open-source 0.6B-parameter speech recognition model designed for real-time streaming, with support for 40+ languages, low latency, and a cache-aware architecture.
Cached at: 06/23/26, 02:09 PM
@NVIDIA just quietly dropped an incredibly impressive speech recognition model that completely changes the math for local voice pipelines.
Nemotron-3.5-ASR is a 0.6B parameter, open-source model built specifically for real-time streaming.
What makes it so good:
→ 40+ supported languages
→ Cache-aware architecture (eliminates redundant audio computation)
→ Configurable latency (down to 80ms chunk sizes)
→ Emits beautifully punctuated, capitalized text automatically
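To make the chunking and caching idea concrete, here is a minimal Python sketch. It is not the model's real API: `ToyCacheAwareEncoder`, `iter_chunks`, and the 16 kHz sample rate are assumptions made purely for illustration, and the "encoding" is just a per-chunk average. The point it shows is that each incoming chunk is processed once, while a small cache of earlier results supplies the left context.

```python
from collections import deque

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono audio (not stated in the post)
CHUNK_MS = 80         # the smallest chunk size mentioned in the post


def chunk_samples(sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples in one chunk of `chunk_ms` milliseconds."""
    return sample_rate * chunk_ms // 1000


def iter_chunks(samples, size):
    """Yield consecutive fixed-size chunks; a shorter final chunk is kept."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


class ToyCacheAwareEncoder:
    """Hypothetical stand-in for a streaming encoder (not NVIDIA's code).

    Only the new chunk is "encoded" on each step. Summaries of earlier
    chunks live in a bounded cache, so old audio is never reprocessed.
    """

    def __init__(self, left_context_chunks: int = 4):
        self.cache = deque(maxlen=left_context_chunks)
        self.samples_processed = 0

    def step(self, chunk):
        summary = sum(chunk) / len(chunk)   # placeholder for real encoding
        self.samples_processed += len(chunk)
        context = list(self.cache)          # left context seen by this step
        self.cache.append(summary)          # oldest entry drops out when full
        return summary, context


if __name__ == "__main__":
    audio = [0.0] * SAMPLE_RATE             # one second of silence
    encoder = ToyCacheAwareEncoder()
    for chunk in iter_chunks(audio, chunk_samples()):
        encoder.step(chunk)
    print(chunk_samples(), encoder.samples_processed)
```

Under the 16 kHz assumption, an 80 ms chunk is 1,280 samples, and one second of audio is touched exactly once in total (16,000 samples processed).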
Because it’s so remarkably lightweight, it doesn’t force you into a massive H100 dependency.
It scales brilliantly on CPUs or widely available L40S GPUs.
At its lowest latency setting, it handles ~17x more concurrent streams than previous 1.1B buffered models.
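Why would less redundant computation translate into more streams per machine? A rough accounting sketch in Python, under loudly stated assumptions: the function names and the 10-chunk context window are hypothetical, and this is toy arithmetic, not a benchmark and not a derivation of the ~17x figure quoted above. It only shows the direction of the effect when a buffered decoder re-encodes its left context on every step and a cache-aware one does not.

```python
def chunks_encoded_buffered(total_chunks: int, context_chunks: int) -> int:
    """Buffered streaming (toy model): every step re-encodes up to
    `context_chunks` earlier chunks plus the newly arrived one."""
    return sum(min(i, context_chunks) + 1 for i in range(total_chunks))


def chunks_encoded_cached(total_chunks: int) -> int:
    """Cache-aware streaming (toy model): every chunk is encoded once."""
    return total_chunks


if __name__ == "__main__":
    total, context = 100, 10  # hypothetical stream length and context window
    buffered = chunks_encoded_buffered(total, context)
    cached = chunks_encoded_cached(total)
    print(buffered, cached, buffered / cached)
```

In this toy setup the buffered approach encodes 1,045 chunks against 100 for the cached one. Real throughput also depends on model size, batching, and hardware, so the ratio here should not be read as a prediction.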
HUGE win for devs building agent pipelines:
you now have local, offline speech processing that is lighter, noticeably faster, and keeps data safely within your own security boundaries.
100% FREE and open-source.
Repo and weights in ↓
Similar Articles
nvidia/nemotron-3.5-asr-streaming-0.6b
NVIDIA releases Nemotron 3.5 ASR, a 600M parameter multilingual streaming speech recognition model supporting 40 language-locales with a Cache-Aware FastConformer-RNNT architecture for low-latency transcription. The model supports configurable chunk sizes and is ready for commercial use under the OpenMDW-1.1 license.
@kwindla: https://x.com/kwindla/status/2062544580105359686
NVIDIA released Nemotron 3.5 ASR, an open-source multilingual speech-to-text model with the lowest latency tested, available in multilingual and English-only variants, ideal for voice agents and self-hosted deployments.
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents
NVIDIA announces Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio, and language processing to enable faster and more efficient AI agents, achieving up to 9x higher throughput compared to other open omni models.
@DataChaz: @NVIDIA just dropped LocateAnything, making object detection ~10x faster by fixing one core bottleneck: How the model w…
NVIDIA released LocateAnything, an open-source model that achieves ~10x faster object detection by predicting all coordinates simultaneously instead of sequentially, reaching 12.7 FPS on a single H100 and outperforming 32B parameter models.
Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents
NVIDIA releases Nemotron 3 Nano Omni, a new long-context multimodal AI model capable of processing documents, audio, video, and text with high accuracy and efficiency.