NVIDIA introduces Nemotron-Labs Diffusion, a family of diffusion language models that generate text in parallel and refine it iteratively, enabling faster generation and the ability to revise previously generated tokens.
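Masked diffusion language models are commonly decoded by predicting every position in parallel, keeping the most confident guesses, and re-masking the rest for the next pass. The toy loop below sketches only that general idea. It is not Nemotron-Labs Diffusion's decoding algorithm, and `diffusion_decode`, `toy_predict`, the linear unmasking schedule, and the confidence formula are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a position that is not decided yet

# A predictor sees the whole partially masked sequence and returns one
# (token, confidence) guess for every position at once.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def diffusion_decode(length: int, predict: Predictor, steps: int) -> List[Optional[str]]:
    """Toy confidence-based parallel decoding with re-masking (hypothetical)."""
    seq: List[Optional[str]] = [MASK] * length
    for step in range(1, steps + 1):
        guesses = predict(seq)  # all positions predicted in parallel
        # The number of kept positions grows linearly and reaches `length` on the last step.
        keep = round(length * step / steps)
        ranked = sorted(range(length), key=lambda i: guesses[i][1], reverse=True)
        kept = set(ranked[:keep])
        # Everything outside `kept` is re-masked, so a token kept on an earlier
        # step can still be replaced if it falls out of the top `keep` later.
        seq = [guesses[i][0] if i in kept else MASK for i in range(length)]
    return seq


TARGET = ["the", "cat", "sat", "on", "the", "mat"]


def toy_predict(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a trained denoiser: it always guesses TARGET,
    with confidence that rises when neighbouring positions are already decided."""
    out = []
    for i in range(len(seq)):
        decided = sum(1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] is not MASK)
        out.append((TARGET[i], 0.5 + 0.2 * decided - 0.01 * i))
    return out


print(diffusion_decode(len(TARGET), toy_predict, steps=3))
# ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Here six tokens are settled in three parallel passes instead of six sequential ones; `steps` is the knob that trades speed against how many tokens each pass must commit.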
NVIDIA releases Nemotron 3.5 ASR, a 600M-parameter multilingual streaming speech recognition model that supports 40 language-locales and uses a Cache-Aware FastConformer-RNNT architecture for low-latency transcription. The model supports configurable chunk sizes and is ready for commercial use under the OpenMDW-1.1 license.
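Cache-aware streaming, in general terms, means the encoder processes audio one fixed-size chunk at a time and carries a cached state forward instead of re-reading earlier audio. The sketch below shows only that control flow. The 16 kHz sample rate, the 160 ms default chunk, and every function name (`chunk_stream`, `fake_encode_chunk`, `transcribe_streaming`) are assumptions for illustration, not Nemotron 3.5 ASR settings or APIs.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Cache = Dict[str, int]


def chunk_stream(samples: Sequence[float], sample_rate: int, chunk_ms: int) -> Iterator[Sequence[float]]:
    """Yield consecutive chunks of `chunk_ms` milliseconds; the last one may be shorter."""
    size = sample_rate * chunk_ms // 1000  # samples per chunk
    if size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def fake_encode_chunk(chunk: Sequence[float], cache: Cache) -> Tuple[str, Cache]:
    """Hypothetical stand-in for one cache-aware encoder step: it sees only the
    new chunk plus the cached state, never re-reading earlier audio."""
    seen = cache.get("samples_seen", 0) + len(chunk)
    return f"<{len(chunk)} samples>", {"samples_seen": seen}


def transcribe_streaming(samples: Sequence[float], sample_rate: int = 16_000,
                         chunk_ms: int = 160) -> Tuple[List[str], Cache]:
    cache: Cache = {}
    outputs: List[str] = []
    for chunk in chunk_stream(samples, sample_rate, chunk_ms):
        text, cache = fake_encode_chunk(chunk, cache)  # state carried to the next chunk
        outputs.append(text)
    return outputs, cache


one_second = [0.0] * 16_000
outputs, cache = transcribe_streaming(one_second)
print(len(outputs), outputs[-1], cache)
# 7 <640 samples> {'samples_seen': 16000}
```

A larger `chunk_ms` means fewer encoder calls, but each partial result has to wait for a full chunk of audio to arrive, so the chunk duration sets a floor on latency. That is the trade-off a configurable chunk size exposes.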
NVIDIA announces Nemotron 3 Super (120B) and Nemotron 3 Ultra (~500B) models, pretrained on 25T tokens using NVFP4 precision, emphasizing accelerated computing and efficiency improvements.
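NVFP4 stores 4-bit E2M1 values in small blocks of 16 elements that share a scale factor. The sketch below illustrates that block-scaling idea with round-to-nearest quantize/dequantize. It is simplified: the block scale stays a Python float rather than being stored as FP8 E4M3, the second-level per-tensor scale is omitted, and nothing here reflects how the Nemotron 3 models were actually trained. `quantize_block` and `fake_quantize` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Non-negative magnitudes representable by a 4-bit E2M1 value (sign stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0


def quantize_block(block: Sequence[float]) -> Tuple[float, List[float]]:
    """Map one block to E2M1 codes plus a shared scale (round to nearest)."""
    amax = max((abs(x) for x in block), default=0.0)
    # Choose the scale so the largest magnitude lands exactly on the top code.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def fake_quantize(values: Sequence[float], block_size: int = 16) -> List[float]:
    """Quantize then dequantize block by block, to expose the rounding error."""
    out: List[float] = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(code * scale for code in codes)
    return out


print(fake_quantize([6.0, -3.0, 1.2, 0.1]))
# [6.0, -3.0, 1.0, 0.0]
```

Small blocks matter because each block gets its own scale: a block of small values is not forced onto the coarse grid set by one large outlier elsewhere in the tensor.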
A user converted NVIDIA's Llama-Embed-Nemotron-8B model to MLX format in fp16, 8-bit, 4-bit, and 2-bit variants, so the embedding model can be loaded in-process on Apple Silicon via mlx-embeddings.
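A rough way to compare those variants is weight storage: parameters × bits per weight ÷ 8 bytes. The sketch below assumes a nominal 8 × 10⁹ parameters and, for the quantized variants, affine group quantization with one 16-bit scale and one 16-bit bias per group of 64 weights. Those settings are assumptions, since the conversion parameters are not stated, and real files also differ because some layers may stay unquantized. `weight_gb` is a hypothetical helper.

```python
def weight_gb(n_params: float, bits: int, group_size: int = 0, scale_bits: int = 16) -> float:
    """Approximate weight storage in decimal gigabytes.

    If group_size > 0, assume one scale and one bias of `scale_bits` bits are
    stored per group of `group_size` weights (affine group quantization)."""
    bits_per_weight = float(bits)
    if group_size > 0:
        bits_per_weight += 2 * scale_bits / group_size  # per-group metadata overhead
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # nominal "8B"; the exact parameter count is an assumption here
print(f"fp16: {weight_gb(N_PARAMS, 16):.1f} GB")
for bits in (8, 4, 2):
    print(f"{bits}-bit: {weight_gb(N_PARAMS, bits, group_size=64):.1f} GB")
# fp16: 16.0 GB
# 8-bit: 8.5 GB
# 4-bit: 4.5 GB
# 2-bit: 2.5 GB
```

Under these assumptions the group metadata adds about half a bit per weight, which is why the 2-bit variant lands near 2.5 GB rather than 2 GB.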
NVIDIA has open-sourced the video understanding model Nemotron 3 Nano Omni, which uses 3D convolutions and processes video 10 times faster than real-time playback. It excels at audio-video analysis, surveillance retrieval, and asset tagging, but is not suited to code or text inference tasks.
OpenClaw, an open-source persistent AI assistant, has become the most-starred GitHub project, sparking debate over security and autonomy. NVIDIA is collaborating to enhance security and releasing NemoClaw as a secure reference implementation.
NVIDIA announces Nemotron 3 Nano Omni, an open multimodal model that unifies vision, audio, and language processing to enable faster and more efficient AI agents, achieving up to 9x higher throughput compared to other open omni models.
NVIDIA releases Nemotron 3 Nano Omni, a 30B-parameter multimodal model that processes video, audio, images, and text, with integrated reasoning capabilities for enterprise workflows.
NVIDIA introduces Nemotron OCR v2, a fast multilingual OCR model built using synthetic data generation. The model achieves 34.7 pages/second on a single A100 GPU by using a unified FOTS-based architecture with feature reuse across detection, recognition, and relational components.
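In FOTS, text detection and recognition share one convolutional feature map rather than each running its own backbone, and the blurb says Nemotron OCR v2 extends that reuse to relational components. Stripped of any real network, the pattern is that the expensive feature extractor runs once per page and every head consumes its output. Every name below (`SharedBackboneOCR`, `toy_backbone`, the head lambdas) is hypothetical. This is not the model's code.

```python
from typing import Callable, Dict, List

Features = List[float]


class SharedBackboneOCR:
    """Runs one feature extractor per page and feeds the same features to every head."""

    def __init__(self, backbone: Callable[[str], Features],
                 heads: Dict[str, Callable[[Features], object]]) -> None:
        self.backbone = backbone
        self.heads = heads
        self.backbone_calls = 0

    def __call__(self, page: str) -> Dict[str, object]:
        features = self.backbone(page)  # the expensive step, done once per page
        self.backbone_calls += 1
        # Each head reuses the shared features instead of recomputing them.
        return {name: head(features) for name, head in self.heads.items()}


def toy_backbone(page: str) -> Features:
    """Hypothetical toy 'features': one number per character."""
    return [float(ord(c)) for c in page]


pipeline = SharedBackboneOCR(
    toy_backbone,
    {
        # Toy heads standing in for detection, recognition, and relational networks.
        "detection": lambda f: len(f),
        "recognition": lambda f: "".join(chr(int(v)) for v in f),
        "relational": lambda f: list(range(len(f))),
    },
)
result = pipeline("abc")
print(result, pipeline.backbone_calls)
# {'detection': 3, 'recognition': 'abc', 'relational': [0, 1, 2]} 1
```

Adding a head costs only that head's compute, not another backbone pass, which is the kind of saving that feature reuse contributes to throughput.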