@rasbt: And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio! Design-wise, it …
Summary
Nemotron 3 Ultra is an open-weight release with an impressive capability-to-efficiency ratio. It carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant, with everything scaled up a bit.
Cached at: 06/05/26, 07:09 AM
And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio!
Design-wise, it carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant. But everything is a bit bigger. https://t.co/nRjbMtY2aI
Sebastian Raschka (@rasbt): It’s been a while! 4 nice additions to the open-weight local-LLM-on-consumer-hardware ecosystem:
Similar Articles
NVIDIA Nemotron 3 Ultra is out.
NVIDIA has released Nemotron 3 Ultra, a new model designed to power faster and more efficient reasoning for long-running AI agents.
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Nemotron 3 Ultra is a 550B parameter hybrid Mamba-Attention mixture-of-experts language model, pre-trained on 20T tokens, extended to 1M context, and post-trained with SFT, RL, and MOPD. It achieves up to 6x higher inference throughput than state-of-the-art LLMs with comparable accuracy, and is open-sourced.
@ctnzr: We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and …
NVIDIA announces Nemotron 3 Super (120B) and Nemotron 3 Ultra (~500B) models, pretrained on 25T tokens using NVFP4 precision, emphasizing accelerated computing and efficiency improvements.
@TheAhmadOsman: I now rank Nemotron 3 Ultra among the top 5 open-source models out there. Frontier intelligence at home
The author ranks Nemotron 3 Ultra among the top five open-source AI models, describing it as bringing frontier intelligence to consumers.
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
NVIDIA releases Nemotron-3-Ultra, a 550B-parameter open-weight model with a hybrid architecture combining Mamba-2, MoE, and attention, supporting up to 1M token context and configurable reasoning mode.