@rasbt: And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio! Design-wise, it …

X AI KOLs Timeline 06/04/26, 04:41 PM Models

open-weight nemotron mamba moe llm local-llm

Summary

Nemotron 3 Ultra is an open-weight release with an impressive capability-to-efficiency ratio. It carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant, with everything scaled up a bit.

And another open-weight release. Nemotron 3 Ultra has an ultra impressive capability:efficiency ratio! Design-wise, it carries forward the Mamba-2-attention hybrid stack and LatentMoE introduced in the previous Super variant. But everything is a bit bigger. https://t.co/nRjbMtY2aI

Original Article

Cached at: 06/05/26, 07:09 AM

Sebastian Raschka (@rasbt): It’s been a while! 4 nice additions to the open-weight local-LLM-on-consumer-hardware ecosystem:

Similar Articles

NVIDIA Nemotron 3 Ultra is out.

Reddit r/LocalLLaMA

NVIDIA has released Nemotron 3 Ultra, a new model designed to power faster and more efficient reasoning for long-running AI agents.

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Hugging Face Daily Papers

Nemotron 3 Ultra is a 550B parameter hybrid Mamba-Attention mixture-of-experts language model, pre-trained on 20T tokens, extended to 1M context, and post-trained with SFT, RL, and MOPD. It achieves up to 6x higher inference throughput than state-of-the-art LLMs with comparable accuracy, and is open-sourced.

@ctnzr: We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and …

X AI KOLs Following

NVIDIA announces Nemotron 3 Super (120B) and Nemotron 3 Ultra (~500B) models, pretrained on 25T tokens using NVFP4 precision, emphasizing accelerated computing and efficiency improvements.

@TheAhmadOsman: I now rank Nemotron 3 Ultra among the top 5 Opensource models out there. Frontier intelligence at home

X AI KOLs Following

The author ranks Nemotron 3 Ultra among the top five open-source AI models, describing it as bringing frontier intelligence to consumers.

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

Hugging Face Models Trending

NVIDIA releases Nemotron-3-Ultra, a 550B-parameter open-weight model with a hybrid architecture combining Mamba-2, MoE, and attention, supporting up to 1M token context and configurable reasoning mode.