hybrid-mamba-attention


Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Hugging Face Daily Papers · 4d ago

Nemotron 3 Ultra is a 550B-parameter hybrid Mamba-Attention mixture-of-experts language model. It was pre-trained on 20T tokens, extended to a 1M-token context, and post-trained with SFT, RL, and MOPD. It achieves up to 6x higher inference throughput than state-of-the-art LLMs at comparable accuracy, and it is open-sourced.

