@llm_wizard: btw, we publish everything you need to build our Nemotron models including the recipes and pipelines directly. https://…

Summary

NVIDIA released the Nemotron repository with open training recipes, pipelines, and model weights for their Nemotron models, including the new Nemotron 3 Ultra and Nemotron 3 Nano Omni, supporting agentic AI and multimodal capabilities.

btw, we publish everything you need to build our Nemotron models including the recipes and pipelines directly. https://github.com/NVIDIA-NeMo/Nemotron/tree/main…

Cached at: 06/10/26, 07:45 AM


NVIDIA-NeMo/Nemotron

Source: https://github.com/NVIDIA-NeMo/Nemotron

NVIDIA Nemotron Developer Repository

Open and efficient models for agentic AI. Training recipes, deployment guides, and use-case examples for the Nemotron family.

Python 3.10+ · License: Apache 2.0 · Contributions Welcome · Docs

Watch the Nemotron Overview


🎉 Nemotron 3 Ultra was announced at GTC San Jose 2026. The model is open-source on Hugging Face, and the training recipe is now available in this repo. To learn more, see the usage guide!

🎉 Nemotron 3 Nano Omni is now released: a 30B-A3B hybrid Mamba-Transformer MoE with native text, image, video, and audio support, designed as a multimodal perception sub-agent for agentic AI. See the release blog, the training recipe, and the model weights.


Why Nemotron?

  • Open Models: Fully transparent training data, techniques, and weights for community innovation
  • Compute Efficiency: Model pruning and optimization enabling higher throughput via TensorRT-LLM
  • High Accuracy: Built on frontier open models with human-aligned reasoning for agentic workflows
  • Flexible Deployment: Deploy anywhere: edge, single GPU, or data center with NIM microservices

Use from Claude Code

This repo ships a Claude Code plugin called nemotron-customize that turns the step catalog under src/nemotron/steps/ into a guided, repo-native pipeline builder.

Install once:

/plugin marketplace add NVIDIA/Nemotron
/plugin install nemotron-customize@nvidia-nemotron

Then, start Claude Code from the repo root and invoke the skill:

cd /path/to/Nemotron        # repo root: must contain pyproject.toml and src/nemotron/steps/
claude
/nemotron-customize

The skill resolves all file paths against your current working directory, so it must be invoked from the Nemotron checkout root. Running it from a subdirectory will cause file reads to fail.

The skill plans the step DAG, validates artifact wiring, and emits the YAML configs needed to run the requested pipeline. See skills/nemotron-customize/SKILL.md for the full contract.

The marketplace installs only nemotron-customize. The other folders under skills/ (model knowledge bases, contributor add-* skills) stay on disk for repo browsing but are not loaded as plugins.


Repository Overview

nemotron/
│
├── src/nemotron/steps/      Modular building blocks for training, eval, SDG, and more
│
├── src/nemotron/recipes/    Training recipes (complete, reproducible pipelines)
│
├── usage-cookbook/          Usage cookbooks (deployment and model usage guides)
│
└── use-case-examples/       Examples of leveraging Nemotron in agentic workflows

Which section should I use?

| | Nemotron Steps | Training Recipes | Usage Cookbooks | Use Case Examples |
| --- | --- | --- | --- | --- |
| Purpose | Full lifecycle building blocks, chain data prep, training, eval and other steps | Reproduce full training pipelines from raw data to model | Deploy and use trained models | Build end-to-end applications |
| Format | The nemotron steps CLI and YAML configs | Python packages with configs, scripts, and evaluation | Jupyter notebooks with step-by-step guides | Jupyter notebooks and scripts |
| When to use | You want to run one stage in isolation or compose a custom pipeline | You want to train, fine-tune, or understand how a model was built | You have a model and want to deploy or run inference | You want to build an application (RAG, agents, tool use) |
| Location | src/nemotron/steps/ | src/nemotron/recipes/ | usage-cookbook/ | use-case-examples/ |

What is Nemotron?

NVIDIA Nemotron is a family of open, high-efficiency multimodal models purpose-built for agentic AI.

Model Tiers:

  • Nano — Optimized for edge and PC deployments
  • Super — Single GPU deployment with highest throughput
  • Ultra — Multi-GPU datacenter applications

Nemotron models excel at coding, math, scientific reasoning, tool calling, instruction following, and visual reasoning. Deploy across edge, single GPU, or data center environments with support for NeMo, TensorRT-LLM, vLLM, SGLang, and NIM microservices.


Nemotron Steps

A Nemotron step is a named, reusable unit of work that you invoke with the nemotron steps CLI. Each step packages a description of the work it performs, the artifacts it consumes and produces, and one or more named configurations that supply parameter values. Steps live under src/nemotron/steps/, and the CLI discovers them at startup.

The training recipes in the next section are composed from these steps. Run a step on its own when you want one stage, or chain steps together when you need a different pipeline shape than the published recipes.
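To make the idea of chaining steps through artifacts concrete, here is a small sketch of how artifact wiring could be validated and an execution order derived. It is an illustration only: the Step class, the plan function, and the step and artifact names are hypothetical, and the repo's real step schema and CLI may differ.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a step: a name plus artifact names in and out."""
    name: str
    consumes: frozenset[str] = frozenset()
    produces: frozenset[str] = frozenset()


def plan(steps: list[Step], external: frozenset[str] = frozenset()) -> list[str]:
    """Check that every consumed artifact has a producer, then order the steps."""
    producer: dict[str, str] = {}
    for step in steps:
        for artifact in step.produces:
            if artifact in producer:
                raise ValueError(f"{artifact!r} is produced by more than one step")
            producer[artifact] = step.name

    graph: dict[str, set[str]] = {}
    for step in steps:
        deps: set[str] = set()
        for artifact in step.consumes:
            if artifact in producer:
                deps.add(producer[artifact])
            elif artifact not in external:
                raise ValueError(f"{step.name} consumes {artifact!r}, which nothing produces")
        graph[step.name] = deps

    # static_order() raises graphlib.CycleError if the wiring contains a cycle.
    return list(TopologicalSorter(graph).static_order())


pipeline = [
    Step("eval/example", consumes=frozenset({"checkpoint"})),
    Step("sft/example", consumes=frozenset({"packed_data"}), produces=frozenset({"checkpoint"})),
    Step("data_prep/example", consumes=frozenset({"raw_data"}), produces=frozenset({"packed_data"})),
]
print(plan(pipeline, external=frozenset({"raw_data"})))
```

Declaring inputs and outputs by name lets a planner reject a miswired pipeline before any compute is spent, which is the kind of check an artifact-aware step catalog makes possible.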

Step Categories

The catalog covers the full training lifecycle.

  • Data curation and preparation with curate/* and data_prep/*.
  • Synthetic data generation (SDG) with sdg/*.
  • Corpus translation with translate/*.
  • Bring-your-own benchmark generation with byob/*.
  • Pretraining, supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), and reinforcement learning (RL) with pretrain/*, sft/*, peft/*, and rl/*.
  • Checkpoint conversion and model optimization with convert/* and optimize/*.
  • Benchmark evaluation with eval/*.
  • Execution-profile setup with env/*.
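The prefixes in this list can be captured in a small lookup table. The sketch below assumes step names take the form prefix/name, as the glob patterns suggest, and that the paired entries ("convert/* and optimize/*", and so on) map to their categories in the order written. The helper name is hypothetical.

```python
# Category labels follow the list above; the pairing order is an assumption.
CATEGORY_BY_PREFIX = {
    "curate": "Data curation and preparation",
    "data_prep": "Data curation and preparation",
    "sdg": "Synthetic data generation (SDG)",
    "translate": "Corpus translation",
    "byob": "Bring-your-own benchmark generation",
    "pretrain": "Pretraining",
    "sft": "Supervised fine-tuning (SFT)",
    "peft": "Parameter-efficient fine-tuning (PEFT)",
    "rl": "Reinforcement learning (RL)",
    "convert": "Checkpoint conversion",
    "optimize": "Model optimization",
    "eval": "Benchmark evaluation",
    "env": "Execution-profile setup",
}


def category_of(step_name: str) -> str:
    """Map a step name such as 'sft/<name>' to its lifecycle category."""
    prefix, sep, rest = step_name.partition("/")
    if not sep or not rest:
        raise ValueError(f"expected '<prefix>/<name>', got {step_name!r}")
    if prefix not in CATEGORY_BY_PREFIX:
        raise ValueError(f"unknown step prefix {prefix!r}")
    return CATEGORY_BY_PREFIX[prefix]


print(category_of("sft/example"))  # 'example' is a placeholder step name
```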


Training Recipes

The Nemotron repository provides reproducible training pipelines from raw data to deployment-ready models. These implementations reflect how large language models are actually trained: careful experimentation, validation gates, and systematic optimization.

Why Complete Pipelines?

Training a production model involves interconnected components. Isolated examples miss how stages interact. Complete pipelines show:

  • How data quality affects downstream performance across pretraining, SFT, and RL
  • Which training techniques actually work together, not just in theory
  • Where validation gates prevent failures and maintain reproducibility
  • How to balance competing objectives across stages

Because these are complete systems, you can extract specific techniques with confidence. Each component has been proven to work in context.

Each Recipe Includes

  • 🎨 Synthetic Data Generation - Scripts to generate synthetic datasets using NVIDIA-NeMo/DataDesigner
  • 🗂️ Data Curation - Scripts to prepare training data using NVIDIA NeMo Curator for scalable data processing, filtering, and quality enhancement
  • 🔁 Training - Complete training loops with hyperparameters
  • 📊 Evaluation - Benchmark evaluation on standard suites using NVIDIA NeMo Evaluator
  • 📖 Documentation - Detailed explanations of each stage

Available Recipes

| Model | Description | Stages | Guide |
| --- | --- | --- | --- |
| Nemotron 3 Ultra | 550B total / 55B active hybrid Mamba-Attention LatentMoE Transformer with MTP and 1M context; NVIDIA’s largest Nemotron 3 model for datacenter-scale agentic reasoning | Pretrain → SFT → RLVR → MOPD | Training Guide |
| Nemotron 3 Super | 120.6B total / 12.7B active Hybrid Mamba Latent MoE Transformer for frontier reasoning, coding, and agentic tasks | Pretrain → SFT → RL | Training Guide |
| Nemotron 3 Nano | 31.6B total / 3.6B active MoE Hybrid Mamba-Transformer for agentic reasoning | Pretrain → SFT → RL | Training Guide |
| Nemotron 3 Nano Omni | 30B total / 3B active hybrid Mamba-Transformer MoE with native text, image, video, and audio for agentic multimodal perception | SFT → RL (MPO / text / vision) → Eval | Training Guide |
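The total and active parameter counts in this table imply how sparse each MoE model is. The short calculation below uses only the figures listed above; the variable and function names are illustrative.

```python
# (total, active) parameter counts in billions, copied from the table above.
RECIPES = {
    "Nemotron 3 Ultra": (550.0, 55.0),
    "Nemotron 3 Super": (120.6, 12.7),
    "Nemotron 3 Nano": (31.6, 3.6),
    "Nemotron 3 Nano Omni": (30.0, 3.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters that are active."""
    return active_b / total_b


for name, (total_b, active_b) in RECIPES.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} of parameters active")
```

All four models activate roughly a tenth of their parameters, between 10.0% and 11.4%.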

Nemotron 3 Ultra

A training recipe for NVIDIA’s largest Nemotron 3 model — a 550B-A55B hybrid Mamba-Attention Mixture-of-Experts Transformer with LatentMoE and multi-token prediction (MTP), pretrained in NVFP4 and extended to 1M-token context for datacenter-scale agentic reasoning.

Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.

Model Specifications:

  • 550B total / 55B active parameters (MoE)
  • Hybrid Mamba-Attention architecture with LatentMoE + two shared-weight MTP layers
  • 20T pretraining tokens in NVFP4, two-phase data curriculum
  • Up to 1M (1,048,576) context length
  • Full program: Pretrain → SFT → RLVR → MOPD → MTP Boosting (this recipe covers Pretrain → SFT)

What You Can Extract:

  • Two-phase pretraining data mixture (tech-report Figure 4) over the open Nemotron datasets
  • Ray-based data prep: tokenize raw datasets → Megatron bin/idx (pretrain) and pack chat data → Parquet (SFT)
  • New open pretraining datasets: Specialized-v1.2 (Multiple-Choice / Generative / Fact-Seeking / Moral-Scenarios) and Legal-v1
  • Stage-local container builds (Day-0 Megatron-Bridge) for both pretrain and SFT
  • Megatron-Bridge training at Ultra scale (TP=2 / PP=12 / EP=32 pretrain, PP=6 SFT)
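As an illustration of what packing chat data can involve, the sketch below packs variable-length tokenized samples into fixed-length training sequences with a greedy first-fit-decreasing heuristic. This assumes that "pack" refers to sequence packing. The heuristic, the function name, and the toy lengths are assumptions for illustration, not the recipe's actual implementation.

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit-decreasing packing.

    Returns a list of bins; each bin holds the indices of the samples packed
    together, and the lengths in a bin sum to at most `max_len`.
    """
    bins: list[list[int]] = []
    free: list[int] = []  # free[b] is the remaining capacity of bins[b]

    # Place the longest samples first so short ones can fill the gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, more than max_len={max_len}")
        for b, capacity in enumerate(free):
            if n <= capacity:
                bins[b].append(i)
                free[b] -= n
                break
        else:
            # No existing bin had room: open a new one.
            bins.append([i])
            free.append(max_len - n)
    return bins


toy_lengths = [5, 3, 4, 2, 6]
print(pack_sequences(toy_lengths, max_len=8))  # → [[4, 3], [0, 1], [2]]
```

In this toy case, five samples totalling 20 tokens fit into three sequences of length 8, where padding each sample separately would take five.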

Nemotron 3 Super

A complete training recipe for the frontier Hybrid Mamba Latent Mixture-of-Experts Transformer model with state-of-the-art reasoning, coding, and agentic capabilities.

Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.

Model Specifications:

  • 120B total / 12B active parameters
  • Multi-stage RL pipeline: 3× RLVR + 2× SWE-RL + RLHF across 21 reward environments
  • Asynchronous GRPO with decoupled training and inference
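For readers unfamiliar with GRPO, its core idea is to score each sampled response relative to the other responses for the same prompt instead of using a learned value function. The sketch below shows that group-relative advantage in its commonly published form. It does not show the asynchronous training and inference split, and it is not taken from the recipe's code; the function name, the epsilon, and the use of population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of one prompt's sampled responses within the group.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a verifiable reward (1 = pass).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every response in a group receives the same reward, all advantages are zero and that prompt contributes no gradient signal.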

What You Can Extract:

  • Large-scale pretraining with data curriculum
  • Multi-domain SFT pipeline
  • Multi-environment RLVR with 21 simultaneous reward environments
  • SWE-RL with container-isolated sandbox execution
  • GenRM-based RLHF with principle-following rewards
  • Asynchronous GRPO at 1K GPU scale

Nemotron 3 Nano

A complete training recipe for the open, efficient Mixture-of-Experts hybrid Mamba-Transformer model optimized for agentic reasoning.

Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.

Model Specifications:

  • 31.6B total parameters, 3.6B active per forward pass
  • 25 trillion pretraining tokens with curriculum learning
  • Up to 1M context length
  • 3.3x higher inference throughput than similarly sized models

What You Can Extract:

  • Curriculum-based pretraining with two-phase data mixture
  • Long-context extension via CPT methodology
  • Multi-domain SFT with 12+ data sources
  • InfinityByte cross-domain code synthesis
  • Tool-calling fine-tuning and budget-controlled reasoning
  • Multi-environment RLVR with GRPO
  • GenRM reward modeling with circular comparison
  • DPO for tool hallucination reduction

Nemotron 3 Nano Omni

A multimodal training recipe for the 30B-A3B hybrid Mamba-Transformer Mixture-of-Experts model. Native support for text, image, video, and audio in a single decoder, designed as a perception sub-agent for agentic AI.

Nemotron 3 Nano Omni hybrid MoE architecture: each modality (audio via Parakeet, vision via C-RADIOv4-H + 3D convolution + Efficient Video Sampling, text via tokenizer) has its own encoder and adaptor; all streams converge on the unified 30B-A3B LLM decoder

Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data (e.g., CORD-v2 for SFT, public MMPR / MMPR-Tiny for RL). Results will differ from the release benchmarks, which used additional internal datasets. Use these recipes as reference implementations to apply the methodology with your own data.

Model Specifications:

  • 30B total / 3B active parameters (A3B MoE)
  • Hybrid architecture: Mamba layers (sequence/memory efficiency) + transformer layers (reasoning), with a unified text decoder
  • Native modalities: text, image, video, audio
  • Vision encoder: C-RADIOv4-H · Audio encoder: NVIDIA Parakeet · Video pipeline: 3D convolutions + Efficient Video Sampling (EVS)
  • Context length: progressively scaled 16K → 49K → 262K
  • Best-in-class on MMLongBench-Doc, OCRBenchV2; leading on WorldSense, DailyOmni, VoiceBench
  • Up to ~9.2× greater video-reasoning system capacity, ~7.4× on multi-document workloads vs. comparable open omni models
  • License: NVIDIA Nemotron Open Model License (enterprise-friendly, on-prem and any deployment)

What You Can Extract:

  • Multimodal SFT pipeline using Megatron-Bridge with the Valor32k recipe family (open-dataset CORD-v2 default + Valor32k variants)
  • Progressive context scaling: 16K → 49K → 262K
  • Multimodal preference optimization (MPO) on the public MMPR dataset
  • Text-only GRPO continuation of alignment via NeMo-RL
  • Vision GRPO on MMPR-Tiny
  • Inline NVIDIA stack: Megatron-Bridge for SFT, NeMo-RL (nano-v3-omni branch with the omni vllm fork as a submodule) for RL
  • Cookbook-style end-to-end recipe (build → data prep → SFT → RL → eval) reproducing the release training stages


Usage Cookbooks

Practical deployment and model usage guides for Nemotron models.

| Model | Best For | Key Features | Resources |
| --- | --- | --- | --- |
| Nemotron 3 Ultra 550B A55B | Long-running coding, research, and enterprise agentic workflows | 1M context, 550B/55B MoE, MTP, multi-GPU deployment, agent harness configs | Cookbooks |
| Nemotron 3 Super 120B A12B | Production deployments needing strong reasoning | 1M context, in NVFP4 single B200, RAG & tool calling | Cookbooks |
| Nemotron 3 Nano 30B A3B | Resource-constrained environments | 1M context, sparse MoE hybrid Mamba-2, controllable reasoning | Cookbooks |
| NVIDIA-Nemotron-Nano-12B-v2-VL | Document intelligence and video understanding | 12B VLM, video reasoning, Efficient Video Sampling | Cookbooks |
| Llama-3.1-Nemotron-Safety-Guard-8B-v3 | Multilingual content moderation | 9 languages, 23 safety categories | Cookbooks |
| Nemotron-Parse | Document parsing for RAG and AI agents | Table extraction, semantic segmentation | Cookbooks |

Use Case Examples

End-to-end examples demonstrating practical applications in the use-case-examples/ directory:

  • Agentic Workflows — Multi-step AI agents with planning, context management, and external tools
  • RAG Systems — Pipelines combining retrieval with Nemotron models for grounded outputs
  • Tool Integration — Structured tool calling, function execution, and data enrichment
  • Production Patterns — Scalability, monitoring, and deployment architectures

Nemotron Open Datasets

More than just weights, recipes, and libraries: Nemotron is committed to opening data across many domains, training phases, and use cases.

Nemotron Data Catalogue

A comprehensive collection of NVIDIA Nemotron datasets spanning pre-training, post-training, reinforcement learning, multimodal, safety, and domain-specific applications. These openly available datasets power the Nemotron family of models for agentic AI development.


Code

Datasets for training code generation, competitive programming, and software engineering capabilities across multiple programming languages.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-CC-Code-v1 | Pre-training | NVIDIA Data Agreement | Nemotron 3 Nano | 427.9B tokens from Common Crawl code pages using Lynx + LLM pipeline |
| Nemotron-Pretraining-Code-v1 | Pre-training | NVIDIA Data Agreement | Nemotron Nano 2 | GitHub-sourced code corpus for Nemotron Nano 2 |
| Nemotron-Pretraining-Code-v2 | Pre-training | NVIDIA Data Agreement | Nemotron 3 Nano | Updated GitHub code + synthetic QA with STEM reasoning |
| Nemotron-Cascade-RL-SWE | RL Training | CC-BY-4.0 | Nemotron 3 | SWE code repair from SWE-Bench, SWE-Smith, R2E-Gym |
| Nemotron-Competitive-Programming-v1 | SFT | CC-BY-4.0 | Nemotron 3 | 2M+ Python and 1M+ C++ samples across 34K competitive programming questions |
| OpenCodeReasoning | SFT | CC-BY-4.0 | OpenCode-Nemotron | 735K Python samples across 28K competitive programming questions |
| OpenCodeReasoning-2 | SFT | CC-BY-4.0 | OpenCode-Nemotron | 2.5M samples (1.4M Python, 1.1M C++) with code completion and critique |
| Scoring-Verifiers | Evaluation | CC-BY-4.0 | | Benchmark for test case generation and code reward models |

Math

Mathematical reasoning datasets ranging from pre-training corpora to advanced problem-solving with chain-of-thought and tool-integrated reasoning. Includes the AIMO-2 competition winning dataset.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-CC-Math-v1 | Pre-training | NVIDIA Data Agreement | Nemotron Nano 2, Nemotron 3 Nano | 133B-token math dataset from Common Crawl using Lynx + LLM pipeline |
| Nemotron-Math-Proofs-v1 | SFT | CC-BY-4.0 | Nemotron 3 Nano | Mathematical proofs dataset for Nemotron 3 post-training |
| Nemotron-Math-v2 | SFT | CC-BY-4.0 | Nemotron 3 | 347K samples and 7M reasoning trajectories for Deeper Math Reasoning |
| Nemotron-CrossThink | RL Training | CC-BY-4.0 | Nemotron 3 | Multi-domain QA with MCQ and open-ended formats for verifiable rewards |
| OpenMathReasoning | SFT | CC-BY-4.0 | OpenMath-Nemotron | 5.68M samples, 306K problems from AoPS with CoT/TIR (AIMO-2 winner) |

Science / STEM

Scientific reasoning datasets covering chemistry, physics, and general STEM domains for training models on scientific question answering and reasoning.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-Science-v1 | SFT | CC-BY-4.0 | Nemotron 3 Nano | Synthetic science reasoning (MCQA + chemistry RQA) |

General / Web

Large-scale web-crawled and curated datasets for pre-training and post-training, including multilingual data and general instruction-following capabilities.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-CC-v2.1 | Pre-training | NVIDIA Data Agreement | Nemotron 3 Nano | 2.5T tokens English web data with synthetic rephrases and translations |
| Nemotron-CC-v2 | Pre-training | NVIDIA Data Agreement | Nemotron Nano 2 | 6.6T tokens quality-filtered Common Crawl with multilingual Q&A |
| Nemotron-Pretraining-Dataset-sample | Pre-training (Sample) | NVIDIA Data Agreement | | Sample subset of Nemotron pre-training corpus for experimentation |
| Llama-Nemotron-Post-Training-Dataset | SFT + RL | CC-BY-4.0 | Llama-Nemotron Ultra/Super/Nano | Math, code, reasoning data (2.2M math, 500K code) |
| Nemotron-Post-Training-Dataset-v1 | SFT | CC-BY-4.0 | Llama-3.3-Nemotron-Super-49B-v1.5 | Math, code, STEM, tool calling |
| Nemotron-Post-Training-Dataset-v2 | SFT + RL | CC-BY-4.0 | Llama-Nemotron | Multilingual extension (Spanish, French, German, Italian, Japanese) |
| Nemotron-3-Nano-RL-Training-Blend | RL Training | CC-BY-4.0 | Nemotron-3-Nano-30B-A3B | Curated multi-domain blend for Nemotron 3 Nano |
| Nemotron-RL-knowledge-web_search-mcqa | RL Training | ODC-BY-1.0 | Nemotron 3 | Web search and multiple-choice QA tasks for NeMo Gym |

Chat / Instruction Following

Datasets for training conversational AI with strong instruction-following capabilities, structured output generation, and multi-turn dialogue.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-Instruction-Following-Chat-v1 | SFT | CC-BY-4.0 | Nemotron 3 Nano | Multi-turn chat and structured output generation |
| Nemotron-RL-instruction_following | RL Training | ODC-BY-1.0 | Nemotron 3 | Verifiable instruction adherence from WildChat-1M + Open-Instruct |
| Nemotron-RL-instruction_following-structured_outputs | RL Training | ODC-BY-1.0 | Nemotron 3 | JSON schema-constrained output formatting tests |
| Nemotron-Cascade-RL-Instruction-Following | RL Training | ODC-BY-1.0 | Nemotron 3 | 108K samples for instruction-following RL |

Agentic / Tool Use

Datasets for training AI agents with tool calling, multi-step workflows, and agentic reasoning capabilities.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-Agentic-v1 | SFT | CC-BY-4.0 | Nemotron 3 Nano | Multi-turn trajectories for conversational tool use and agentic workflows |
| Nemotron-RL-agent-workplace_assistant | RL Training | ODC-BY-1.0 | Nemotron 3 | Workplace assistant agent tasks for NeMo Gym |

Alignment / Reward Modeling

Human preference and reward modeling datasets for RLHF, SteerLM training, and model alignment. Powers top-performing reward models on RM-Bench and JudgeBench.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| HelpSteer3 | Reward Modeling | CC-BY-4.0 | Nemotron 3 Nano, Llama-Nemotron Super 49B | 40K+ samples; top on RM-Bench/JudgeBench with preference, feedback, edit-quality |
| HelpSteer2 | Reward Modeling | CC-BY-4.0 | Nemotron-4-340B-Reward, Llama-3.1-Nemotron-70B-Reward | 21K samples with 5 attributes |
| HelpSteer | SteerLM Training | CC-BY-4.0 | Nemotron-4 SteerLM | 37K samples (helpfulness, correctness, coherence, complexity, verbosity) |
| Daring-Anteater | SFT/RLHF | CC-BY-4.0 | Nemotron-4-340B-Instruct | Instruction tuning dataset; synthetic subsets + FinQA, wikitablequestions |
| sft_datablend_v1 | SFT | CC-BY-4.0 | | SFT data blend for RLHF pipeline |

Vision-Language / Multimodal

High-quality VLM training data for document intelligence, OCR, image reasoning, video QA, and chain-of-thought visual understanding.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-VLM-Dataset-v2 | VLM Training | CC-BY-4.0 (some CC-BY-SA-4.0) | Nemotron VLM | 8M samples for OCR, image reasoning, video QA with chain-of-thought |
| Llama-Nemotron-VLM-Dataset-v1 | VLM Training | CC-BY-4.0 (some CC-BY-SA-4.0) | Llama-3.1-Nemotron-Nano-VL-8B | 3M samples for visual question answering and captioning |

Physical AI / Robotics

Datasets for embodied reasoning, physical common sense, and robotic manipulation. Powers Cosmos-Reason1 for physical AI applications.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Cosmos-Reason1-SFT-Dataset | SFT | CC-BY-4.0 | Cosmos-Reason1-7B | Video-text pairs for robotics, ego-centric demos, AV reasoning |
| Cosmos-Reason1-RL-Dataset | RL Training | CC-BY-4.0 | Cosmos-Reason1-7B | RL data for physical common sense and embodied reasoning |
| Cosmos-Reason1-Benchmark | Evaluation | CC-BY-4.0 | | Benchmark for embodied reasoning (robotics, HoloAssist, AV) |
| PhysicalAI-Robotics-Manipulation-Augmented | Training | CC-BY-4.0 | | 1K Franka Panda demos with Cosmos Transfer1 domain augmentation |

Autonomous Vehicles

Multi-sensor driving data and synthetic scenarios for training and validating autonomous vehicle systems.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| PhysicalAI-Autonomous-Vehicles | Training | NVIDIA AV Dataset License | | 1,700 hours multi-sensor data from 25 countries, 306K clips |
| PhysicalAI-Autonomous-Vehicle-Cosmos-Drive-Dreams | SDG | CC-BY-4.0 | Cosmos | 81K synthetic videos with LiDAR and HD-map annotations |
| PhysicalAI-Autonomous-Vehicle-Cosmos-Synthetic | SDG | CC-BY-4.0 | Cosmos | Cosmos-generated synthetic driving scenarios |
| PhysicalAI-Autonomous-Vehicles-NuRec | Reconstruction | NVIDIA AV Dataset License | | NuScenes-based reconstruction data |

Synthetic Personas / Data Generation

Privacy-safe synthetic personas grounded in real-world demographics for sovereign AI development and synthetic data generation pipelines.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-Personas-USA | SDG | CC-BY-4.0 | NeMo Data Designer | 1M US personas grounded in Census demographics |
| Nemotron-Personas-Japan | SDG | CC-BY-4.0 | NeMo Data Designer | 1M Japanese personas aligned with regional statistics |
| Nemotron-Personas-India | SDG | CC-BY-4.0 | NeMo Data Designer | 3M Indian personas for sovereign AI development |
| Nemotron-Personas | SDG | CC-BY-4.0 | NeMo Data Designer | 100K US personas with 22 fields aligned to Census data |

Privacy / PII Detection

Synthetic datasets for training named entity recognition models to detect and redact personally identifiable information.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Nemotron-PII | NER Training | CC-BY-4.0 | GLiNER-PII | 100K synthetic records with 55+ PII/PHI entity types |

Safety / Content Moderation

Content safety datasets for training guardrail models covering comprehensive risk taxonomies. Powers NemoGuard content safety models.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Aegis-AI-Content-Safety-Dataset-1.0 | Content Moderation | CC-BY-4.0 | NemoGuard Permissive/Defensive | 11K annotated interactions covering 13 risk categories |
| Aegis-AI-Content-Safety-Dataset-2.0 | Content Moderation | CC-BY-4.0 | Llama-3.1-NemoGuard-8B-ContentSafety | Extended safety dataset with 23 violation categories |
| Nemotron-Content-Safety-Audio-Dataset | Audio Safety | CC-BY-4.0 | | 1.9K audio files from Aegis 2.0 with accent diversity |

RAG / Conversational QA

Training and evaluation data for retrieval-augmented generation and conversational question answering. Powers ChatQA models.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| ChatRAG-Bench | Evaluation | Other (derived) | | Benchmark across 10 datasets for document QA and unanswerable detection |
| ChatQA-Training-Data | SFT | Other (derived) | ChatQA-1.5 | Training data for ChatQA models from multiple sources |
| ChatQA2-Long-SFT-data | SFT | Other (derived) | ChatQA-2 | 128K long-context training data for ChatQA-2 |

Biology / Drug Discovery

Protein sequence data for training biological foundation models.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| esm2_uniref_pretraining_data | Pre-training | CC-BY-4.0 | ESM2-nv | 188M protein sequences for ESM2 |

3D / Spatial Intelligence

Testing and synthetic data for 3D reconstruction, video generation, and spatial understanding models.

| Dataset | Usage | License | Model(s) | Description |
| --- | --- | --- | --- | --- |
| Lyra-Testing-Example | Evaluation | CC-BY-4.0 | Lyra | Testing examples for Lyra generative 3D reconstruction |
| PhysicalAI-SpatialIntelligence-Lyra-SDG | SDG | CC-BY-4.0 | Lyra | Synthetic data for spatial intelligence models |
| GEN3C-Testing-Example | Evaluation | CC-BY-4.0 | GEN3C | Testing examples for GEN3C video generation |
| ChronoEdit-Example-Dataset | Evaluation | CC-BY-4.0 | ChronoEdit | Temporal reasoning examples for image editing |

💡 Feature Requests & Ideas

Have an idea for improving Nemotron models? Create a Discussion topic for it!

If you have a feature request, feel free to open an Issue and tag it as enhancement.

Your feedback helps shape the future of Nemotron models!



Contributing

We welcome contributions: examples, recipes, or other tools. Please read the Contributing Guidelines before submitting pull requests.


Security

To report any vulnerabilities, please reach out to [email protected]


License

Apache 2.0 License — see LICENSE for details.


NVIDIA Nemotron — Open and efficient models for agentic AI.
