Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

Hugging Face Blog Models

Summary

NVIDIA releases Nemotron 3.5 Content Safety, a unified multimodal AI safety model that combines multilingual support, custom enterprise policy enforcement, and auditable reasoning (THINK mode) in a single inference call. It builds on the previous Nemotron 3 model by deepening multimodal integration to evaluate text prompts, images, and assistant responses together for more comprehensive safety verdicts.

No content available
Original Article
View Cached Full Text

Cached at: 06/05/26, 02:13 AM

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

Source: https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety Back to Articles

https://huggingface.co/login?next=%2Fblog%2Fnvidia%2Fnemotron-3-5-content-safety-

The last two years have seen NVIDIA’s content safety stack grow from a focused English text classifier into a family of specialized models—each extending coverage to new modalities, languages, and inference modes.Nemotron 3 Content Safety, released in March 2026, combined multimodal and multilingual capabilities for the first time in a single 4B-parameter model. Today, we are releasingNemotron 3.5 Content Safety, which completes that arc: a single model that unifies multimodal input, multilingual reach, custom enterprise policy enforcement, and auditable reasoning into one inference call.

This post covers what changes in 3.5, the design decisions behind each new capability, and how to integrate the model into production safety pipelines.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#whats-new-in-nemotron-35-content-safetyWhat’s New in Nemotron 3.5 Content Safety

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#1-unified-multimodal-evaluation1. Unified Multimodal Evaluation

Nemotron 3 introduced image understanding; Nemotron 3.5 deepens the multimodal integration. The model takes auser prompt, an optional image, and an optional assistant responseas a single context window and produces a coherent safety verdict over the combined input. Evaluating all three together—rather than scoring each independently—closes a well-known gap in multimodal safety scenarios: policy violations that only emerge from theinteractionbetween text and image, or between request and response, are now caught in a single pass.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#2-global-language-coverage2. Global Language Coverage

Nemotron 3.5 maintains the 12-language explicit training coverage of its predecessors—English, French, Spanish, German, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Portuguese, and Italian—while also inheriting strong zero-shot generalization across approximately 140 languages from the Gemma 3 base model. This means deployments in markets where training data is sparse (e.g., Southeast Asian languages, Scandinavian languages, less-resourced African languages) benefit from base-model multilingual transfer without requiring separate fine-tuning.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#3-custom-policy-enforcement3. Custom Policy Enforcement

This is the most significant architectural addition in 3.5 relative to Nemotron 3. Production deployments rarely operate under a single universal safety taxonomy. A healthcare platform has a different risk profile than a financial services chatbot, a developer tools IDE, or a children’s education app. Nemotron 3.5 accepts a custom policy specification alongside the input. The model reasons over that policy when producing its verdict rather than deferring entirely to the built-in taxonomy. This extends the work first introduced inNemotron Content Safety Reasoning 4Bto the full multimodal, multilingual setting.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#4-reasoning-traces-think-mode4. Reasoning Traces (THINK Mode)

Every safety verdict in Nemotron 3.5 can be accompanied by an auditable reasoning trace via an optionalthink mode. When enabled, the model outputs its step-by-step reasoning before delivering a finalsafe/unsafelabel and, optionally, the violated categories.

<think>
The user prompt asks for guidance on acquiring a controlled substance without a prescription.
The assistant response provides specific sourcing steps and references an online marketplace.
This interaction violates the Criminal Planning/Confessions and Controlled Substances categories.
The image (a pharmacy exterior) provides locational context but does not alter the verdict.
</think>

User Safety: unsafe
Response Safety: unsafe
Safety Categories: Criminal Planning/Confessions, Controlled Substances

When latency is the primary constraint, THINK mode can be disabled to return to the same low-latency binary verdict available in Nemotron 3.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#5-safety-dataset5. Safety Dataset

With Nemotron 3.5, we are releasing our safety dataset. This is an important milestone since most OSS safety models don’t generally provide the training or evaluation sets. This problem is worse for the multimodal space where artifacts such as images or videos are often derived from resources with restrictive licensing terms. The Nemotron 3.5 Content Safety Dataset is multimodal, multilingual, and includes safety reasoning traces that were used to train the model. These reasoning traces were generated in a 2-step manner to make them concise, similar to theNemotron Content Safety Reasoning 4Bmodel.


https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#model-architectureModel Architecture

Nemotron 3.5 Content Safety is built onGoogle Gemma 3 4B IT(4B parameters), providing a 128K context window, strong vision-language reasoning, and broad multilingual coverage. NVIDIA fine-tunes this base with a LoRA adapter that installs targeted safety classification behavior while keeping the model compact enough for real-time deployment on 8GB+ VRAM GPUs.

The inference interface supports three output modes:

Mode 1 — Low-latency binary verdict:

User Safety: safe
Response Safety: unsafe

Mode 2 — Binary verdict with categories:

User Safety: safe
Response Safety: unsafe
Safety Categories: Violence, Criminal Planning/Confessions

Mode 3 — THINK mode (reasoning + verdict):

<think>
[step-by-step reasoning trace]
</think>

User Safety: unsafe
Response Safety: unsafe
Safety Categories: [categories]

The safety taxonomy follows theAegis 2.0framework: 13 core categories aligned with the MLCommons safety taxonomy, plus 10 fine-grained subcategories. This alignment allows direct comparison with other open and closed guard systems benchmarked on Aegis-taxonomy datasets.


https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#reasoningReasoning

Reasoning is a supercharger for content safety classification because it provides the necessary context, customization, and accountability required for production AI systems, especially in enterprise and regulated environments.

Enables Custom and Contextual Policy Enforcement

Reasoning allows a content safety model to dynamically interpret and enforce custom, domain-specific policies defined in natural language at the time of inference. This is necessary because production deployments rarely operate under a single, universal safety taxonomy. A financial services chatbot has a different risk profile than a children’s education app which may have a lower tolerance for profanity. This capability supports:

  • **Category Suppression:**Disabling irrelevant categories, such as preventing a “violence” category trigger when a DevOps tool handles the phrase “terminate a process”.
  • **Custom Category Injection:**Defining proprietary risk categories specific to an organization’s regulatory or product policies.

Provides Auditable and Documented Justification

The reasoning traces show the model’s step-by-step logic before it delivers a final safe or unsafe verdict. This documented justification serves several purposes:

  • **Compliance and Audit Logging:**Regulated industries often require documented justifications for content moderation decisions.
  • **Human Review:**Reviewers can auditwhya verdict was reached to identify systematic model errors.
  • **Policy Iteration:**The traces reveal how the model interprets edge cases, allowing teams to iteratively refine and improve custom policy language.

Latency

While reasoning can introduce latency, the Nemotron model addresses this by condensing reasoning chains into concise summaries to limit output tokens and increase efficiency. This is done in a 2-step process similar to what was done in the predecessor modelNemotron-Content-Safety-Reasoning-4B. In the first step, we use larger, more powerful models such as Qwen 397B to generate chain-of-thought reasoning traces based upon provided prompts, images, and responses. We also provided the ground-truth labels of the samples to avoid any misclassification that can find its way into the reasoning traces. In step 2, we make these reasoning traces more concise by using another large model such as Qwen 80B. We specifically instruct this model to rephrase the original traces (from step 1) so that it fits in no more than 3 sentences. Based on our experiments, most reasoning traces generated are under 3 sentences.

The efficient reasoning traces optimization allows for low-latency custom policy enforcement. Furthermore, the reasoning traces provide a valuable training signal that can be used for training specialized moderator models. Developers can choose a dual-mode operation, disabling reasoning for minimal latency in generic tasks or enabling it for complex policies.


https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#training-dataTraining Data

The dataset driving Nemotron 3.5 is an evolution of the multimodal, multilingual blends used for Nemotron 3, with additions targeting the reasoning and custom-policy capabilities. We have used the following sources of data:

  • Multilingual text safety datafromNemotron Safety Guard Dataset v3, sampled from culturally nuanced subsets with proportional representation across safety categories and safe/unsafe splits.
  • Human-annotated multimodal datacollected in English by NVIDIA, translated into 12 languages. Critically,99% of training images are real photographs—not synthetic generations. This directly addresses a known weakness in the multimodal safety benchmark landscape, where existing datasets like VLGuard and MM-SafetyBench rely heavily on SDXL-generated images that lack the cultural texture and adversarial complexity of production content. While not all of these real images could be released due to licensing constraints, we are still able to release a subset of images from Wikimedia and synthetic generation.
  • Safe multimodal datafromNemotron VLM Dataset v2, covering scanned documents, charts, papers, and diagrams with associated queries—ensuring the model does not over-flag benign professional content.
  • Reasoning tracesderived from chain-of-thought outputs produced by larger teacher models—Qwen 397B and then shortened using Qwen 80B—are used to teach the model how to reason.
  • Topic following datafrom theCantTalkAboutThisdataset consisting of policy-specification/verdict pairs across a range of enterprise deployment scenarios (healthcare, finance, banking, education, etc.).
  • Synthetic dataaccounting for roughly 10% of total training volume, used primarily to diversify jailbreak patterns, generate rare policy violation examples, and produce multimodal adversarial cases.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#benchmarkingBenchmarking

Nemotron 3.5 Content Safety was evaluated across multilingual, multimodal, and custom-policy safety benchmarks, including VLGuard, MM-SafetyBench, PolyGuard, RTP-LX, Aya Redteaming, XSafety, MultiJail, Aegis, Dynaguardrail, and CoSA. These evaluations reflect the core production challenge for enterprise safety: applying consistent guardrails across global languages, text and image inputs, and domain-specific policies without adding significant latency.

Nemotron 3 set a strong baseline with 84% average accuracy on multimodal harmful-content tests and roughly half the latency of LlamaGuard-4-12B. Nemotron 3.5 maintains that compact 4B efficiency while adding custom policy support and reasoning traces.

Across multilingual and multimodal safety benchmarks, Nemotron 3.5 delivers strong harmful-content classification accuracy while maintaining a compact footprint. This matters because many safety models remain English-first, text-only, or too costly to run repeatedly in production pipelines. Nemotron 3.5 is designed to combine multilingual coverage, multimodal classification, custom-policy support, and low-latency deployment in one model.

figure1

Figure 1. Nemotron 3.5 Content Safety delivers strong harmful-content classification accuracy across multilingual and multimodal safety benchmarks, averaging about 85% across the evaluated benchmark set.

The language-level results highlight why multilingual safety matters for global enterprise AI. On Multilingual Aegis, Nemotron 3.5 averages 96.5% harmful-content classification accuracy across 12 languages. On RTP-LX, it averages 88.8%, for a combined Aegis and RTP-LX average of 92.7%. This consistency helps teams apply the same safety posture across customer, employee, and partner-facing workflows instead of relying on English-only moderation or separate regional safety models.

figure2Figure 2. Nemotron 3.5 Content Safety averages 97% harmful-content classification accuracy on Multilingual Aegis Cultural + Adapted (prompt classification) (harmful-f1) across 12 languages.

figure3Figure 3. Nemotron 3.5 Content Safety averages 89% harmful-content classification accuracy on RTPLX (prompt classification) (harmful-f1) across 12 languages.

Accuracy alone is not enough for production guardrails. Safety models must also be efficient enough to run before content is processed, returned, or routed downstream. Nemotron 3.5 Content Safety’s compact 4B design helps reduce the cost and latency of repeated safety checks, making multilingual and multimodal guardrails practical for real-world AI applications.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#latencyLatency

The latency profile is unchanged from Nemotron 3 in the default (no THINK) mode. THINK mode adds inference time proportional to trace length, but this overhead is predictable and can be budgeted separately from the synchronous moderation loop—for instance, by running THINK-mode evaluation asynchronously as part of an audit pipeline while the default mode handles real-time decisions.

figure4Figure 4. Nemotron 3.5 Content Safety achieves 3x lower end-to-end latency on a multimodal benchmark compared to an alternative multimodal safety model.

Compared to another reasoning safety model, our model generated up to 50% fewer tokens when reasoning is enabled, making it efficient in terms of cost and latency.


https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#addressing-the-benchmark-gapAddressing the Benchmark Gap

A recurring theme in multimodal safety research is the gaps in existing evaluation infrastructure. Nemotron 3.5’s development encountered the same gaps documented in the broader literature:

  • Text-only coverage: The most widely cited safety benchmarks (WildGuard, XSTest, HarmBench) are text-only. Multimodal performance cannot be inferred from text-benchmark results.
  • Synthetic image quality: Most multimodal benchmarks that exist use AI-generated images (typically SDXL) rather than real photographs, understating the difficulty of real production content.
  • Real-image licensing: Stock photo licenses prohibit redistribution in AI datasets, creating a structural gap between research and production conditions.

NVIDIA’s multimodal training data—with real images and culturally nuanced multilingual prompts—is designed to fill some of these gaps for model training. The benchmark gap for evaluation remains an open problem for the broader safety research community.

https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety#getting-startedGetting Started

Nemotron 3.5 Content Safety is available onHugging Faceunder the NVIDIA Open Model License for research and commercial use, along with the trainingdataset. It supports transformers, vLLM, and SGLang, and is available as a production-gradeNVIDIA NIMon build.nvidia.com for teams that need a pre-packaged, GPU-optimized inference microservice.

Developers can also access the model through inference platforms includingBaseten,Eigen AI,DeepInfra,OpenRouter, andVultr.

For custom policy workflows, NVIDIA provides a Claude- and Codex-compatibleskill for generating custom policies, along withcookbooks showing how to use the model. Custom policies and reasoning traces help teams adapt safety behavior to domain-specific rules while keeping decisions auditable.

Similar Articles

Nemotron 3 Ultra by NVIDIA

Product Hunt

NVIDIA introduces Nemotron 3 Ultra, a new AI model designed to enable faster and more efficient reasoning for long-running agents.

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 · Hugging Face

Reddit r/LocalLLaMA

NVIDIA releases Nemotron-3-Ultra-550B-A55B, a 550B parameter (55B active) frontier LLM featuring a hybrid LatentMoE architecture combining Mamba-2, MoE, and Attention layers, with up to 1M token context length and configurable reasoning mode. It supports 11 languages and is optimized for complex agentic workflows, long-context analysis, and high-accuracy reasoning.

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

Hugging Face Models Trending

NVIDIA releases Nemotron 3 Nano Omni, a 30B parameter multimodal model capable of processing video, audio, image, and text with integrated reasoning capabilities for enterprise workflows.