Upgrading the Moderation API with our new multimodal moderation model

OpenAI Blog

Summary

OpenAI is launching `omni-moderation-latest`, a new multimodal moderation model built on GPT-4o that supports both text and image inputs, adds new harm categories, and significantly improves accuracy across 40 languages. The updated model is free to use via the Moderation API for all developers.

We’re introducing a new model built on GPT-4o that is more accurate at detecting harmful text and images, enabling developers to build more robust moderation systems.

Source: [https://openai.com/index/upgrading-the-moderation-api-with-our-new-multimodal-moderation-model/](https://openai.com/index/upgrading-the-moderation-api-with-our-new-multimodal-moderation-model/)

Today we are introducing a new moderation model, `omni-moderation-latest`, in the [Moderation API](https://platform.openai.com/docs/guides/moderation). Based on [GPT-4o](https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/), the new model supports both text and image inputs and is more accurate than our previous model, especially in non-English languages. Like the previous version, this model uses OpenAI's GPT-based classifiers to assess whether content should be flagged across categories such as hate, violence, and self-harm, while also adding the ability to detect additional harm categories. It also provides more granular control over moderation decisions by calibrating probability scores to reflect the likelihood that content matches the detected category. The new moderation model is free to use for all developers through the Moderation API.

Since we first [launched](https://openai.com/index/new-and-improved-content-moderation-tooling/) the Moderation API in 2022, the volume and variety of content that automated moderation systems need to handle have increased, especially as more AI apps have reached massive scale in production. We hope today's upgrades help more developers benefit from the latest research and investments in our safety systems. Companies across various sectors, from social media platforms and productivity tools to generative AI platforms, are using the Moderation API to build safer products for their users.
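As a minimal sketch of how a developer might assemble a multimodal request for this endpoint: the model name and the `text`/`image_url` input shapes follow the Moderation API documentation, while the example strings and the image URL are placeholders. The HTTP call itself (a POST to `https://api.openai.com/v1/moderations` with an `Authorization: Bearer <API key>` header) is omitted here.

```python
import json


def build_moderation_request(text: str, image_url: str) -> str:
    """Build a JSON body for the Moderation API mixing text and image input.

    The input array mirrors the documented shape for omni-moderation-latest:
    text parts use {"type": "text"}; image parts use {"type": "image_url"}
    and may point at a public URL or a base64 data URL.
    """
    body = {
        "model": "omni-moderation-latest",
        "input": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
    return json.dumps(body)


# Placeholder content for illustration only.
payload = build_moderation_request(
    "is this image appropriate?",
    "https://example.com/check_image.png",
)
print(payload)
```

Because the body is plain JSON, the same payload works from any HTTP client; the official SDKs wrap this in a `moderations.create` call.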
For instance, Grammarly uses the Moderation API as part of the safety guardrails in its AI communications assistance to ensure its products' outputs are safe and fair. Similarly, ElevenLabs uses the Moderation API along with in-house solutions to scan content generated by its audio AI products, preventing and flagging outputs that violate its policies.

The updated moderation model includes a number of major improvements:

- **Multimodal harm classification across six categories:** the new model can evaluate the likelihood that an image, in isolation or in conjunction with text, contains harmful content. This is supported today for the following categories: violence (`violence` and `violence/graphic`), self-harm (`self-harm`, `self-harm/intent`, and `self-harm/instructions`), and sexual (`sexual` but not `sexual/minors`). The remaining categories are currently text-only, and we are working to expand multimodal support to more categories in the future.
- **Two new text-only harm categories:** the new model can detect harm in two additional categories compared to our previous models: `illicit`, which covers instructions or advice on how to commit wrongdoing (a phrase like "how to shoplift", for example), and `illicit/violent`, which covers the same for wrongdoing that also involves violence.
- **More accurate scores, especially for non-English content:** in a test across 40 languages, the new model improved 42% over the previous model on our internal multilingual evaluation and improved in 98% of the languages tested. For low-resource languages like Khmer or Swati, it improved 70%, and we saw the biggest improvements in Telugu (6.4x), Bengali (5.6x), and Marathi (4.6x). While the previous model had limited support for non-English languages, the new model's performance in Spanish, German, Italian, Polish, Vietnamese, Portuguese, French, Chinese, Indonesian, and English all exceeds even the previous model's English performance.
- **Calibrated scores:** the new model's scores now more accurately represent the probability that a piece of content violates the relevant policies, and will be significantly more consistent across future moderation models.

AI content moderation systems help enforce platform policies and ease the workload on human moderators, crucially sustaining the health of digital platforms. That's why, just like our [previous model](https://openai.com/index/new-and-improved-content-moderation-tooling/), we're making the new moderation model free to use for all developers through the Moderation API, with rate limits depending on usage tier. To get started, see our [Moderation API guide](https://platform.openai.com/docs/guides/moderation).
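Because the scores are calibrated probabilities, a platform can apply its own per-category thresholds rather than relying only on the boolean `flagged` field. A small sketch of that idea: the score values below are invented stand-ins for the `category_scores` object of a moderation response, and the thresholds are chosen purely for illustration.

```python
# Hypothetical category_scores from a moderation response (values invented).
scores = {
    "violence": 0.91,
    "self-harm": 0.02,
    "sexual": 0.05,
    "illicit": 0.40,
}

# Per-category thresholds a platform might tune to its own policies;
# stricter policies use lower thresholds (e.g. self-harm here).
thresholds = {
    "violence": 0.50,
    "self-harm": 0.20,
    "sexual": 0.50,
    "illicit": 0.70,
}


def violating_categories(scores: dict, thresholds: dict) -> set:
    """Return the categories whose calibrated score meets the threshold."""
    return {cat for cat, s in scores.items() if s >= thresholds.get(cat, 0.5)}


print(violating_categories(scores, thresholds))  # → {'violence'}
```

Calibration is what makes this pattern stable: a threshold tuned today should keep roughly the same meaning across future versions of the moderation model.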
