New and improved content moderation tooling

OpenAI Blog · Products

Summary

OpenAI has launched an improved Moderation API endpoint that uses GPT-based classifiers to detect sexual, hateful, violent, or self-harm content, offering free access to developers. They also released a technical paper and evaluation dataset alongside the tool.

We are introducing a new and improved content moderation tool. The Moderation endpoint improves upon our previous content filter, and is available for free today to OpenAI API developers.

Source: [https://openai.com/index/new-and-improved-content-moderation-tooling/](https://openai.com/index/new-and-improved-content-moderation-tooling/)

To help developers protect their applications against possible misuse, we are introducing the faster and more accurate [Moderation endpoint](https://beta.openai.com/docs/api-reference/moderations). This endpoint provides OpenAI API developers with free access to [GPT-based](https://openai.com/index/customized-gpt-3/) classifiers that detect undesired content, an instance of [using AI systems](https://openai.com/index/critiques/) to assist with human supervision of these systems. We have also released both a [technical paper](https://arxiv.org/abs/2208.03274) describing our methodology and the [dataset](https://github.com/openai/moderation-api-release) used for evaluation.

When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, all of which is prohibited by our [content policy](https://beta.openai.com/docs/usage-guidelines/content-policy). The endpoint has been trained to be quick, accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products "saying" the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.
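In practice, the Moderation endpoint is called with a POST request to `https://api.openai.com/v1/moderations` carrying the text to check, and it returns a JSON body with a per-category verdict. The sketch below shows how a developer might interpret such a response using only the standard library; the field names (`flagged`, `categories`, `category_scores`) and the sample values are illustrative of the documented response shape at the time of this post, and may differ in later API versions.

```python
import json

def parse_moderation_response(body: str) -> dict:
    """Extract the overall verdict and the violated categories
    from a Moderation endpoint JSON response."""
    result = json.loads(body)["results"][0]
    # Collect the names of every category the classifier flagged.
    violated = [name for name, hit in result["categories"].items() if hit]
    return {"flagged": result["flagged"], "violated": violated}

# A response shaped like the endpoint's documented output
# (illustrative ID, model name, and scores):
sample = json.dumps({
    "id": "modr-0000",
    "model": "text-moderation-001",
    "results": [{
        "flagged": True,
        "categories": {"hate": False, "self-harm": False,
                       "sexual": False, "violence": True},
        "category_scores": {"hate": 0.01, "self-harm": 0.0,
                            "sexual": 0.0, "violence": 0.97},
    }],
})

print(parse_moderation_response(sample))
# → {'flagged': True, 'violated': ['violence']}
```

A real call would send `{"input": "<text to check>"}` with an `Authorization: Bearer <API key>` header; an application would then suppress or review any input or output for which `flagged` is true.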

Similar Articles

Upgrading the Moderation API with our new multimodal moderation model

OpenAI Blog

OpenAI is launching `omni-moderation-latest`, a new multimodal moderation model built on GPT-4o that supports both text and image inputs, adds new harm categories, and significantly improves accuracy across 40 languages. The updated model is free to use via the Moderation API for all developers.

A Holistic Approach to Undesired Content Detection in the Real World

OpenAI Blog

OpenAI presents a comprehensive framework for building robust content moderation systems through careful taxonomy design, data quality control, active learning pipelines, and techniques to prevent overfitting. The approach detects multiple categories of undesired content including sexual content, hate speech, violence, and self-harm, achieving performance superior to existing off-the-shelf models.

Using GPT-4 for content moderation

OpenAI Blog

OpenAI describes using GPT-4 for content moderation by enabling policy experts to develop and refine content policies in hours rather than months through an iterative process of comparing GPT-4 judgments against human labels. The approach reduces manual moderation burden while keeping humans in the loop for complex cases and bias monitoring.

Helping developers build safer AI experiences for teens

OpenAI Blog

OpenAI releases prompt-based safety policies and the open-weight gpt-oss-safeguard model to help developers build age-appropriate AI experiences for teens, covering risks like graphic content, harmful behaviors, and dangerous activities.

OpenAI API

OpenAI Blog

OpenAI announces the release of an API for accessing its AI models with a general-purpose text interface, launching in private beta with strict safety measures including mandatory production reviews and content restrictions to prevent harmful use cases.