OpenAI outlines its commitment to community safety, detailing how ChatGPT is trained to detect and mitigate risks of violence and harm through refined safeguards and expert input.
Researchers from UCLA examine how automated content moderation tools, including Perspective API, fail to distinguish between reclaimed and hateful uses of slurs targeting LGBTQIA+ people, Black people, and women. The study finds low inter-annotator agreement even among in-group members and poor alignment between community judgments and AI moderation tools, highlighting the need for context-sensitive approaches.
Deezer reports that 44% of all new music uploaded to its platform is AI-generated, amounting to nearly 75,000 tracks per day, though consumption remains low at 1–3% of total streams, with 85% of those streams flagged as fraudulent. The surge highlights growing challenges for streaming platforms in managing AI content and protecting artists' rights.
This study evaluates the use of open-source LLMs for inductive coding of interviews with Black firearm violence survivors, finding that while LLMs can identify some codes, overall relevance remains low and guardrails cause significant narrative erasure. The research highlights both potential and ethical limitations of applying AI to qualitative research involving vulnerable populations.
A researcher built an open-source political compass benchmark with 98 structured questions across 14 policy areas to evaluate frontier LLMs (GPT-5.3, Claude Opus 4.6, KIMI K2). Key finding: refusal patterns and opt-out options significantly shift model positioning, with GPT-5.3 refusing 100% of questions when given an opt-out, while KIMI K2 exhibits topic-specific censorship on Taiwan/Xinjiang despite progressive positions elsewhere.
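The refusal-rate finding above hinges on how answers and opt-outs are scored. A minimal sketch of that kind of scoring, assuming each record is a (model, question, answer) tuple where the answer is a Likert score or None for a refusal (the record shape and model names here are illustrative, not the benchmark's actual format):

```python
from collections import defaultdict

def summarize(responses):
    """Per-model refusal rate and mean position over answered questions."""
    totals = defaultdict(int)     # questions asked per model
    refusals = defaultdict(int)   # refusals / opt-outs per model
    scores = defaultdict(list)    # Likert scores for answered questions
    for model, _qid, answer in responses:
        totals[model] += 1
        if answer is None:
            refusals[model] += 1
        else:
            scores[model].append(answer)
    return {
        m: {
            "refusal_rate": refusals[m] / totals[m],
            # Mean position is undefined if every question was refused.
            "mean_position": sum(scores[m]) / len(scores[m]) if scores[m] else None,
        }
        for m in totals
    }

# Hypothetical records: model-b refuses everything, as GPT-5.3 reportedly
# did when offered an opt-out.
demo = [
    ("model-a", 1, 1.0), ("model-a", 2, None), ("model-a", 3, -1.0),
    ("model-b", 1, None), ("model-b", 2, None), ("model-b", 3, None),
]
print(summarize(demo))
```

This also shows why opt-outs shift positioning: a model's mean position is computed only over the questions it chose to answer, so selective refusal changes the average.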
OpenAI introduces a Child Safety Blueprint, a policy framework developed with NCMEC, state attorneys general, and Thorn to combat AI-enabled child sexual exploitation through modernized laws, improved provider reporting, and built-in safety measures. The initiative brings together legal, operational, and technical approaches to prevent and detect child safety harms at scale.
OpenAI releases prompt-based safety policies and the open-weight gpt-oss-safeguard model to help developers build age-appropriate AI experiences for teens, covering risks like graphic content, harmful behaviors, and dangerous activities.
OpenAI announces comprehensive safety measures for Sora 2 and the Sora app, including provenance signals, C2PA metadata embedding, consent-based likeness controls through characters, and enhanced protections for teen users. The approach combines technical safeguards like content filtering with policy-based guardrails to prevent misuse of AI-generated video.
OpenAI Japan announced the Japan Teen Safety Blueprint, a framework introducing age-aware protections, stronger safety policies for users under 18, expanded parental controls, and well-being-centered design features to ensure teens use generative AI safely.
OpenAI outlines the design philosophy behind Sora's feed system, prioritizing creativity, user control, connection, and safety through steerable ranking algorithms and robust content moderation guidelines.
OpenAI is rolling out an age prediction model on ChatGPT to identify accounts likely belonging to users under 18 and apply appropriate safeguards. The system uses behavioral and account-level signals to estimate age and restricts access to sensitive content for minors, with options for age verification and parental controls.
OpenAI has updated its Model Spec with new Under-18 Principles to guide ChatGPT's behavior for teen users aged 13-17, focusing on safety, age-appropriate interactions, and stronger guardrails around high-risk topics like self-harm and explicit content. The update was developed with input from the American Psychological Association and is grounded in developmental science.
OpenAI releases gpt-oss-safeguard, open-weight reasoning models for safety classification tasks available in 120B and 20B sizes under Apache 2.0 license. The models use chain-of-thought reasoning to classify content according to developer-provided policies at inference time, enabling flexible and explainable content moderation.
OpenAI releases gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, open-weight reasoning models designed for policy-based content classification with full chain-of-thought reasoning. The technical report provides baseline safety evaluations and demonstrates the models' capabilities for content labeling tasks under the Apache 2.0 license.
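The distinguishing feature of these models is that the moderation policy is supplied at inference time rather than baked in at training. A minimal sketch of that request shape, assuming the policy goes in the system message and the content to label in the user message; the policy text and label set below are made up for illustration:

```python
# Hypothetical two-label policy; a real deployment would supply the
# developer's own policy document here.
POLICY = """\
Label the content with exactly one of: ALLOW, FLAG.
FLAG content that depicts or encourages real-world violence.
ALLOW everything else."""

def build_messages(policy: str, content: str) -> list[dict]:
    """Pair a developer-provided policy with content to classify."""
    return [
        {"role": "system", "content": policy},  # the policy, not a fixed taxonomy
        {"role": "user", "content": content},   # the item to be labeled
    ]

messages = build_messages(POLICY, "A review of a cooking show.")
print(messages)
```

Because the policy travels with every request, changing moderation rules means editing a prompt rather than retraining a classifier, and the model's chain-of-thought can explain which policy clause drove the label.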
OpenAI has updated ChatGPT's default model to better handle sensitive mental health conversations, including improved recognition of distress, de-escalation, and routing to crisis resources. The update expands safety testing to include emotional reliance and non-suicidal mental health emergencies as standard baseline metrics.
OpenAI announces safety-focused launch of Sora 2 and the Sora app with built-in protections including watermarking, C2PA metadata, consent-based likeness controls, teen safeguards, and multi-layered content filtering for harmful material and audio.
OpenAI announces comprehensive policies and technical measures to prevent the use of its models for child sexual exploitation and abuse, including pre-deployment protections, user monitoring, developer oversight, and partnerships with organizations like NCMEC and Thorn.
OpenAI is building an age prediction system for ChatGPT to tailor experiences for users under 18, with automatic content restrictions and parental control features launching by month-end. The system will default to the safer under-18 experience when age is uncertain, and includes new features like blackout hours and distress notifications for parents.
OpenAI outlines its approach to balancing teen safety, user freedom, and privacy in ChatGPT, including building an age-prediction system, parental controls, and stricter content rules for under-18 users. The company also signals plans for advanced privacy features and advocates for AI conversation privilege with policymakers.
SafetyKit launches AI agents powered by OpenAI's GPT-5, GPT-4.1, and specialized techniques to detect fraud and prohibited activity across text, images, and financial transactions with 95%+ accuracy. The solution enables marketplaces and fintech platforms to automate risk detection, policy enforcement, and content moderation at scale.
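Automated policy enforcement of this kind typically reduces to routing per-category risk scores to an action. A hedged sketch of that last step, where the categories and thresholds are illustrative and not SafetyKit's actual policy:

```python
def route(scores: dict[str, float],
          block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Route an item by its worst risk score across categories."""
    worst = max(scores.values())
    if worst >= block_at:
        return "block"         # auto-remove / decline the transaction
    if worst >= review_at:
        return "human_review"  # queue for a human moderator
    return "approve"

print(route({"fraud": 0.95, "prohibited_item": 0.10}))  # "block"
print(route({"fraud": 0.60, "prohibited_item": 0.20}))  # "human_review"
print(route({"fraud": 0.05, "prohibited_item": 0.01}))  # "approve"
```

The middle band is what makes the claimed automation tractable: only items the model is unsure about reach a human, while clear-cut cases are handled at scale.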