OpenAI outlines its commitment to community safety, detailing how ChatGPT is trained to detect and mitigate risks of violence and harm through refined safeguards and expert input.
Researchers from UCLA examine how automated content moderation tools, including Perspective API, fail to distinguish between reclaimed and hateful uses of slurs targeting LGBTQIA+ people, Black people, and women. The study finds low inter-annotator agreement even among in-group members and poor alignment between community judgments and AI moderation tools, highlighting the need for context-sensitive approaches.
Deezer reports that 44% of all new music uploaded to its platform is AI-generated, amounting to nearly 75,000 tracks per day, though consumption remains low at 1–3% of total streams, with 85% of those streams flagged as fraudulent. The surge highlights growing challenges for streaming platforms in managing AI content and protecting artists' rights.
This study evaluates the use of open-source LLMs for inductive coding of interviews with Black firearm violence survivors, finding that while LLMs can identify some codes, overall relevance remains low and guardrails cause significant narrative erasure. The research highlights both potential and ethical limitations of applying AI to qualitative research involving vulnerable populations.
A researcher built an open-source political compass benchmark with 98 structured questions across 14 policy areas to evaluate frontier LLMs (GPT-5.3, Claude Opus 4.6, KIMI K2). Key finding: refusal patterns and opt-out options significantly shift model positioning, with GPT-5.3 refusing 100% of questions when given an opt-out, while KIMI K2 exhibits topic-specific censorship on Taiwan/Xinjiang despite progressive positions elsewhere.
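The refusal-rate finding above hinges on how answers and opt-outs are scored. A minimal sketch of that kind of scoring, assuming each record is a (model, question, answer) tuple where the answer is a Likert score or None for a refusal (the record shape and model names here are illustrative, not the benchmark's actual format):

```python
from collections import defaultdict

def summarize(responses):
    """Per-model refusal rate and mean position over answered questions."""
    totals = defaultdict(int)     # questions asked per model
    refusals = defaultdict(int)   # refusals / opt-outs per model
    scores = defaultdict(list)    # Likert scores for answered questions
    for model, _qid, answer in responses:
        totals[model] += 1
        if answer is None:
            refusals[model] += 1
        else:
            scores[model].append(answer)
    return {
        m: {
            "refusal_rate": refusals[m] / totals[m],
            # Mean position is undefined if every question was refused.
            "mean_position": sum(scores[m]) / len(scores[m]) if scores[m] else None,
        }
        for m in totals
    }

# Hypothetical records: model-b refuses everything, as GPT-5.3 reportedly
# did when offered an opt-out.
demo = [
    ("model-a", 1, 1.0), ("model-a", 2, None), ("model-a", 3, -1.0),
    ("model-b", 1, None), ("model-b", 2, None), ("model-b", 3, None),
]
print(summarize(demo))
```

This also shows why opt-outs shift positioning: a model's mean position is computed only over the questions it chose to answer, so selective refusal changes the average.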
OpenAI introduces a Child Safety Blueprint, a policy framework developed with NCMEC, state attorneys general, and Thorn to combat AI-enabled child sexual exploitation through modernized laws, improved provider reporting, and built-in safety measures. The initiative brings together legal, operational, and technical approaches to prevent and detect child safety harms at scale.
OpenAI releases prompt-based safety policies and the open-weight gpt-oss-safeguard model to help developers build age-appropriate AI experiences for teens, covering risks like graphic content, harmful behaviors, and dangerous activities.
OpenAI announces comprehensive safety measures for Sora 2 and the Sora app, including provenance signals, C2PA metadata embedding, consent-based likeness controls through characters, and enhanced protections for teen users. The approach combines technical safeguards like content filtering with policy-based guardrails to prevent misuse of AI-generated video.
OpenAI Japan announced the Japan Teen Safety Blueprint, a framework introducing age-aware protections, stronger safety policies for users under 18, expanded parental controls, and well-being-centered design features to ensure teens use generative AI safely.
OpenAI outlines the design philosophy behind Sora's feed system, prioritizing creativity, user control, connection, and safety through steerable ranking algorithms and robust content moderation guidelines.
OpenAI is rolling out an age prediction model on ChatGPT to identify accounts likely belonging to users under 18 and apply appropriate safeguards. The system uses behavioral and account-level signals to estimate age and restricts access to sensitive content for minors, with options for age verification and parental controls.
OpenAI has updated its Model Spec with new Under-18 Principles to guide ChatGPT's behavior for teen users aged 13-17, focusing on safety, age-appropriate interactions, and stronger guardrails around high-risk topics like self-harm and explicit content. The update was developed with input from the American Psychological Association and is grounded in developmental science.
OpenAI releases gpt-oss-safeguard, open-weight reasoning models for safety classification tasks available in 120B and 20B sizes under Apache 2.0 license. The models use chain-of-thought reasoning to classify content according to developer-provided policies at inference time, enabling flexible and explainable content moderation.
OpenAI releases gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, open-weight reasoning models designed for policy-based content classification with full chain-of-thought reasoning. The technical report provides baseline safety evaluations and demonstrates the models' capabilities for content labeling tasks under the Apache 2.0 license.
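The distinguishing feature of these models is that the moderation policy is supplied at inference time rather than baked in at training. A minimal sketch of that request shape, assuming the policy goes in the system message and the content to label in the user message; the policy text and label set below are made up for illustration:

```python
# Hypothetical two-label policy; a real deployment would supply the
# developer's own policy document here.
POLICY = """\
Label the content with exactly one of: ALLOW, FLAG.
FLAG content that depicts or encourages real-world violence.
ALLOW everything else."""

def build_messages(policy: str, content: str) -> list[dict]:
    """Pair a developer-provided policy with content to classify."""
    return [
        {"role": "system", "content": policy},  # the policy, not a fixed taxonomy
        {"role": "user", "content": content},   # the item to be labeled
    ]

messages = build_messages(POLICY, "A review of a cooking show.")
print(messages)
```

Because the policy travels with every request, changing moderation rules means editing a prompt rather than retraining a classifier, and the model's chain-of-thought can explain which policy clause drove the label.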
OpenAI has updated ChatGPT's default model to better handle sensitive mental health conversations, including improved recognition of distress, de-escalation, and routing to crisis resources. The update expands safety testing to include emotional reliance and non-suicidal mental health emergencies as standard baseline metrics.
OpenAI announces safety-focused launch of Sora 2 and the Sora app with built-in protections including watermarking, C2PA metadata, consent-based likeness controls, teen safeguards, and multi-layered content filtering for harmful material and audio.
OpenAI announces comprehensive policies and technical measures to prevent the use of its models for child sexual exploitation and abuse, including pre-deployment protections, user monitoring, developer oversight, and partnerships with organizations like NCMEC and Thorn.
OpenAI is building an age prediction system for ChatGPT to tailor experiences for users under 18, with automatic content restrictions and parental control features launching by month-end. The system will default to the safer under-18 experience when age is uncertain, and includes new features like blackout hours and distress notifications for parents.
OpenAI outlines its approach to balancing teen safety, user freedom, and privacy in ChatGPT, including building an age-prediction system, parental controls, and stricter content rules for under-18 users. The company also signals plans for advanced privacy features and advocates for AI conversation privilege with policymakers.
SafetyKit launches AI agents powered by OpenAI's GPT-5, GPT-4.1, and specialized techniques to detect fraud and prohibited activity across text, images, and financial transactions with 95%+ accuracy. The solution enables marketplaces and fintech platforms to automate risk detection, policy enforcement, and content moderation at scale.
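Automated policy enforcement of this kind typically reduces to routing per-category risk scores to an action. A hedged sketch of that last step, where the categories and thresholds are illustrative and not SafetyKit's actual policy:

```python
def route(scores: dict[str, float],
          block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Route an item by its worst risk score across categories."""
    worst = max(scores.values())
    if worst >= block_at:
        return "block"         # auto-remove / decline the transaction
    if worst >= review_at:
        return "human_review"  # queue for a human moderator
    return "approve"

print(route({"fraud": 0.95, "prohibited_item": 0.10}))  # "block"
print(route({"fraud": 0.60, "prohibited_item": 0.20}))  # "human_review"
print(route({"fraud": 0.05, "prohibited_item": 0.01}))  # "approve"
```

The middle band is what makes the claimed automation tractable: only items the model is unsure about reach a human, while clear-cut cases are handled at scale.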