AI safety testing is getting weird: when does benchmarking become abuse?

Reddit r/artificial 07/02/26, 05:38 PM News

ai-safety benchmarking ethical-testing meta chatbots self-harm controversial

Summary

Reports indicate that Meta contractors posed as teenagers to test rival chatbots on sensitive topics like self-harm, sex, drugs, and eating disorders, raising ethical questions about AI safety benchmarking.

Reports say Meta contractors posed as teens to test rival chatbots on self-harm, sex, drugs, and eating disorders.

Original Article

Similar Articles

Meta Contractors Posed as Teens to Prompt Rival Chatbots About Suicide, Sex, and Drugs

Wired

Meta hired contractors through Covalen to pose as teenagers and send high-risk prompts (suicide, sex, drugs) to rival chatbots including ChatGPT, Gemini, and Character.AI, as part of a safety benchmarking project called Cannes. Over 45,000 prompts were used in August 2025 alone, with the targeted companies unaware of the testing.

The other half of AI safety

Hacker News Top

The article critiques the AI safety field's focus on catastrophic risks while neglecting everyday mental health harms from chatbots like ChatGPT, citing OpenAI's own data on millions of users showing signs of psychosis, mania, or suicidal ideation yet receiving only redirects instead of hard gating.

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

arXiv cs.AI

AICompanionBench introduces the first publicly available benchmark dataset of 2,123 real-world AI companion conversations annotated across nine safety risk categories, used to evaluate 20 LLMs as safety judges. Results show strong models handle explicit harmful content well but struggle with nuanced risks like manipulation and false positives on benign conversations.

Helping developers build safer AI experiences for teens

OpenAI Blog

OpenAI releases prompt-based safety policies and the open-weight gpt-oss-safeguard model to help developers build age-appropriate AI experiences for teens, covering risks like graphic content, harmful behaviors, and dangerous activities.

Your AI Agent is one bad prompt away from ruining your brand (And why traditional QA is useless)

Reddit r/AI_Agents

The article argues that traditional chatbot QA is broken because it only tests happy paths, and proposes using an AI-powered user simulator that attacks the bot with diverse personas and edge cases to find vulnerabilities before deployment.