AI safety testing is getting weird: when does benchmarking become abuse?

Reddit r/artificial News

Summary

Reports indicate that Meta contractors posed as teenagers to test rival chatbots on sensitive topics like self-harm, sex, drugs, and eating disorders, raising ethical questions about AI safety benchmarking.

Similar Articles

The other half of AI safety

Hacker News Top

The article critiques the AI safety field's focus on catastrophic risks while neglecting everyday mental health harms from chatbots like ChatGPT. It cites OpenAI's own data on millions of users who show signs of psychosis, mania, or suicidal ideation yet receive only redirects rather than hard gating.

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

arXiv cs.AI

AICompanionBench introduces the first publicly available benchmark dataset of 2,123 real-world AI companion conversations annotated across nine safety risk categories, used to evaluate 20 LLMs as safety judges. Results show that strong models handle explicit harmful content well but struggle with nuanced risks such as manipulation, and they produce false positives on benign conversations.

Helping developers build safer AI experiences for teens

OpenAI Blog

OpenAI releases prompt-based safety policies and the open-weight gpt-oss-safeguard model to help developers build age-appropriate AI experiences for teens, covering risks like graphic content, harmful behaviors, and dangerous activities.