This replication study evaluates DExperts for mitigating toxicity in LLMs, finding near-perfect safety against explicit toxicity, reduced effectiveness against implicit hate speech, and a significant inference-latency trade-off.
A Stanford study analyzing billions of social media posts reveals that only ~3% of users generate severely toxic content, but engagement-driven algorithms disproportionately amplify this minority, distorting public perception and driving self-censorship among the majority.