replication-study

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

arXiv cs.CL · 2d ago

This replication study evaluates DExperts (decoding-time experts) for mitigating toxicity in large language models (LLMs). It finds near-perfect safety against explicit toxicity, but reduced effectiveness against implicit hate speech and a significant latency trade-off.