Tag
This replication study evaluates DExperts as a method for mitigating toxicity in large language models (LLMs). It finds near-perfect safety against explicit toxicity, reduced effectiveness against implicit hate speech, and a significant latency cost.
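
For readers unfamiliar with the method, the sketch below illustrates the decoding-time combination that DExperts is built on: the base model's next-token logits are shifted by the difference between an "expert" (non-toxic) model and an "anti-expert" (toxic) model, scaled by a strength parameter alpha. It is a minimal illustration under stated assumptions, not the study's implementation: the function names, the three-token vocabulary, and all logit values are hypothetical. Because each decoding step needs logits from three models rather than one, this structure is a likely contributor to the latency cost noted above.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dexperts_combine(base_logits, expert_logits, antiexpert_logits, alpha=1.0):
    """Steer base logits toward the expert and away from the anti-expert.

    Computes z_base + alpha * (z_expert - z_antiexpert) per token.
    `alpha` is a tunable steering strength; the default here is illustrative.
    """
    if not (len(base_logits) == len(expert_logits) == len(antiexpert_logits)):
        raise ValueError("all three logit vectors must cover the same vocabulary")
    return [
        b + alpha * (e - a)
        for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)
    ]


# Hypothetical 3-token vocabulary and made-up logits for one decoding step.
vocab = ["kind", "neutral", "rude"]
base = [1.0, 1.0, 2.0]        # base model slightly prefers "rude"
expert = [2.0, 1.0, 0.0]      # non-toxic expert prefers "kind"
antiexpert = [0.0, 1.0, 3.0]  # toxic anti-expert prefers "rude"

steered = dexperts_combine(base, expert, antiexpert, alpha=1.0)
base_probs = softmax(base)
steered_probs = softmax(steered)

for token, p_before, p_after in zip(vocab, base_probs, steered_probs):
    print(f"{token:8s} base={p_before:.3f} steered={p_after:.3f}")
```

In this toy case the most likely token moves from "rude" to "kind" after steering. Larger values of alpha push harder toward the expert, typically at some cost to fluency.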