toxicity-moderation

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

arXiv cs.CL · 2d ago

Introduces DriftGuard, a safety-aware adaptive moderation framework that uses multiple monitors to detect subtle, safety-relevant distribution shifts and selectively updates models with a hard-mix adaptation set, improving toxic recall on evolving datasets.
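The summary names two ideas: several monitors watching for distribution shift, and a selective update on a mixed adaptation set. The sketch below is an illustration of that general pattern only, not DriftGuard's actual method. The three monitors, their thresholds, the function names, and the reading of "hard-mix" as uncertain new examples combined with replayed older ones are all assumptions made for this example.

```python
from statistics import mean

# All names and thresholds here are hypothetical; scores are a classifier's
# predicted toxicity probabilities in [0, 1].

def mean_score_shift(ref, cur):
    """Absolute change in the average toxicity score."""
    return abs(mean(cur) - mean(ref))

def toxic_rate_shift(ref, cur, cutoff=0.5):
    """Absolute change in the fraction of items scored at or above the cutoff."""
    def rate(scores):
        return sum(s >= cutoff for s in scores) / len(scores)
    return abs(rate(cur) - rate(ref))

def uncertainty_shift(ref, cur):
    """Absolute change in average closeness to the 0.5 decision boundary."""
    def uncertainty(scores):
        return mean(1 - abs(2 * s - 1) for s in scores)
    return abs(uncertainty(cur) - uncertainty(ref))

# Each monitor is paired with an assumed alert threshold.
MONITORS = {
    "mean_score": (mean_score_shift, 0.1),
    "toxic_rate": (toxic_rate_shift, 0.1),
    "uncertainty": (uncertainty_shift, 0.1),
}

def detect_drift(ref, cur):
    """Return the names of the monitors whose statistic exceeds its threshold."""
    return [name for name, (fn, thr) in MONITORS.items() if fn(ref, cur) > thr]

def build_adaptation_set(cur_items, replay_items, n_hard, n_replay):
    """Mix the n_hard most uncertain new (text, score) items with n_replay old ones."""
    hard = sorted(cur_items, key=lambda item: abs(item[1] - 0.5))[:n_hard]
    return hard + replay_items[:n_replay]

reference_scores = [0.1, 0.2, 0.9, 0.8]
current_scores = [0.45, 0.5, 0.55, 0.6]
fired = detect_drift(reference_scores, current_scores)
if fired:  # adapt only when at least one monitor fires
    adaptation_set = build_adaptation_set(
        [("a", 0.45), ("b", 0.95), ("c", 0.52)],
        [("r1", 0.1), ("r2", 0.9)],
        n_hard=2,
        n_replay=1,
    )
```

In this toy data the average score barely moves, so the mean-score monitor stays quiet while the other two fire. That is one way to picture why a single monitor could miss a subtle shift.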
