Tag
Introduces DriftGuard, a safety-aware adaptive moderation framework that uses multiple monitors to detect subtle, safety-relevant distribution shifts and selectively updates models with a hard-mix adaptation set, improving recall on toxic content in evolving datasets.