Tag
Researchers propose HSPD, a corpus-level detoxification pipeline that rewrites toxic spans in pretraining data while preserving semantics, achieving state-of-the-art toxicity reduction on GPT-2 XL, LLaMA-2, OPT, and Falcon models.
This paper proposes CAP-TTA, a test-time adaptation framework that uses preconditioned LoRA updates triggered by bias-risk scores to mitigate toxicity and bias in large language models during narrative generation, achieving faster optimization and better fluency than standard baselines.