Tag
Proposes Complementary Self-Distillation (SelfCI) to improve contextual integrity in LLMs by balancing utility and privacy. Evaluated on CI-RL and PrivacyLens benchmarks across multiple models.
This paper introduces Attention-Shifting (AS), a novel framework for selective machine unlearning in LLMs that removes sensitive information while limiting hallucinations and preserving model utility. The method combines importance-aware attention suppression with retention enhancement and achieves up to 15% higher accuracy preservation than existing unlearning approaches on standard benchmarks.