Tag
This paper studies multilingual unlearning in LLMs by extending the TOFU benchmark to five languages. It finds that unlearning transfer varies by script and language family, that unlearning operates primarily in later decoding layers, and that a single steering direction can recover much of the suppressed knowledge across languages.
This paper introduces GEM, a concept erasure framework for Rectified Flow models that combines trajectory-based unlearning with teacher-guided flow matching. It reports 5× faster and safer content suppression while preserving benign generation.
This paper argues that unlearning in LLMs should be goal-dependent. It proposes a cosine-based, meta-learned variant of RMU for dangerous knowledge and a multi-layer objective with probe directions for toxicity, and reports strong results across four 7-8B models.