Tag
This paper studies multilingual unlearning in LLMs by extending the TOFU benchmark to five languages. It finds that unlearning transfer varies by script and language family, that it operates primarily in later decoding layers, and that a single steering direction can recover much of the suppressed knowledge across languages.
The article argues that AI deployments often fail because teams treat the ability to reverse AI decisions as a cost rather than a design feature, and provides examples and principles for designing reversible AI systems.