Tag
This paper studies catastrophic forgetting in multilingual expert language models during continual pretraining and proposes five parameter alignment strategies (hard layer freezing, soft regularization, post-hoc weight reversion, and model merging) to mitigate forgetting across 32 training languages with minimal cost to language acquisition.