Tag
This research investigates how task geometry influences continual post-training in LLMs, identifying 'geometry conflict' as a cause of forgetting and a mechanism for controlling update integration. The authors propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free method that improves retention and performance across various model sizes.