Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Hugging Face Daily Papers

Summary

This research investigates how task geometry influences continual post-training in LLMs, identifying 'geometry conflict' as a cause of forgetting and a mechanism for controlling update integration. The authors propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free method that improves retention and performance across various model sizes.

Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating a new update is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry that update induces. Our central finding is that forgetting is a state-relative update-integration failure: it arises when the covariance geometries induced by new tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate a geometry-aware correction. Across Qwen3 0.6B–14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.
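The abstract names GCWM's mathematical ingredients without giving its equations, so the following is only a minimal NumPy sketch of those ingredients, not the authors' method: each task's update is summarized by an empirical covariance, geometry conflict is scored with the Bures 2-Wasserstein distance between the induced zero-mean Gaussians, a shared metric is built with the standard Gaussian Wasserstein barycenter fixed point, and a hypothetical gate damps updates whose conflict with the evolving state exceeds a threshold. All function names, the gating rule, and the threshold tau are assumptions introduced here for illustration.

import numpy as np

def sqrtm_psd(a):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def task_covariance(update_rows, eps=1e-6):
    # "Covariance geometry" of a task: empirical covariance of its
    # parameter-update directions, ridged so it stays invertible.
    u = update_rows - update_rows.mean(axis=0, keepdims=True)
    return u.T @ u / max(len(u) - 1, 1) + eps * np.eye(u.shape[1])

def geometry_conflict(c1, c2):
    # Squared 2-Wasserstein (Bures) distance between zero-mean Gaussians
    # N(0, c1) and N(0, c2): tr(c1) + tr(c2) - 2 tr((c2^1/2 c1 c2^1/2)^1/2).
    s = sqrtm_psd(c2)
    return float(np.trace(c1 + c2 - 2.0 * sqrtm_psd(s @ c1 @ s)))

def wasserstein_barycenter(covs, iters=100):
    # Fixed-point iteration for the barycenter of zero-mean Gaussians
    # (Alvarez-Esteban et al., 2016); one way to construct a shared
    # Wasserstein metric across tasks.
    bary = np.mean(covs, axis=0)
    for _ in range(iters):
        s = sqrtm_psd(bary)
        s_inv = np.linalg.inv(s)
        t = np.mean([sqrtm_psd(s @ c @ s) for c in covs], axis=0)
        bary = s_inv @ (t @ t) @ s_inv
    return bary

def gated_merge(state, new_update, c_state, c_new, tau=1.0):
    # Hypothetical gate: apply the new update at full strength while its
    # geometry agrees with the evolving state, and damp it in proportion
    # to how far the state-relative conflict exceeds tau.
    conflict = geometry_conflict(c_state, c_new)
    alpha = 1.0 if conflict <= tau else tau / conflict
    return state + alpha * new_update, conflict

# Toy usage: two synthetic "task updates" over a 16-dim parameter slice.
rng = np.random.default_rng(0)
t1 = rng.normal(size=(64, 16)) @ np.diag(np.linspace(2.0, 0.1, 16))
t2 = rng.normal(size=(64, 16))
c1, c2 = task_covariance(t1), task_covariance(t2)
shared_metric = wasserstein_barycenter([c1, c2])
merged, score = gated_merge(t1.mean(axis=0), t2.mean(axis=0), c1, c2)

Any damping of alpha that is monotone in the conflict score would serve the same illustrative purpose; the paper's actual gate and metric construction are given in the full text.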

Source: https://huggingface.co/papers/2605.09608


Get this paper in your agent:

hf papers read 2605.09608

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Useful memories become faulty when continuously updated by LLMs (30 minute read)

TLDR AI

This research demonstrates that continuously updating LLM agent memories through distillation and consolidation loops causes performance regression, even when trained on ground-truth solutions. The study finds that episodic-only retention outperforms text-based consolidation, highlighting significant flaws in current self-improvement paradigms.

Measuring Representation Robustness in Large Language Models for Geometry

arXiv cs.CL

Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

arXiv cs.AI

This paper provides a mechanistic explanation for why LLMs lose track of instructions in long multi-turn interactions, introducing the Goal Accessibility Ratio (GAR) metric and a channel-transition framework. Through ablation studies and residual stream probes, it shows that attention to goal-defining tokens closes over turns while goal information persists in residual representations, with architecture-specific failure modes.
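The blurb does not define GAR precisely, so purely as an illustration of how such an attention-based accessibility score could be computed, here is a sketch in the same NumPy style; the metric's exact form, the function name, and the choice to read only the final query position are all assumptions, not the paper's definition.

import numpy as np

def goal_accessibility_ratio(attn, goal_positions):
    # attn: attention weights of shape (heads, query_len, key_len),
    # with each query row normalized to sum to 1.
    # goal_positions: key indices of the goal-defining tokens
    # (e.g., the instruction issued in the first turn).
    # Assumed reading of GAR: the attention mass the latest query
    # position sends to the goal span, averaged over heads.
    last_query = attn[:, -1, :]  # (heads, key_len)
    return float(last_query[:, goal_positions].sum(axis=-1).mean())

Tracking this ratio turn by turn would exhibit the "closing" the paper describes: the score shrinks over turns even though goal information may persist in residual representations.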

Useful Memories Become Faulty When Continuously Updated by LLMs

Hugging Face Daily Papers

A study finds that continuously updating consolidated memories in LLM-based agentic systems degrades performance, and that retaining raw episodic trajectories is more reliable. Experiments on ARC-AGI show that even GPT-5.4 fails more often after consolidation.