Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training
Summary
This research investigates how task geometry influences continual post-training in LLMs, identifying 'geometry conflict' as a cause of forgetting and a mechanism for controlling update integration. The authors propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free method that improves retention and performance across various model sizes.
Source: https://huggingface.co/papers/2605.09608
Abstract
Research investigates how task geometry influences continual post-training of large language models, identifying geometry conflict as both a cause of forgetting and a control mechanism for update integration.
Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that forgetting can be considered a state-relative update-integration failure: it arises when the covariance geometries induced by tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate geometry-aware correction. Across Qwen3 0.6B to 14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.
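The abstract describes measuring conflict between the covariance geometries of task updates via a Gaussian Wasserstein metric. As a minimal numpy sketch, the 2-Wasserstein distance between two zero-mean Gaussians reduces to the Bures-Wasserstein distance between their covariance matrices; treating rows of a weight update as samples to form a task covariance, and using that distance as a conflict score, are illustrative assumptions here, not the paper's actual GCWM implementation:

```python
import numpy as np

def _psd_sqrt(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)  # symmetrize against round-off
    w = np.clip(w, 0.0, None)               # clamp tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def update_covariance(delta):
    """Covariance geometry of one task, treating rows of the update as samples.
    (Illustrative choice; the paper may construct the covariance differently.)"""
    d = delta - delta.mean(axis=0, keepdims=True)
    return (d.T @ d) / max(len(delta) - 1, 1)

def bures_wasserstein(A, B):
    """W2 distance between N(0, A) and N(0, B):
    W2^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    sA = _psd_sqrt(A)
    cross = _psd_sqrt(sA @ B @ sA)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(val, 0.0)))

# Toy conflict check: two updates stretched along the same axis vs. different axes.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 4))
state_cov = update_covariance(base * np.array([3.0, 1.0, 1.0, 1.0]))
aligned_cov = update_covariance(base * np.array([3.0, 1.0, 1.0, 1.0]))
conflicting_cov = update_covariance(base * np.array([1.0, 1.0, 1.0, 3.0]))
low = bures_wasserstein(state_cov, aligned_cov)       # geometries match
high = bures_wasserstein(state_cov, conflicting_cov)  # geometries misalign
```

In this sketch, a gated integration rule in the paper's spirit would compare such a conflict score against a threshold and apply geometry-aware correction only to updates whose score is high relative to the evolving model state.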
Get this paper in your agent:
hf papers read 2605.09608
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Useful memories become faulty when continuously updated by LLMs (30 minute read)
This research demonstrates that continuously updating LLM agent memories through distillation and consolidation loops causes performance regression, even when trained on ground-truth solutions. The study finds that episodic-only retention outperforms text-based consolidation, highlighting significant flaws in current self-improvement paradigms.
Measuring Representation Robustness in Large Language Models for Geometry
Researchers introduce GeoRepEval, a framework to evaluate LLM robustness across equivalent geometric problem representations (Euclidean, coordinate, vector). Testing 11 LLMs on 158 geometry problems, they find accuracy gaps up to 14 percentage points based solely on representation choice, with vector formulations being a consistent failure point.
GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs
GeoStack introduces a geometric framework to compose independently trained domain experts in Vision-Language Models without catastrophic forgetting, achieving constant-time inference and a 10x reduction in geometric error.
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction
This paper provides a mechanistic explanation for why LLMs lose track of instructions in long multi-turn interactions, introducing the Goal Accessibility Ratio (GAR) metric and a channel-transition framework. Through ablation studies and residual stream probes, it shows that attention to goal-defining tokens closes over turns while goal information persists in residual representations, with architecture-specific failure modes.
Useful Memories Become Faulty When Continuously Updated by LLMs
A study finds that continuously updating consolidated memories in LLM-based agentic systems degrades performance, and that retaining raw episodic trajectories is more reliable. Experiments on ARC-AGI show that even GPT-5.4 fails more often after consolidation.