catastrophic-forgetting

#catastrophic-forgetting

Self-Distillation Enables Continual Learning [pdf]

Hacker News Top ↗ · 2026-05-17 Cached

Introduces Self-Distillation Fine-Tuning (SDFT), a method that enables on-policy learning from demonstrations to achieve continual learning without catastrophic forgetting, outperforming supervised fine-tuning.

Personal continual learning for LLMs without GPU — position paper [OC]

Reddit r/AI_Agents ↗ · 2026-05-16

The author proposes two architectures, Internal KV-Sphere Architecture (IKSA) and Background Micro Fine-Tuning (BMFT), for enabling LLMs to learn continually from personal interactions without GPU requirements and without catastrophic forgetting.

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Hugging Face Daily Papers ↗ · 2026-05-16 Cached

MixSD proposes a self-distillation method for knowledge injection in language models that aligns supervision with the model's native distribution, reducing catastrophic forgetting during fine-tuning. It achieves near-perfect memorization while retaining up to 100% of base capabilities, vastly outperforming standard SFT.

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

arXiv cs.CL ↗ · 2026-05-15 Cached

This paper proposes using reinforcement learning with semantic rewards (via GRPO) to expand LLMs to low-resource languages without the typical alignment tax of catastrophic forgetting, showing improved semantic quality and transferability over supervised fine-tuning.

Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

arXiv cs.LG ↗ · 2026-05-14 Cached

The paper proposes Slice, a gradient-surgery-based initialization for LoRA adapters in continual learning that reconciles conflicting gradients from current and past tasks to reduce catastrophic forgetting, achieving better stability-plasticity trade-offs.
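
As a rough illustration of what "reconciling conflicting gradients" can mean, the sketch below shows a generic PCGrad-style projection: when the current-task gradient points against the past-task gradient, the opposing component is removed. This is an assumption about gradient surgery in general, not Slice's actual initialization procedure, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def project_conflict(g_new, g_old):
    """Return g_new with any component that opposes g_old removed.

    Hypothetical helper illustrating PCGrad-style gradient surgery;
    it is not taken from the Slice paper.
    """
    overlap = dot(g_new, g_old)
    if overlap >= 0:
        # Gradients agree (or are orthogonal): nothing to reconcile.
        return list(g_new)
    # overlap < 0 implies g_old is non-zero, so the division is safe.
    scale = overlap / dot(g_old, g_old)
    return [x - scale * y for x, y in zip(g_new, g_old)]


g_current = [1.0, -1.0]   # gradient on the current task
g_past = [0.0, 1.0]       # gradient on a past task
g_fixed = project_conflict(g_current, g_past)
```

After projection, `g_fixed` no longer has a negative dot product with `g_past`, so a step along it should not, to first order, increase the past-task loss.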

Early Data Exposure Improves Robustness to Subsequent Fine-Tuning

arXiv cs.LG ↗ · 2026-05-14 Cached

This paper shows that mixing post-training data into pretraining (early exposure) improves how robustly a model retains capabilities after subsequent fine-tuning, challenging the notion that immediate post-training performance predicts retention. Controlled experiments on 135M and 1B models demonstrate that early exposure consistently improves the trade-off between upstream retention and downstream performance.

@HowToAI_: Google has quietly dropped what researchers are calling "Attention Is All You Need V2." And it signals the end of the T…

X AI KOLs Timeline ↗ · 2026-05-13

According to the post, Google researchers have introduced Nested Learning, an approach that treats models as nested optimization problems. The post claims it replaces the Transformer, solves catastrophic forgetting, and achieves 100% long-context memory stability.

Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]

Reddit r/MachineLearning ↗ · 2026-05-13

This paper introduces a Fast-Slow Training framework for LLMs that combines parameter updates with optimized context to improve sample efficiency and reduce catastrophic forgetting during continual learning.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Hugging Face Daily Papers ↗ · 2026-05-12 Cached

A fast-slow learning framework for LLMs combines fixed slow weights with optimized fast context weights, achieving up to 3x better sample efficiency and reduced catastrophic forgetting in continual learning scenarios.

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

Hugging Face Daily Papers ↗ · 2026-05-12 Cached

ORBIT proposes a method to mitigate catastrophic forgetting in large language models fine-tuned for generative retrieval by tracking parameter distances and using weight averaging, outperforming common continual learning baselines.
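
To make "tracking parameter distances and using weight averaging" concrete, here is a minimal sketch that measures how far fine-tuned weights have drifted from their starting point and interpolates them back toward it. The function names and the mixing coefficient `alpha` are assumptions for illustration; ORBIT's actual merging rule may differ.

```python
import math


def distance(theta, origin):
    """Euclidean distance between current and original parameters."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(theta, origin)))


def average_toward_origin(theta, origin, alpha=0.5):
    """Linear interpolation: alpha=0 keeps theta, alpha=1 restores origin.

    Hypothetical helper; alpha is an illustrative knob, not a value
    reported by the paper.
    """
    return [alpha * o + (1 - alpha) * t for t, o in zip(theta, origin)]


origin = [0.0, 0.0]       # weights before fine-tuning
theta = [2.0, 0.0]        # weights after fine-tuning
merged = average_toward_origin(theta, origin, alpha=0.5)
drift_before = distance(theta, origin)
drift_after = distance(merged, origin)
```

With `alpha=0.5` the merged weights sit halfway between the two endpoints, so the drift from the origin is halved.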

Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

Hugging Face Daily Papers ↗ · 2026-05-10 Cached

This paper introduces Retention-aware Policy Optimization (RaPO) to mitigate catastrophic forgetting in visual continual learning using reinforcement fine-tuning. RaPO uses trajectory-level reward shaping and cross-task advantage normalization to close the gap between reinforcement and supervised fine-tuning in class- and domain-incremental learning.

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Hugging Face Daily Papers ↗ · 2026-05-10 Cached

This research investigates how task geometry influences continual post-training in LLMs, identifying 'geometry conflict' as a cause of forgetting and a mechanism for controlling update integration. The authors propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free method that improves retention and performance across various model sizes.

Balancing Stability and Plasticity in Sequentially Trained Early-Exiting Neural Networks

arXiv cs.LG ↗ · 2026-05-08 Cached

The paper addresses catastrophic forgetting in sequentially trained early-exiting neural networks and proposes two methods based on Elastic Weight Consolidation and Learning without Forgetting to preserve earlier exit performance while adding new ones.
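
For readers unfamiliar with Elastic Weight Consolidation, the sketch below computes its standard quadratic penalty, (λ/2) · Σ F_i (θ_i − θ*_i)², which discourages moving parameters that were important for earlier training. How the paper adapts this to early-exit heads is not stated in the summary; the variable names and values here are illustrative assumptions.

```python
def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Standard EWC regularizer.

    theta      : current parameters
    theta_star : parameters saved after the earlier training stage
    fisher     : per-parameter importance (diagonal Fisher estimate)
    lam        : penalty strength (illustrative default)
    """
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher)
    )


theta_star = [0.0, 2.0]   # weights after the earlier stage
theta = [1.0, 2.0]        # weights while training the new stage
fisher = [4.0, 10.0]      # higher value = more important to preserve
penalty = ewc_penalty(theta, theta_star, fisher)
```

In training, this term would be added to the new-stage loss so that only the first parameter's movement is penalized here, since the second has not changed.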

Attribution-Guided Continual Learning for Large Language Models

arXiv cs.LG ↗ · 2026-05-08 Cached

This paper proposes an attribution-guided continual fine-tuning framework for large language models that estimates task-specific parameter importance in Transformer layers and modulates gradients accordingly, mitigating catastrophic forgetting while maintaining performance on new tasks.

GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

Hugging Face Daily Papers ↗ · 2026-05-07 Cached

GeoStack introduces a geometric framework to compose independently trained domain experts in Vision-Language Models without catastrophic forgetting, achieving constant-time inference and a 10x reduction in geometric error.

JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models

arXiv cs.CL ↗ · 2026-04-20 Cached

JumpLoRA introduces a novel sparse adapter framework for continual learning in LLMs using JumpReLU gating to dynamically isolate task parameters and prevent catastrophic forgetting. The method enhances LoRA-based approaches and outperforms state-of-the-art continual learning methods like ELLA.

Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting

arXiv cs.CL ↗ · 2026-04-20 Cached

This paper introduces Self-Distillation Fine-Tuning (SDFT) as a recovery mechanism for LLMs suffering from performance degradation due to catastrophic forgetting, quantization, and pruning. The authors provide theoretical justification using Centered Kernel Alignment (CKA) to demonstrate that self-distillation aligns the student model's high-dimensional manifold with the teacher's optimal structure, effectively recovering lost capabilities.

An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning

Hugging Face Daily Papers ↗ · 2026-04-16 Cached

This paper introduces MMOT, an online mixture model learning framework based on optimal transport theory that addresses incremental learning with distributional shifts through dynamic centroid updates and improved class similarity estimation. The approach includes a Dynamic Preservation strategy to mitigate catastrophic forgetting and maintain class separability in latent space.
