continual-pretraining

#continual-pretraining

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

arXiv cs.CL · 2026-06-02

This paper studies catastrophic forgetting in multilingual expert language models during continual pretraining and proposes parameter alignment strategies (hard layer freezing, soft regularization, post-hoc weight reversion, and model merging) that mitigate forgetting across 32 training languages with minimal cost to language acquisition.

Toward LLMs Beyond English-Centric Development

arXiv cs.CL · 2026-05-18

This paper demonstrates that LLMs are heavily biased toward English and shows that continual pre-training offers no cost advantage over training from scratch when adapting models to other languages, especially for cultural understanding.

Injecting Structured Biomedical Knowledge into Language Models: Continual Pretraining vs. GraphRAG

arXiv cs.CL · 2026-04-21

This paper compares two strategies for injecting structured biomedical knowledge from the UMLS Metathesaurus into language models: continual pretraining (embedding knowledge into model parameters) and GraphRAG (querying a knowledge graph at inference time). Results show improvements on biomedical QA benchmarks, with GraphRAG on LLaMA 3-8B gaining over 3 and 5 accuracy points on PubMedQA and BioASQ, respectively, without any retraining.

Scaling Agents via Continual Pre-training

Papers with Code Trending · 2025-09-16

Proposes Agentic Continual Pre-training to build agentic foundation models, achieving state-of-the-art results on 10 benchmarks with AgentFounder-30B, including 39.9% on BrowseComp-en and 43.3% on BrowseComp-zh.
