continued-pre-training

Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training

arXiv cs.CL · 2d ago

This paper reports predictable scaling laws for the optimal learning rate and batch size in LLM continued pre-training, and proposes a two-stage framework that reduces hyperparameter search overhead by up to 90% while maintaining performance.

Freeze Deep, Train Shallow: Interpretable Layer Allocation for Continued Pre-Training

arXiv cs.CL · 2026-05-13

This paper proposes LayerTracer, an interpretable framework for layer allocation in continued pre-training, and shows that freezing deep layers while training shallow ones outperforms full-parameter fine-tuning. It offers a low-cost, actionable strategy for resource-constrained teams optimizing large language models.
