data-mixing

#data-mixing

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

arXiv cs.CL · 2026-05-18

This paper introduces OP-Mix, a data mixing algorithm that uses low-rank adapters trained on the current model to cheaply simulate candidate data mixtures, enabling efficient and unified data mixing across pretraining, continual midtraining, and continual instruction tuning. OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of baselines, improving pretraining perplexity by 6.3% and reducing compute by 66-95% in continual learning settings.
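OP-Mix's adapter-based simulation is not reproduced here. As a hedged sketch of the outer loop any proxy-based mixture search shares, the code below samples candidate mixtures uniformly from the probability simplex, scores each with a cheap proxy, and keeps the best. In OP-Mix the proxy would be the paper's low-rank adapters trained on the current model; here `proxy_loss` is an arbitrary caller-supplied function, and `sample_simplex` and `search_mixture` are hypothetical names, not the paper's API.

```python
import random
from typing import Callable, Optional, Sequence


def sample_simplex(k: int, rng: random.Random) -> list[float]:
    # Normalized i.i.d. exponentials give a uniform draw from the simplex
    # (equivalently, a Dirichlet(1, ..., 1) sample).
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def search_mixture(
    num_domains: int,
    proxy_loss: Callable[[Sequence[float]], float],
    num_candidates: int = 64,
    seed: int = 0,
) -> tuple[Optional[list[float]], float]:
    """Return the sampled mixture with the lowest proxy loss (hypothetical helper)."""
    rng = random.Random(seed)
    best_w: Optional[list[float]] = None
    best_loss = float("inf")
    for _ in range(num_candidates):
        w = sample_simplex(num_domains, rng)
        loss = proxy_loss(w)  # stand-in for a cheap simulated training run
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

For example, with a toy proxy that penalizes distance from a target mixture, `search_mixture(3, lambda w: sum((a - b) ** 2 for a, b in zip(w, [0.5, 0.3, 0.2])))` returns weights near that target. The efficiency claim in the summary hinges on the proxy being far cheaper than a full training run, which is why the search budget (`num_candidates`) can be generous.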

InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition

Hugging Face Daily Papers · 2026-05-04

InfoLaw is a data-aware scaling framework that predicts model loss based on token consumption, model size, data mixture weights, and repetition, enabling efficient data-recipe selection under varying compute budgets.
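The summary does not give InfoLaw's functional form. To illustrate the general shape a data-aware law of this kind can take, the sketch below assumes a Chinchilla-style power law in model size and effective data, a Muennighoff-style exponential discount for repeated epochs, and a hypothetical linear quality weighting per domain. All three are assumptions, not InfoLaw's equations, and the constants are illustrative placeholders, not fitted values.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Domain:
    unique_tokens: float  # distinct tokens available in this domain
    quality: float        # hypothetical quality weight (1.0 = baseline)


def effective_tokens(unique: float, seen: float, r_star: float = 15.0) -> float:
    """Discount repetition: the first epoch counts fully, later epochs decay.

    r_star is an illustrative decay constant, not a value from InfoLaw.
    """
    if seen <= unique:
        return seen
    repeats = seen / unique - 1.0  # epochs beyond the first
    return unique + unique * r_star * (1.0 - math.exp(-repeats / r_star))


def predict_loss(
    n_params: float,
    total_tokens: float,
    weights: Sequence[float],
    domains: Sequence[Domain],
    E: float = 1.7, A: float = 400.0, B: float = 400.0,
    alpha: float = 0.34, beta: float = 0.28,
) -> float:
    """Chinchilla-style loss over quality-weighted, repetition-discounted data."""
    d_eff = sum(
        d.quality * effective_tokens(d.unique_tokens, w * total_tokens)
        for w, d in zip(weights, domains)
    )
    return E + A / n_params**alpha + B / d_eff**beta
```

Recipe selection under a fixed budget then amounts to comparing `predict_loss` across candidate `weights` for the same `n_params` and `total_tokens`: upweighting a small high-quality domain helps until its repetition discount outweighs its quality weight.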

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

arXiv cs.CL · 2026-04-21

This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.
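A common way to write data mixing as bilevel optimization is sketched below, assuming $k$ domains with per-domain training losses $\mathcal{L}_i$, mixture weights $w$ on the probability simplex, and a held-out objective $\mathcal{L}_{\text{val}}$. This is the generic form used across the mixing literature; the survey's own notation may differ.

```latex
\min_{w \in \Delta^{k-1}} \; \mathcal{L}_{\text{val}}\big(\theta^*(w)\big)
\quad \text{s.t.} \quad
\theta^*(w) \in \arg\min_{\theta} \sum_{i=1}^{k} w_i \, \mathcal{L}_i(\theta),
\qquad
\Delta^{k-1} = \Big\{ w \in \mathbb{R}^k : w_i \ge 0,\ \sum_{i=1}^{k} w_i = 1 \Big\}
```

In these terms, static methods fix $w$ before training (by rule or by a learned predictor), while dynamic methods replace $w$ with a schedule $w_t$ updated during training, either adaptively from the model's own signals or from external guidance.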
