data-mixing

#data-mixing

CausalMix: Data Mixture as Causal Inference for Language Model Training

Hugging Face Daily Papers · 4d ago

CausalMix formulates data mixture optimization as a causal inference problem for LLM training, enabling dynamic adaptation to shifting data distributions without costly retraining, and demonstrates improved performance on Qwen2.5-0.5B and Qwen3-4B-Base.

AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining

Hugging Face Daily Papers · 2026-06-14

AC-ODM uses reinforcement learning to dynamically optimize pretraining data composition for LLMs, achieving faster convergence and higher downstream accuracy with negligible computational overhead.

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

arXiv cs.CL · 2026-05-18

This paper introduces OP-Mix, a data mixing algorithm that trains low-rank adapters on the current model to cheaply simulate candidate data mixtures. This gives one efficient method for data mixing across pretraining, continual midtraining, and continual instruction tuning. OP-Mix consistently finds near-optimal mixtures at a fraction of the compute of baselines, improving pretraining perplexity by 6.3% and reducing compute by 66-95% in continual learning settings.

InfoLaw: Information Scaling Laws for Large Language Models with Quality-Weighted Mixture Data and Repetition

Hugging Face Daily Papers · 2026-05-04

InfoLaw is a data-aware scaling framework that predicts model loss based on token consumption, model size, data mixture weights, and repetition, enabling efficient data-recipe selection under varying compute budgets.

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

arXiv cs.CL · 2026-04-21

This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.
