data-mixture

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

arXiv cs.LG · 4d ago

The paper identifies repetition mismatch as a primary reason that data mixture experiments fail to scale, and proposes a repetition-controlled subsampling procedure that lets small-scale experiments recover near-optimal mixtures using far fewer tokens.

@NielsRogge: What is mid-training? The stage between pre-training and post-training. A base model is continued on a smaller, curated …

X AI KOLs Timeline · 2026-06-02

The post explains mid-training as the stage between pre-training and post-training, in which a base model continues training on curated data to strengthen specific capabilities before instruction tuning.
