Tag
RegMix-D extends RegMix to dynamic data mixing by using loss trajectories from proxy runs to predict optimal mixtures at multiple training stages, achieving improvements over static methods.
FastMix is a novel framework that automates data mixture discovery for training large models using a single proxy model and bilevel optimization, achieving state-of-the-art performance with significant efficiency gains.
The paper identifies repetition mismatch as a primary cause for data mixture experiments failing to scale, and proposes a repetition-controlled subsampling procedure that allows small-scale experiments to recover near-optimal mixtures using far fewer tokens.
Explains mid-training as a stage between pre-training and post-training, where a base model is continued on curated data to strengthen specific capabilities before instruction tuning.