Tag
Introduces the Holistic Data Scheduler (HDS), a reinforcement learning-based framework that dynamically adjusts data mixtures during LLM pre-training using a multi-objective reward function, reaching the target perplexity in 44% fewer iterations and improving MMLU by 7.2%.