Tag
Explains mid-training as a stage between pre-training and post-training, where a base model is continued on curated data to strengthen specific capabilities before instruction tuning.
MIRA is a data selection framework for the mid-training stage of LLM development. It adaptively constructs quality rubrics for each data source: a teacher model proposes the rubric dimensions, which are then distilled into lightweight scorers. It outperforms full-corpus training while using only half the tokens.
This paper investigates how mid-training on diverse self-generated data improves the effectiveness of reinforcement learning (RL) in large language models, particularly on reasoning tasks.
This paper proposes mid-training language models on diverse self-generated reasoning traces before reinforcement learning. Exposing models to multiple valid solution approaches improves subsequent RL performance on math benchmarks.