Tag
This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
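To make "adjusts sample importance via loss weighting" concrete, here is a minimal sketch of a per-sample weighted loss. It is not ADAPT's actual weighting rule, which the summary does not specify. The function names, the softmax scoring, and the `temperature` parameter are all assumptions chosen for illustration.

```python
import math

def softmax_weights(scores, temperature=1.0):
    """Map per-sample scores to positive weights that sum to 1.

    Hypothetical choice: the summary does not say how weights are derived.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_loss(per_sample_losses, weights):
    """Combine per-sample losses into one scalar using the given weights."""
    return sum(w * l for w, l in zip(weights, per_sample_losses))

# Illustrative values only: three samples with made-up losses and scores.
losses = [2.0, 0.5, 1.0]
scores = [1.0, 0.0, 0.5]  # higher score means the sample is treated as more important
weights = softmax_weights(scores)
batch_loss = weighted_loss(losses, weights)
```

In an online setting the scores would be recomputed as training progresses, so the same sample can receive a different weight at different steps. An offline selection method would instead fix the choice of samples before training starts.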
This paper introduces Neuron-Activated Graph (NAG) Ranking, a training-free framework for selecting pretraining data aligned with target tasks by identifying and ranking candidate data based on similarity in neuron activation patterns. The approach achieves a 4.9% average improvement over random sampling and demonstrates that sparse neuron activation patterns capture functional capabilities for target learning.
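The idea of ranking candidates by similarity in neuron activation patterns can be sketched as follows. This is an illustrative stand-in, not the NAG method itself: it assumes each text is summarized by the set of its `k` most strongly activated neurons and that similarity is measured by Jaccard overlap. The function names and the toy activation vectors are hypothetical.

```python
def top_k_neurons(activations, k):
    """Return the indices of the k largest activations as a set."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    return set(order[:k])

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(target_acts, candidate_acts, k):
    """Order candidate names by overlap with the target's sparse pattern.

    Ties are broken alphabetically by name so the result is deterministic.
    """
    target = top_k_neurons(target_acts, k)
    scored = [(jaccard(target, top_k_neurons(acts, k)), name)
              for name, acts in candidate_acts.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

# Toy activations over four neurons (made-up numbers).
target = [0.9, 0.1, 0.8, 0.0]
candidates = {
    "a": [0.7, 0.0, 0.6, 0.1],  # same top-2 neurons as the target
    "b": [0.0, 0.9, 0.8, 0.1],  # shares one top-2 neuron
    "c": [0.0, 0.9, 0.0, 0.8],  # shares none
}
ranking = rank_candidates(target, candidates, k=2)
```

No model is trained or updated in this sketch: ranking needs only forward-pass activations, which matches the training-free property the summary describes.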