Tag
GRASP introduces a geometry-aware, interaction-based method for scalable pretraining data attribution that models subset dynamics, more than doubling the task-level rank correlation of existing additive approaches while reducing computational cost.
This paper provides the first systematic analysis of error sources in trajectory-based data attribution methods, identifies optimizer mismatch as the dominant source of error, proposes AdamW-influence to address it, and offers practical guidelines for data selection via a K-step look-ahead framework.