Tag
This paper introduces Metric Match, a method for selecting which samples to send for human annotation so that LLM judge reliability can be estimated more efficiently. It reduces annotation costs by 32.5% and achieves a win rate of 0.838 against random selection.
GRASP introduces a geometry-aware, interaction-based method for scalable pretraining data attribution that models subset dynamics, achieving more than double the task-level rank correlation of existing additive approaches while reducing computation costs.