Tag
This position paper advocates developing 'data probes' (synthetic sequences generated from random processes) to study systematically how data characteristics affect LLM performance, with the goal of moving beyond empirical heuristics.