Tag
Introduces Generative-Evaluative Agreement (GEA), a validity criterion for LLM-enabled adaptive assessments, and measures it on a two-stage adaptive test, finding that the model recovers about half of the intended variance and exhibits systematic bias.