Tag
This paper argues that standard RLHF's scalarization of human preferences collapses multiple valid interpretations into a single target and so mis-measures alignment in culturally plural societies. Analyzing a Malaysian dataset, the authors find that 79% of prompts have multiple majority-supported responses, which single-winner aggregation discards.
This paper introduces Generative-Evaluative Agreement (GEA), a validity criterion for LLM-enabled adaptive assessments, and measures it on a two-stage adaptive test, finding that the model recovers about half of the intended variance and exhibits systematic bias.