strategy-diversity

Mapping the Evaluation Frontier: An Empirical Survey of the Bias-Reliability Tradeoff Across Eleven Evaluator-Agent Conditions

arXiv cs.LG · yesterday

This empirical survey extends prior work on the bias-reliability tradeoff in LLM evaluation by measuring evaluator coupling, strategy diversity, and small-sample reliability across 11 evaluator-agent conditions. It confirms that low evaluator influence leads to high measurement noise, while strong coupling reduces both strategy diversity and noise.
