Tag: #strategy-diversity

Mapping the Evaluation Frontier: An Empirical Survey of the Bias-Reliability Tradeoff Across Eleven Evaluator-Agent Conditions

arXiv cs.LG · yesterday

This empirical survey extends prior work on the bias-reliability tradeoff in LLM evaluation. It measures evaluator coupling, strategy diversity, and small-sample reliability across 11 evaluator-agent conditions. The results confirm that weak evaluator influence produces high measurement noise, whereas strong coupling lowers noise but also reduces strategy diversity.
