study-design

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

arXiv cs.LG · 2026-05-19

This paper proposes a two-stage sampling design in which LLM evaluations augment, rather than replace, human ratings. It gives guidance on choosing the number of human and LLM reviews, using a doubly robust estimator from the missing-data literature.
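
To make the idea concrete, here is a minimal sketch of one common doubly robust form, the augmented inverse-probability-weighted (AIPW) mean. The summary does not give the paper's exact estimator, so this is an illustration under stated assumptions, not the authors' method: every item has an LLM score, a random subset was sent to human review with a known probability, and the LLM score itself serves as the prediction of the human rating. The function and parameter names are hypothetical.

```python
def doubly_robust_mean(llm_scores, human_scores, sample_prob):
    """Estimate the mean human rating over all items (AIPW-style sketch).

    llm_scores   -- LLM score for every item (stage 1, cheap, complete)
    human_scores -- human rating per item, or None if not reviewed (stage 2)
    sample_prob  -- assumed known probability that an item got a human review
    """
    if len(llm_scores) != len(human_scores):
        raise ValueError("llm_scores and human_scores must have equal length")
    if not 0 < sample_prob <= 1:
        raise ValueError("sample_prob must be in (0, 1]")

    n = len(llm_scores)

    # Model term: average LLM score, used here as a stand-in prediction
    # of the human rating (an assumption of this sketch).
    model_term = sum(llm_scores) / n

    # Correction term: inverse-probability-weighted residuals (human - LLM)
    # on the human-reviewed items only; this offsets systematic LLM bias.
    correction = sum(
        (human - llm) / sample_prob
        for llm, human in zip(llm_scores, human_scores)
        if human is not None
    ) / n

    return model_term + correction


# Toy data: the LLM scores one point lower than humans on reviewed items.
llm = [1.0, 2.0, 3.0, 4.0]
human = [2.0, None, 4.0, None]  # half the items were human-reviewed
estimate = doubly_robust_mean(llm, human, sample_prob=0.5)
print(estimate)  # 3.5
```

In this toy case the plain LLM average is 2.5, and the human-reviewed residuals shift the estimate to 3.5. How many human reviews are needed for a given precision depends on the variance of those residuals, which is the kind of question the paper's sample-size guidance addresses.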
