This paper examines which agreement statistics used to validate LLM judges become redundant when the evaluation criteria are binary, and provides a checklist for reporting them properly, including how abstentions are handled.
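The summary does not say which statistics overlap. Here is one illustration of the kind of redundancy meant. It is our own assumption, not necessarily the paper's list. With only two categories, linearly and quadratically weighted Cohen's kappa reduce to unweighted kappa, because the only possible disagreement has distance 1. Likewise, Pearson's r, Spearman's rho and the phi coefficient (equivalently the Matthews correlation coefficient) coincide on 0/1 labels. The minimal sketch below checks both facts on hypothetical judge and human verdicts. The function names and data are illustrative only.

```python
import math

# Illustrative sketch: with binary (0/1) labels, several agreement
# statistics collapse into one another.

def confusion(judge, human):
    """2x2 counts n[i][j]: the judge gave label i, the human gave label j."""
    n = [[0, 0], [0, 0]]
    for i, j in zip(judge, human):
        n[i][j] += 1
    return n

def weighted_kappa(judge, human, weight):
    """Cohen's kappa with disagreement weights weight(i, j), 0 on the diagonal."""
    n = confusion(judge, human)
    total = len(judge)
    rows = [n[i][0] + n[i][1] for i in range(2)]  # judge marginals
    cols = [n[0][j] + n[1][j] for j in range(2)]  # human marginals
    observed = sum(weight(i, j) * n[i][j] for i in range(2) for j in range(2)) / total
    expected = sum(weight(i, j) * rows[i] * cols[j]
                   for i in range(2) for j in range(2)) / total ** 2
    return 1 - observed / expected

def phi(judge, human):
    """Phi coefficient; on a 2x2 table this is the Matthews correlation coefficient."""
    (n00, n01), (n10, n11) = confusion(judge, human)
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n10 + n11) * (n00 + n01) * (n01 + n11) * (n00 + n10))
    return num / den

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(xs):
    """1-based ranks, ties receiving their average rank (as Spearman's rho uses)."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

# Hypothetical binary verdicts (1 = pass, 0 = fail) from an LLM judge and a human.
judge = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
human = [1, 0, 0, 1, 0, 0, 1, 1, 0, 1]

unweighted = weighted_kappa(judge, human, lambda i, j: 0 if i == j else 1)
linear = weighted_kappa(judge, human, lambda i, j: abs(i - j))
quadratic = weighted_kappa(judge, human, lambda i, j: (i - j) ** 2)

print("kappa (unweighted, linear, quadratic):", unweighted, linear, quadratic)
print("phi / MCC:", phi(judge, human))
print("Pearson r:", pearson(judge, human))
print("Spearman rho:", pearson(average_ranks(judge), average_ranks(human)))
```

On this data all three kappas are 0.8, while phi, Pearson's r and Spearman's rho all equal about 0.82. Kappa and the correlation family are therefore not interchangeable even for binary criteria, since they diverge whenever the two raters' marginals differ. The phi and correlation functions also divide by zero if either rater gives every item the same label.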