automated-reviewing

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Hugging Face Daily Papers · 2026-05-27

This paper investigates how well LLM-generated reviews align with human judgment, using 1k real ACL 2025 submissions. It finds limited agreement with human reviewers and unstable results across models and prompts. It also demonstrates a method that artificially inflates review scores without meaningful changes to the submission. The authors advise against relying solely on LLM reviews and call for discussion of how they should be used as submission volumes keep growing.
