Tag
This paper proposes a training-free method to automatically generate fine-grained evaluation rubrics for LLM-as-a-judge without human annotation, and further introduces an iterative fine-tuning strategy for a rubric generator that outperforms larger proprietary models.
SCOPE is a self-play framework for open-ended tasks that co-evolves a Challenger policy and a Solver policy, achieving gains of up to +10.4 points on benchmarks without external supervision.