benchmark-generator


RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator

arXiv cs.CL · 2026-05-22

RankJudge is a synthetic benchmark generator that creates paired multi-turn conversations with injected flaws. It uses these pairs to evaluate how reliably LLM judges can tell the better response from the worse one in complex dialogues.
