Tag
Reasoning Arena improves reinforcement learning with verifiable rewards by using trace tournaments and Bradley-Terry models to generate meaningful gradients from non-diverse reward groups, resulting in faster training and better reasoning performance.
This paper introduces ECC, an algorithm that calibrates semantic embeddings using a limited number of model comparisons to cluster queries by their latent capability requirements, improving LLM capability ranking quality by more than 17 percentage points relative to baselines.
HumorRank introduces a tournament-based leaderboard using pairwise evaluations and Bradley-Terry MLE to rank LLMs on humor generation, showing humor quality depends on comedic mastery rather than scale.
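Two of the entries above rely on Bradley-Terry models fitted from pairwise outcomes. As a rough illustration of what Bradley-Terry maximum-likelihood estimation involves, the sketch below fits strengths with the standard minorization-maximization (Zermelo) update. It is a generic example and not the implementation used by any of the listed papers; the function name `fit_bradley_terry`, the `wins` matrix, and its counts are hypothetical and chosen only for illustration.

```python
def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is the number of times item i beat item j (hypothetical data).
    Uses the MM update p_i <- W_i / sum_j (n_ij / (p_i + p_j)), where W_i is
    the total number of wins of item i and n_ij the number of i-vs-j games.
    Assumes every item has at least one win and one loss, so the MLE exists.
    """
    n = len(wins)
    p = [1.0 / n] * n  # start from uniform strengths
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Keep the old value if item i was never compared with anything.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]  # normalize so strengths sum to 1
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Hypothetical tournament: three models, ten comparisons per pair.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
print(ranking)
```

Under this model the probability that item i beats item j is `strengths[i] / (strengths[i] + strengths[j])`, so the fitted values can serve both as a leaderboard ordering and as a source of pairwise win probabilities.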