semi-supervised

Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference

arXiv cs.LG · 4d ago

This paper introduces PRECISE, an extension of Prediction-Powered Inference that combines a small set of human labels with a large set of LLM judgments to produce unbiased and variance-reduced estimates of ranking evaluation metrics like Precision@K. The method is validated on the ESCI benchmark and in a production A/B test, where it correctly identified the best system variant using only 100 human labels, confirmed by a +407 bps sales improvement.

TADDLE: A Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews

arXiv cs.AI · 2026-05-27

Introduces TADDLE, a tool-augmented agent for detecting deficient LLM-generated peer reviews, along with an expert-annotated benchmark of 1,800 reviews on 50 ICLR 2025 papers. The system decomposes detection into four specialized analysis tools and uses two-stage semi-supervised learning for binary and multi-label classification.
