Active Learners as Efficient PRP Rerankers

Hugging Face Daily Papers 05/15/26, 12:00 AM Papers

pairwise-ranking active-learning reranking llm noisy-comparisons position-bias

Summary

This paper reframes pairwise ranking prompting as active learning from noisy comparisons, introducing a noise-robust framework with a randomized-direction oracle to improve ranking quality under call constraints and address position bias.

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
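The randomized-direction idea can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: the simulated comparator `llm_picks_first`, the additive position-bias model, the preference function `true_pref`, and all numeric values are assumptions made for illustration only.

```python
import random

def true_pref(a, b):
    """Assumed unbiased probability that item a beats item b (lower id = more relevant)."""
    return 0.6 if a < b else 0.4

def llm_picks_first(first, second, bias, rng):
    """Hypothetical stand-in for one LLM call; True means the first-shown item wins.

    Position bias is modeled (as an assumption) by adding `bias` to the
    probability of choosing whichever item is presented first.
    """
    p = min(1.0, max(0.0, true_pref(first, second) + bias))
    return rng.random() < p

def fixed_direction(i, j, bias, rng):
    """Always presents i first, so the bias systematically favors i."""
    return llm_picks_first(i, j, bias, rng)

def randomized_direction(i, j, bias, rng):
    """Single call per pair: a fair coin picks the order, and the answer is mapped back to 'i beats j'."""
    if rng.random() < 0.5:
        return llm_picks_first(i, j, bias, rng)
    return not llm_picks_first(j, i, bias, rng)

def win_rate(oracle, i, j, bias, n_calls, seed):
    """Fraction of n_calls simulated comparisons in which the oracle reports that i beats j."""
    rng = random.Random(seed)
    return sum(oracle(i, j, bias, rng) for _ in range(n_calls)) / n_calls

if __name__ == "__main__":
    print("fixed:", win_rate(fixed_direction, 0, 1, 0.2, 20000, 0))
    print("randomized:", win_rate(randomized_direction, 0, 1, 0.2, 20000, 0))
```

Under this assumed model, with true preference p and bias b, the fixed-direction oracle reports "i beats j" with probability p + b, while the randomized one reports it with probability 0.5(p + b) + 0.5(p - b) = p. Each individual answer is still noisy, but the bias cancels in expectation, which is the sense in which systematic position bias becomes zero-mean noise.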

Original Article

Cached at: 05/20/26, 10:37 AM

Source: https://huggingface.co/papers/2605.14236

Abstract

Pairwise ranking prompting is reformulated as active learning from noisy comparisons, with improved rankers that enhance ranking quality under call constraints and address position bias through a randomized oracle.

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
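For readers unfamiliar with the metric named in the abstract, NDCG@10 can be computed as in the sketch below. The abstract does not say which gain function the paper uses, so the exponential gain (2^rel - 1) is an assumption here; a linear-gain variant is also common. The function names are hypothetical, and the ideal ranking is taken over the same candidate list, which assumes every relevant document is already among the candidates being reranked.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (descending-relevance) ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    # Relevance labels of a reranked list, best-ranked document first (made-up values).
    print(ndcg_at_k([1, 3, 0, 2, 0], k=10))
```

"NDCG@10 per call" then amounts to tracking this score as a function of the number of LLM comparisons spent, so that rankers can be compared at equal call budgets.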

Similar Articles

Active Learners as Efficient PRP Rerankers

Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking

Representation Curriculum: Stagewise Training for Robust Ranking and Allocation
