Is Position Bias in Dense Retrievers Built In, or Learned from Data?

Hugging Face Daily Papers Papers

Summary

This paper investigates whether positional bias in dense retrievers originates from architecture or training data, finding that training data distribution strongly influences bias and that balanced training can reduce sensitivity by up to 87% while maintaining retrieval performance.

Dense retrievers exhibit positional bias, favoring documents whose query-relevant information appears near the beginning and degrading retrieval performance when the information appears later. While prior work on positional bias in dense retrievers has largely focused on architectural explanations, we study how the positional distribution of evidence in training data affects retrieval-level bias direction. To test this, we construct synthetic position-targeted training sets in which query-relevant evidence appears at the beginning, middle, or end of documents, and fine-tune eight architecturally diverse pretrained models under position-skewed and balanced training distributions. At the ranking level, we observe a strong directional pattern across the examined models: skewed training distributions favor evidence at the corresponding positions. Position-balanced training reduces positional sensitivity by 57-87% on position-aware benchmarks, with competitive mean retrieval performance in our controlled setting. Representation-level analyses further suggest that fine-tuning often reshapes learned positional preferences, although pre-existing architectural or pretraining-specific tendencies persist in some models. These results identify training-position distribution as a major controllable factor in retrieval-level position bias and suggest balanced data curation as a practical mitigation strategy.
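
The data construction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `place_evidence` and `build_training_set`, the sentence-level filler format, and the sampling weights are all assumptions made for illustration. It only shows how evidence might be placed at the beginning, middle, or end of a synthetic document, and how a skewed or balanced position distribution could be sampled.

```python
import random

# Coarse evidence positions named in the abstract.
POSITIONS = ("beginning", "middle", "end")


def place_evidence(evidence, fillers, position):
    """Build a document string with `evidence` at a coarse position.

    `fillers` is a list of distractor sentences (hypothetical format).
    """
    if position == "beginning":
        sentences = [evidence] + list(fillers)
    elif position == "end":
        sentences = list(fillers) + [evidence]
    elif position == "middle":
        mid = len(fillers) // 2
        sentences = list(fillers[:mid]) + [evidence] + list(fillers[mid:])
    else:
        raise ValueError(f"unknown position: {position}")
    return " ".join(sentences)


def build_training_set(examples, weights, seed=0):
    """Sample one evidence position per example according to `weights`.

    `examples` is a list of (query, evidence, fillers) tuples.
    `weights` maps each position name to a sampling weight, e.g. all
    mass on "beginning" for a skewed set, or equal mass for a balanced set.
    """
    rng = random.Random(seed)
    names = list(POSITIONS)
    probs = [weights[name] for name in names]
    dataset = []
    for query, evidence, fillers in examples:
        position = rng.choices(names, weights=probs, k=1)[0]
        dataset.append({
            "query": query,
            "document": place_evidence(evidence, fillers, position),
            "position": position,
        })
    return dataset


# Illustrative weightings (assumed values, not taken from the paper).
SKEWED_TO_BEGINNING = {"beginning": 1.0, "middle": 0.0, "end": 0.0}
BALANCED = {"beginning": 1.0, "middle": 1.0, "end": 1.0}
```

Calling `build_training_set(examples, BALANCED)` would yield a roughly even mix of positions, while `SKEWED_TO_BEGINNING` places every piece of evidence first. The resulting query-document pairs could then be passed to whatever fine-tuning setup is in use.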

Cached at: 05/29/26, 07:00 AM

Paper page - Is Position Bias in Dense Retrievers Built In, or Learned from Data?

Source: https://huggingface.co/papers/2605.26578

Abstract

Training data position distribution significantly influences positional bias in dense retrievers, with balanced training reducing sensitivity by up to 87% while maintaining competitive retrieval performance.

Dense retrievers exhibit positional bias, favoring documents whose query-relevant information appears near the beginning and degrading retrieval performance when the information appears later. While prior work on positional bias in dense retrievers has largely focused on architectural explanations, we study how the positional distribution of evidence in training data affects retrieval-level bias direction. To test this, we construct synthetic position-targeted training sets in which query-relevant evidence appears at the beginning, middle, or end of documents, and fine-tune eight architecturally diverse pretrained models under position-skewed and balanced training distributions. At the ranking level, we observe a strong directional pattern across the examined models: skewed training distributions favor evidence at the corresponding positions. Position-balanced training reduces positional sensitivity by 57-87% on position-aware benchmarks, with competitive mean retrieval performance in our controlled setting. Representation-level analyses further suggest that fine-tuning often reshapes learned positional preferences, although pre-existing architectural or pretraining-specific tendencies persist in some models. These results identify training-position distribution as a major controllable factor in retrieval-level position bias and suggest balanced data curation as a practical mitigation strategy.
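
The abstract reports a relative reduction in positional sensitivity but does not define the metric on this page. The sketch below therefore rests on an assumption: it treats sensitivity as the spread (maximum minus minimum) of a retrieval score across evidence positions, and the reduction as the relative drop in that spread. The function names and all numbers are hypothetical and are not results from the paper.

```python
def positional_sensitivity(scores_by_position):
    """Spread of a retrieval score across evidence positions.

    Assumed definition: max minus min of the per-position scores.
    A value of 0 would mean the retriever is indifferent to position.
    """
    values = list(scores_by_position.values())
    if not values:
        raise ValueError("need at least one per-position score")
    return max(values) - min(values)


def relative_reduction(baseline, mitigated):
    """Fractional drop in sensitivity from `baseline` to `mitigated`."""
    if baseline <= 0:
        raise ValueError("baseline sensitivity must be positive")
    return 1.0 - mitigated / baseline


# Made-up per-position scores for illustration only.
skewed_scores = {"beginning": 0.80, "middle": 0.60, "end": 0.40}
balanced_scores = {"beginning": 0.70, "middle": 0.68, "end": 0.60}

reduction = relative_reduction(
    positional_sensitivity(skewed_scores),
    positional_sensitivity(balanced_scores),
)
print(f"relative reduction: {reduction:.0%}")
```

Under this assumed definition, a reported reduction of 57-87% would mean the score gap between the best and worst evidence positions shrinks to between 43% and 13% of its original size.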

Similar Articles

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

arXiv cs.AI

This research paper investigates position bias in reasoning models, finding that bias scales with the length of the reasoning trajectory rather than being eliminated by 'more thinking.' The study provides causal evidence and a diagnostic toolkit for auditing this length-driven bias in multiple-choice QA evaluations.

Effect of Demographic Bias on Skin Lesion Classification

arXiv cs.AI

This paper investigates the impact of demographic bias (sex and age) on skin lesion classification using ResNet models, finding that sex biases stem from data imbalances while age biases consistently favor younger groups, and evaluating multi-task and adversarial learning mitigation strategies.