Is Position Bias in Dense Retrievers Built In or Learned from Data?
Summary
This paper investigates whether positional bias in dense retrievers originates from architecture or from training data. It finds that the positional distribution of evidence in the training data strongly influences the direction of the bias, and that position-balanced training can reduce positional sensitivity by up to 87% while keeping mean retrieval performance competitive.
Cached at: 05/29/26, 07:00 AM
Source: https://huggingface.co/papers/2605.26578
Abstract
Training data position distribution significantly influences positional bias in dense retrievers, with balanced training reducing sensitivity by up to 87% while maintaining competitive retrieval performance.
Dense retrievers exhibit positional bias, favoring documents whose query-relevant information appears near the beginning and degrading retrieval performance when the information appears later. While prior work on positional bias in dense retrievers has largely focused on architectural explanations, we study how the positional distribution of evidence in training data affects retrieval-level bias direction. To test this, we construct synthetic position-targeted training sets in which query-relevant evidence appears at the beginning, middle, or end of documents, and fine-tune eight architecturally diverse pretrained models under position-skewed and balanced training distributions. At the ranking level, we observe a strong directional pattern across the examined models: skewed training distributions favor evidence at the corresponding positions. Position-balanced training reduces positional sensitivity by 57-87% on position-aware benchmarks, with competitive mean retrieval performance in our controlled setting. Representation-level analyses further suggest that fine-tuning often reshapes learned positional preferences, although pre-existing architectural or pretraining-specific tendencies persist in some models. These results identify training-position distribution as a major controllable factor in retrieval-level position bias and suggest balanced data curation as a practical mitigation strategy.
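The abstract describes position-targeted training sets only at a high level. The following is a minimal sketch of how such data could be assembled, not the authors' pipeline. All names (`place_evidence`, `build_training_set`, `positional_sensitivity`), the sentence-level placement, and the max-minus-min sensitivity measure are assumptions made for illustration; the paper may define positions and sensitivity differently.

```python
import random

POSITIONS = ("beginning", "middle", "end")

# Hypothetical sampling distributions over evidence positions.
BALANCED = {"beginning": 1 / 3, "middle": 1 / 3, "end": 1 / 3}
SKEWED_TO_END = {"beginning": 0.0, "middle": 0.0, "end": 1.0}


def place_evidence(evidence, fillers, position):
    """Build one document with the evidence sentence at the given position.

    `fillers` is a list of distractor sentences (assumed irrelevant to the query).
    """
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    if position == "beginning":
        sentences = [evidence] + list(fillers)
    elif position == "end":
        sentences = list(fillers) + [evidence]
    else:
        mid = len(fillers) // 2
        sentences = list(fillers[:mid]) + [evidence] + list(fillers[mid:])
    return " ".join(sentences)


def build_training_set(examples, weights, seed=0):
    """Sample an evidence position per example according to `weights`.

    `examples` is a list of (query, evidence, fillers) triples.
    """
    rng = random.Random(seed)
    probs = [weights[p] for p in POSITIONS]
    rows = []
    for query, evidence, fillers in examples:
        position = rng.choices(POSITIONS, weights=probs, k=1)[0]
        rows.append(
            {
                "query": query,
                "document": place_evidence(evidence, fillers, position),
                "position": position,
            }
        )
    return rows


def positional_sensitivity(score_by_position):
    """Assumed measure: gap between the best and worst per-position score."""
    values = list(score_by_position.values())
    return max(values) - min(values)
```

Under these assumptions, a skewed set would come from `build_training_set(examples, SKEWED_TO_END)` and a balanced one from `build_training_set(examples, BALANCED)`; per-position retrieval scores measured after fine-tuning on each could then be compared with `positional_sensitivity`.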
Similar Articles
More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
This research paper investigates position bias in reasoning models, finding that bias scales with the length of the reasoning trajectory rather than being eliminated by 'more thinking.' The study provides causal evidence and a diagnostic toolkit for auditing this length-driven bias in multiple-choice QA evaluations.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
Systematic study shows LLM-based dense retrievers outperform BERT baselines on typos and poisoning but remain vulnerable to semantic perturbations, with embedding geometry predicting robustness.
@_reachsumit: OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries @dianetc_ et al pres…
OBLIQ-Bench is a new benchmark that exposes weaknesses in current retrieval systems when handling oblique queries requiring latent or implicit reasoning, showing that even sophisticated retrieval pipelines fail to surface relevant documents that reasoning LLMs can easily verify.
Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels
This paper studies how post-training quantization introduces new biases in instruction-tuned LLMs, finding that 3-bit precision causes 6–21% of previously unbiased items to develop stereotypes, while standard metrics like perplexity fail to detect this degradation.
Effect of Demographic Bias on Skin Lesion Classification
This paper investigates the impact of demographic bias (sex and age) on skin lesion classification using ResNet models, finding that sex biases stem from data imbalances while age biases consistently favor younger groups, and evaluating multi-task and adversarial learning mitigation strategies.