Tag
This paper investigates whether positional bias in dense retrievers originates from the architecture or from the training data. It finds that the training data distribution strongly influences the bias and that balanced training can reduce sensitivity to position by up to 87% while maintaining retrieval performance.
This paper identifies a blind spot in long-context LLM reasoning benchmarks: they fail to control task position within the context, allowing positional failures to go undetected. The authors propose Context Rot Evaluation (CRE) to systematically vary task position, filler content, and context length, revealing severe accuracy drops for some models when reasoning tasks are placed in the middle of long contexts.
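To make the idea of controlling task position concrete, here is a minimal sketch of how such an evaluation grid might be assembled. It is not the authors' CRE implementation: the function names `build_context` and `evaluation_grid`, the use of a filler-sentence count as a proxy for context length, and the relative `position` parameter in [0, 1] are all assumptions made for illustration.

```python
from itertools import product


def build_context(task: str, filler: list[str], n_filler: int, position: float) -> str:
    """Place `task` at a relative position inside `n_filler` filler sentences.

    position=0.0 puts the task first, 1.0 puts it last, 0.5 near the middle.
    (Hypothetical helper; context length is approximated by sentence count.)
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    if not filler:
        raise ValueError("filler must contain at least one sentence")
    # Cycle through the filler sentences until the requested length is reached.
    padding = [filler[i % len(filler)] for i in range(n_filler)]
    insert_at = round(position * n_filler)
    return "\n".join(padding[:insert_at] + [task] + padding[insert_at:])


def evaluation_grid(task, filler_sets, lengths, positions):
    """Yield one prompt per (filler type, context length, task position) cell."""
    for (name, filler), n, pos in product(filler_sets.items(), lengths, positions):
        yield {
            "filler": name,
            "n_filler": n,
            "position": pos,
            "prompt": build_context(task, filler, n, pos),
        }


# Illustrative inputs only; a real study would use far longer contexts.
grid = list(
    evaluation_grid(
        task="Q: If x + 2 = 5, what is x?",
        filler_sets={"repeat": ["The sky is blue."], "mixed": ["Alpha.", "Beta."]},
        lengths=[4, 8],
        positions=[0.0, 0.5, 1.0],
    )
)
```

Scoring a model on every cell of such a grid, rather than on a single fixed placement, is what would let a mid-context accuracy drop show up separately from a drop caused by length or filler type.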
This paper reframes pairwise ranking prompting as active learning from noisy comparisons. It introduces a noise-robust framework with a randomized-direction oracle that improves ranking quality under a limited budget of comparison calls and addresses position bias.
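One plausible reading of a randomized-direction oracle is a wrapper that picks the presentation order of each pair at random and maps the verdict back to the original pair. The sketch below follows that reading and is not taken from the paper: `randomized_direction_oracle` and `first_slot_biased` are hypothetical names, and the comparator is a stand-in for an LLM judge.

```python
import random
from typing import Callable


def randomized_direction_oracle(
    compare: Callable[[str, str], bool],
    rng: random.Random,
) -> Callable[[str, str], bool]:
    """Wrap `compare(first, second)` so the presentation order is random per call.

    The wrapped oracle returns True when `a` is judged better than `b`.
    """
    def oracle(a: str, b: str) -> bool:
        if rng.random() < 0.5:
            return compare(a, b)      # a is shown in the first slot
        return not compare(b, a)      # b is shown first, so invert the verdict
    return oracle


def first_slot_biased(first: str, second: str) -> bool:
    # Hypothetical judge with extreme position bias: always prefers slot one.
    return True


rng = random.Random(0)
oracle = randomized_direction_oracle(first_slot_biased, rng)
trials = 10_000
fraction_a = sum(oracle("doc_a", "doc_b") for _ in range(trials)) / trials
```

Under this wrapper a purely position-driven judge yields a win rate near one half for either item, so its bias becomes symmetric noise that a noise-robust ranking procedure can average out, while a judge that is consistent across orderings returns the same verdict in both directions.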
This paper investigates position bias in reasoning models, finding that the bias scales with the length of the reasoning trajectory rather than being eliminated by 'more thinking.' The study provides causal evidence and a diagnostic toolkit for auditing this length-driven bias in multiple-choice QA evaluations.