ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives
Summary
ECI_sem is a training-free method for ranking hard-negative sources in dense retrieval using frozen embeddings; its rankings of MS MARCO negative sources match the strongest aggregate BEIR transfer results.
Cached at: 06/08/26, 11:18 PM
Source: https://huggingface.co/papers/2603.20990 Published on Jun 5
Submitted by Aarush (https://huggingface.co/chungimungi) on Jun 8
Abstract
ECI_sem, a semantic residual variant of Effective Contrastive Information (ECI), ranks hard-negative sources for dense retrieval from frozen embeddings without requiring training, and its rankings of MS MARCO negative sources agree with aggregate BEIR transfer results.
Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream evaluation. We propose ECI_{sem}, a semantic residual variant of Effective Contrastive Information (ECI) that ranks candidate negative sources using frozen target-encoder embeddings. ECI_{sem} is training-free, not label-free: each scored example requires a query, a labeled positive, and an explicit candidate negative. ECI_{sem} builds a weighted residual information matrix from target consistency, semantic locality, lexical residuality, and a log-determinant diversity objective. On MS MARCO negative sources, in-family ECI_{sem} ranks LLM negatives highest among non-hybrid sources and Dense+LLM highest among hybrid sources, matching the strongest aggregate BEIR transfer results across DistilBERT, E5-base, and Contriever. Controlled ablations show that this alignment depends on using the target encoder family, while additional ablations show stability under sample-size, temperature, tokenizer, and IDF-corpus perturbations. The theory gives a local linearized link to loss reduction, while the empirical study treats downstream evaluation as the final test.
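The abstract lists the ingredients of the score but not its formulas. As a rough illustration only, the sketch below ranks negative sources from (query, positive, negative) embedding triples, mirroring the "training-free, not label-free" input requirement, and uses a generic log-determinant diversity term, log det(I + G) over the Gram matrix G of normalized negative embeddings. Everything else is an assumption: the names `logdet_diversity`, `source_score`, and `rank_sources`, the ridge value, the "locality" term (mean query-negative cosine), the crude consistency filter, and the weight `weight_div` are hypothetical. This is not the paper's weighted residual information matrix, and the lexical residuality term is omitted entirely.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (query, labeled positive, candidate negative) embeddings from a frozen encoder.
Triple = Tuple[Vector, Vector, Vector]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a non-zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def logdet_diversity(embeddings: Sequence[Vector], ridge: float = 1.0) -> float:
    """log det(ridge * I + G), where G is the Gram matrix of the L2-normalized
    embeddings. Grows as the vectors point in more distinct directions."""
    vecs = [normalize(v) for v in embeddings]
    n = len(vecs)
    # Regularized Gram matrix; symmetric positive definite whenever ridge > 0.
    k = [[sum(a * b for a, b in zip(vecs[i], vecs[j])) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Cholesky factorization K = L L^T, so log det K = 2 * sum(log L[i][i]).
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def source_score(triples: Sequence[Triple], weight_div: float = 0.5,
                 ridge: float = 1.0) -> float:
    """Hypothetical source score: locality plus weighted per-item diversity."""
    # Crude consistency filter (assumption): drop triples where the negative
    # already scores at least as high as the labeled positive for the query.
    kept = [(q, p, n) for q, p, n in triples if cosine(q, p) > cosine(q, n)]
    if not kept:
        return float("-inf")
    # Locality (assumption): hard negatives sit close to the query.
    locality = sum(cosine(q, n) for q, _, n in kept) / len(kept)
    # Log-det diversity of the kept negatives, divided by their count so that
    # sources of different sizes are comparable (also an assumption).
    diversity = logdet_diversity([n for _, _, n in kept], ridge) / len(kept)
    return locality + weight_div * diversity


def rank_sources(sources: Dict[str, Sequence[Triple]], **kwargs: float) -> List[str]:
    """Return source names sorted from highest to lowest score."""
    scores = {name: source_score(t, **kwargs) for name, t in sources.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    q, p = [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]
    toy = {
        "random": [(q, p, [0.0, 1.0, 0.0]), (q, p, [0.0, 0.0, 1.0])],
        "hard": [(q, p, [0.7, 0.7, 0.0]), (q, p, [0.7, 0.0, 0.7])],
        "false_neg": [(q, p, [1.0, 0.0, 0.0])],
    }
    print(rank_sources(toy))  # ['hard', 'random', 'false_neg']
```

In this toy run, the "hard" source wins because its negatives are close to the query yet still below the positive; the "false_neg" source is filtered out because its only negative outscores the labeled positive. The ridge term keeps the matrix positive definite even when two negatives coincide, so the log-determinant stays finite.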
Similar Articles
HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities
The paper introduces Hard Negative Captions (HNC), a dataset and method for training vision-language models to achieve fine-grained comprehension by addressing weak associations in web-collected image-text pairs.
Sem-Detect: Semantic Level Detection of AI Generated Peer-Reviews
Sem-Detect introduces a method to distinguish AI-generated peer reviews from human-written ones by combining textual features with claim-level semantic analysis. It achieves a 25.5% improvement in true positive rate at 0.1% false positive rate over baselines, and shows that LLM-refined human reviews retain distinct semantic signals, with fewer than 3.5% misclassified as AI-generated.
Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding
Proposes Slipform, a training framework that uses lexical concreteness to select harder negatives and a margin-based Cement loss, boosting compositional reasoning in vision-language models.
When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE
The paper identifies a misalignment between the softmax-based InfoNCE loss and the normalized embedding setting in modern contrastive learning. It proposes WEINCE, a simple modification that blends softmax logits with an endpoint shortfall correction using extreme value theory, yielding consistent improvements across vision benchmarks.
Evidence Absence Is Not Evidence Insufficiency: Diagnosing NEI Construction Artifacts in Fact Verification
The paper introduces NEI-CAP, a diagnostic protocol to evaluate how 'Not Enough Information' examples are constructed in fact verification benchmarks, revealing that models trained on shortcut-prone NEI constructions fail to transfer to harder, semantically related insufficient evidence cases.