ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives

Hugging Face Daily Papers 06/05/26, 12:00 AM Papers

hard-negative semantic-residual dense-retrieval effective-contrastive-information training-free ms-marco beir

Summary

ECI_sem is a training-free method for ranking hard negative sources in dense retrieval using frozen embeddings, achieving strong performance on MS MARCO and BEIR benchmarks.

Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream evaluation. We propose ECI_{sem}, a semantic residual variant of Effective Contrastive Information (ECI) that ranks candidate negative sources using frozen target-encoder embeddings. ECI_{sem} is training-free, not label-free: each scored example requires a query, a labeled positive, and an explicit candidate negative. ECI_{sem} builds a weighted residual information matrix from target consistency, semantic locality, lexical residuality, and a log-determinant diversity objective. On MS MARCO negative sources, in-family ECI_{sem} ranks LLM negatives highest among non-hybrid sources and Dense+LLM highest among hybrid sources, matching the strongest aggregate BEIR transfer results across DistilBERT, E5-base, and Contriever. Controlled ablations show that this alignment depends on using the target encoder family, while additional ablations show stability under sample-size, temperature, tokenizer, and IDF-corpus perturbations. The theory gives a local linearized link to loss reduction, while the empirical study treats downstream evaluation as the final test.
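The abstract names a log-determinant diversity objective over a weighted residual information matrix but gives no formula. The sketch below is a minimal illustration of that general idea, not the paper's definition. The function name `logdet_score`, the ridge term, the uniform weights, and the toy residual vectors are all assumptions made for this example. It shows only why a log-determinant favours a negative source whose residuals span many directions over one whose residuals collapse onto a single direction.

```python
import numpy as np


def logdet_score(residuals, weights, ridge=1e-3):
    """Log-determinant of a ridge-regularized weighted second-moment matrix.

    Hypothetical helper for illustration only.
    residuals: (n, d) array, one vector per scored (query, positive, negative) triple.
    weights:   (n,) array of non-negative per-example weights.
    """
    r = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, d = r.shape
    # M = ridge * I + (1/n) * sum_i w_i * r_i r_i^T
    m = ridge * np.eye(d) + (r * w[:, None]).T @ r / n
    # M is positive definite because of the ridge term, so the sign is +1.
    _sign, logdet = np.linalg.slogdet(m)
    return logdet


# Toy inputs (made up): three residuals spanning three directions versus
# three residuals collapsed onto a single direction.
diverse = np.eye(3)
collapsed = np.tile([1.0, 0.0, 0.0], (3, 1))
uniform = np.ones(3)

scores = {
    "diverse": logdet_score(diverse, uniform),
    "collapsed": logdet_score(collapsed, uniform),
}
# Rank the hypothetical sources from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setting the diverse set ranks first, because the collapsed set leaves two eigenvalues at the ridge floor. In the method the abstract describes, the per-example weights would presumably come from target consistency, semantic locality, and lexical residuality computed with the frozen target encoder. Those terms are not specified in the abstract, so they are left as plain inputs here.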

Original Article

Cached at: 06/08/26, 11:18 PM

Source: https://huggingface.co/papers/2603.20990 Published on Jun 5

Submitted by Aarush (https://huggingface.co/chungimungi) on Jun 8

Similar Articles

HNC: Leveraging Hard Negative Captions towards Models with Fine-Grained Visual-Linguistic Comprehension Capabilities

Sem-Detect: Semantic Level Detection of AI Generated Peer-Reviews

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE

Evidence Absence Is Not Evidence Insufficiency: Diagnosing NEI Construction Artifacts in Fact Verification
