dense-retrieval

#dense-retrieval

DREAM: Dense Retrieval Embeddings via Autoregressive Modeling

Hugging Face Daily Papers · 2026-06-23

DREAM trains dense retrieval embeddings by using autoregressive language model attention to supervise query-document similarity, eliminating the need for labeled data. It consistently outperforms baselines on BEIR and RTEB benchmarks across model scales.

Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

arXiv cs.CL · 2026-06-18

This paper identifies document-side early compression as a failure mode in long-document dense retrieval and introduces the Evidence Dilution Index (EDI) to measure it. The authors propose DICE, a training-free method that splits documents into chunks, encodes them independently, and aggregates them into a single vector, significantly improving retrieval on long documents.
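
A minimal sketch of the chunk, encode, and aggregate idea, assuming fixed-size word windows and mean pooling as the aggregation step; the summary does not say how DICE actually aggregates chunk vectors. `toy_encode` is a hypothetical hashed bag-of-words stand-in for a frozen dense encoder, and `chunk_size` is an illustrative parameter, not one taken from the paper.

```python
import hashlib

import numpy as np

DIM = 64  # embedding width of the toy encoder (assumption)


def toy_encode(text: str) -> np.ndarray:
    """Hypothetical stand-in for a frozen dense encoder: hashed bag of words, L2-normalized."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def chunk_words(text: str, chunk_size: int = 8) -> list:
    """Split a document into consecutive, non-overlapping windows of chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def encode_chunked(text: str, chunk_size: int = 8) -> np.ndarray:
    """Encode each chunk independently, mean-pool the chunk vectors, then re-normalize."""
    chunks = chunk_words(text, chunk_size)
    if not chunks:
        return np.zeros(DIM)
    pooled = np.stack([toy_encode(c) for c in chunks]).mean(axis=0)
    return pooled / np.linalg.norm(pooled)
```

A document shorter than one chunk reduces to ordinary whole-document encoding; to try the idea for real, replace `toy_encode` with an actual embedding model.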

MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval

arXiv cs.CL · 2026-06-18

MCompassRAG improves retrieval-augmented generation by enriching chunk representations with topic metadata and applying distillation from an LLM teacher, achieving an 8.24% average improvement in information efficiency at over 5× lower latency than strong baselines.

ECI_{sem}: Semantic Residual Effective Contrastive Information for Evaluating Hard Negatives

Hugging Face Daily Papers · 2026-06-05

ECI_sem is a training-free method for ranking hard negative sources in dense retrieval using frozen embeddings, achieving strong performance on MS MARCO and BEIR benchmarks.

@raphaelsrty: At 140 million parameters, our LateOn model yields strong results. Unrelated to LateOn, I'm really excited by what's happ…

X AI KOLs Following · 2026-05-30

The author reports strong results for the 140M-parameter LateOn model and, separately, expresses excitement about recent advances in multi-vector models, including new CPU indexes and multilingual support.

@_reachsumit: Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies @bclavie et al. extract in…

X AI KOLs Following · 2026-05-29

The paper proposes Latent Terms, a method that uses sparse autoencoders (SAEs) to extract BM25-ready sparse features from frozen dense retrievers, achieving competitive performance without retrieval-specific training.
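
A rough sketch of the pipeline shape, assuming a top-k ReLU sparse autoencoder whose active latents are treated as vocabulary terms and then scored with standard Okapi BM25. The SAE weights below are random placeholders (a real SAE would be trained on the frozen retriever's embeddings), and `latent_terms`, `TOP_K`, and the activation-to-frequency `scale` are hypothetical names chosen for illustration, not the authors' API or recipe.

```python
import math
from collections import Counter

import numpy as np

DIM, N_LATENTS, TOP_K = 32, 256, 8  # toy sizes (assumptions)

# Hypothetical SAE encoder weights. A real SAE would be trained to reconstruct
# the frozen retriever's embeddings; random weights only show the data flow.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(N_LATENTS, DIM)) / math.sqrt(DIM)
b_enc = np.zeros(N_LATENTS)


def latent_terms(embedding: np.ndarray, scale: float = 10.0) -> Counter:
    """Top-k ReLU SAE encoding; each active latent becomes a term with an integer frequency."""
    acts = np.maximum(W_enc @ embedding + b_enc, 0.0)
    top = np.argsort(acts)[-TOP_K:]  # indices of the k largest activations
    return Counter({
        f"L{int(i)}": max(1, int(round(float(acts[i]) * scale)))
        for i in top
        if acts[i] > 0
    })


def bm25_scores(query: Counter, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Okapi BM25 over bags of latent terms (query term frequencies are ignored)."""
    n = len(docs)
    avgdl = sum(sum(d.values()) for d in docs) / n
    df = Counter(term for d in docs for term in d)  # document frequency per latent term
    scores = []
    for d in docs:
        dl = sum(d.values())
        s = 0.0
        for term in query:
            tf = d.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores
```

Because the latent ids behave like ordinary tokens, the resulting bags could be fed to any existing inverted-index BM25 implementation instead of the hand-rolled scorer above.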

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

arXiv cs.AI · 2026-05-29

CoHyDE introduces an iterative co-training procedure for an LLM rewriter and a dense encoder to improve tool retrieval from large API catalogs. It outperforms single-component baselines, especially on vague queries, by training both components together using InfoNCE and DPO.
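
For the encoder side, a generic InfoNCE loss with in-batch negatives can be sketched as below. It assumes cosine similarity and a temperature of 0.05, neither of which comes from the CoHyDE summary, and it leaves out the DPO objective used for the rewriter.

```python
import numpy as np


def info_nce(q: np.ndarray, d: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch InfoNCE: row i of q should match row i of d; all other rows act as negatives."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = q @ d.T / temperature                        # (batch, batch) cosine similarity / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy against the diagonal
```

When every query matches its own document and nothing else, the loss approaches zero; when all similarities are equal, it equals the log of the batch size.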

Xetrieval: Mechanistically Explaining Dense Retrieval

Hugging Face Daily Papers · 2026-05-28

Xetrieval is a mechanistic framework that explains dense retrieval by enhancing sentence embeddings with reasoning information and decomposing them into interpretable sparse features, providing feature-level explanations for retrieval decisions without expensive autoregressive generation.

Benchmarking Google Embeddings 2 against Open-Source Models for Multilingual Dense Retrieval and RAG Systems

arXiv cs.CL · 2026-05-25

This paper benchmarks Google Embeddings 2 (GE2) against five open-source models for multilingual dense retrieval and RAG, finding that GE2 achieves the highest accuracy but is slower, while mE5-L is a competitive low-latency alternative.

@raphaelsrty: We're releasing LateOn and DenseOn today. Two open retrieval models, 149M parameters each. LateOn (ColBERT, multi-vecto…

X AI KOLs Following · 2026-04-21

Raphael released two open-source retrieval models, LateOn (ColBERT-style multi-vector) and DenseOn (single-vector), each with 149M parameters and each outperforming models 4× larger on BEIR.
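
To illustrate how the two model types differ at scoring time, here is a minimal sketch: single-vector retrieval compares one pooled embedding per text, while ColBERT-style late interaction keeps one embedding per token and sums, over query tokens, the maximum similarity to any document token (MaxSim). The arrays are toy stand-ins for model outputs; nothing here calls LateOn or DenseOn.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def single_vector_score(q_vec: np.ndarray, d_vec: np.ndarray) -> float:
    """Single-vector relevance: cosine similarity of two pooled embeddings."""
    return float(normalize(q_vec) @ normalize(d_vec))


def maxsim_score(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    """Late interaction: each query token keeps its best-matching document token."""
    sim = normalize(q_tokens) @ normalize(d_tokens).T  # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())
```

The trade-off is storage: MaxSim needs one vector per document token in the index rather than one vector per document.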

Spectral Tempering for Embedding Compression in Dense Passage Retrieval

arXiv cs.CL · 2026-04-20

Spectral Tempering (SpecTemp) is a learning-free method for embedding compression in dense passage retrieval that adaptively determines the optimal spectral scaling from a signal-to-noise ratio analysis, outperforming fixed-hyperparameter approaches such as PCA and whitening.
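
SpecTemp's adaptive, SNR-based choice of scaling is not reproduced here. As a hedged illustration of the design space it sits in, the sketch projects embeddings onto the top `k` principal directions and scales each by λ^(-α/2), so α = 0 is plain PCA and α = 1 is PCA whitening; a fixed `alpha` corresponds to the fixed-hyperparameter baselines the summary mentions. `alpha`, `k`, and both function names are illustrative assumptions.

```python
import numpy as np


def fit_spectral_projection(X: np.ndarray, k: int, alpha: float):
    """Return (mean, projection) that maps embeddings to k dims, scaling each
    principal direction by eigenvalue ** (-alpha / 2).

    alpha = 0 gives plain PCA; alpha = 1 gives PCA whitening.
    """
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the top-k directions
    scale = np.clip(eigvals[order], 1e-12, None) ** (-alpha / 2)
    return mu, eigvecs[:, order] * scale           # projection matrix, shape (dim, k)


def compress(X: np.ndarray, mu: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project centered embeddings and re-normalize rows for cosine/dot-product search."""
    Z = (X - mu) @ proj
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)
```

With `alpha=1` the projected training embeddings have identity covariance; with `alpha=0` their variances are the top eigenvalues in descending order.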

@lateinteraction: The keynote recording is now on YouTube, for everyone who asked us to host it outside X. https://youtube.com/watch?v=Z2…

X AI KOLs Timeline · 2026-04-13

The keynote argues that late interaction retrieval (e.g., ColBERT-style) is the most promising direction for AI-scale information retrieval research, contending that single-vector dense retrieval is fundamentally flawed and that the IR community must significantly raise its ambitions. The talk presents the LIMIT benchmark as evidence of dense retrieval's generalization failures and calls for a paradigm shift by 2030.
