Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
Summary
The paper introduces BRIGHT-Pro, a new benchmark for reasoning-intensive retrieval, and RTriever-Synth, a synthetic corpus used to fine-tune RTriever-4B for improved performance in agentic search systems.
Source: https://huggingface.co/papers/2605.04018
Abstract
Researchers introduce BRIGHT-Pro, an expanded expert-annotated benchmark for reasoning-intensive retrieval, and RTriever-Synth, an aspect-decomposed synthetic corpus, to improve retriever performance through agentic search evaluation and LoRA fine-tuning.
Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must provide complementary evidence across iterative search and synthesis. However, existing work remains limited on both evaluation and training: benchmarks such as BRIGHT provide narrow gold sets and evaluate retrievers in isolation, while synthetic training corpora often optimize single-passage relevance rather than evidence portfolio construction. We introduce BRIGHT-Pro, an expert-annotated benchmark that expands each query with multi-aspect gold evidence and evaluates retrievers under both static and agentic search protocols. We further construct RTriever-Synth, an aspect-decomposed synthetic corpus that generates complementary positives and positive-conditioned hard negatives, and use it to LoRA fine-tune RTriever-4B from Qwen3-Embedding-4B. Experiments across lexical, general-purpose, and reasoning-intensive retrievers show that aspect-aware and agentic evaluation expose behaviors hidden by standard metrics, while RTriever-4B substantially improves over its base model.
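The abstract does not specify the training objective used with RTriever-Synth's positives and hard negatives, but retriever fine-tuning of this kind typically uses a contrastive (InfoNCE-style) loss that scores each query against one positive and its mined hard negatives. A minimal sketch, with toy hand-written embeddings in place of real model outputs (all names and values here are illustrative assumptions, not the paper's code):

```python
import math

def dot(u, v):
    # Inner-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Contrastive loss for one query: the positive passage competes
    against hard negatives mined conditioned on it. Lower is better."""
    sims = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    # Cross-entropy with the positive at index 0.
    return -math.log(exps[0] / sum(exps))

# Toy 3-d embeddings: one query, a complementary positive covering one
# aspect of the query, and two hypothetical hard negatives.
query = [0.9, 0.1, 0.0]
positive = [0.8, 0.2, 0.1]
negatives = [[0.7, 0.0, 0.5], [0.1, 0.9, 0.2]]

loss = info_nce_loss(query, positive, negatives)
```

In practice each aspect of a decomposed query would contribute its own (query, positive, negatives) triple, and the loss would be backpropagated through LoRA adapters on the embedding model rather than computed on fixed vectors as here.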
Similar Articles
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG is a novel framework that shifts reasoning and retrieval for agentic RAG into continuous latent space, reducing inference latency by approximately 90% while maintaining performance comparable to explicit methods.
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval, addressing limitations in semantic similarity by incorporating LLM knowledge distillation for better temporal and causal reasoning.
@dbreunig: Reasoning models are great at understanding nuance and natural language. This nuance hasn't trickled down to retrieval …
A tweet notes that while reasoning models excel at nuance and natural-language understanding, this capability has not yet carried over to retrieval systems, pointing to a key bottleneck for agentic search.
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
The paper introduces Direct Corpus Interaction (DCI), a novel approach allowing AI agents to query raw text directly using standard terminal tools instead of traditional embedding-based retrieval. By bypassing fixed similarity interfaces and offline indexing, DCI significantly outperforms conventional sparse, dense, and reranking baselines across multiple IR and agentic search benchmarks.
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
The paper introduces CreativityBench, a benchmark for evaluating large language models' ability to creatively repurpose tools based on affordance reasoning. It highlights that current models struggle with creative problem-solving despite strong general reasoning capabilities.