Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
Summary
The paper introduces BRIGHT-Pro, a new benchmark for reasoning-intensive retrieval, and RTriever-Synth, a synthetic corpus used to fine-tune RTriever-4B for improved performance in agentic search systems.
Source: https://huggingface.co/papers/2605.04018
Abstract
Researchers introduce BRIGHT-Pro, an expanded expert-annotated benchmark for reasoning-intensive retrieval, and RTriever-Synth, an aspect-decomposed synthetic corpus, to improve retriever performance through agentic search evaluation and LoRA fine-tuning.
Reasoning-intensive retrieval aims to surface evidence that supports downstream reasoning rather than merely matching topical similarity. This capability is increasingly important for agentic search systems, where retrievers must provide complementary evidence across iterative search and synthesis. However, existing work remains limited on both evaluation and training: benchmarks such as BRIGHT provide narrow gold sets and evaluate retrievers in isolation, while synthetic training corpora often optimize single-passage relevance rather than evidence portfolio construction. We introduce BRIGHT-Pro, an expert-annotated benchmark that expands each query with multi-aspect gold evidence and evaluates retrievers under both static and agentic search protocols. We further construct RTriever-Synth, an aspect-decomposed synthetic corpus that generates complementary positives and positive-conditioned hard negatives, and use it to LoRA fine-tune RTriever-4B from Qwen3-Embedding-4B. Experiments across lexical, general-purpose, and reasoning-intensive retrievers show that aspect-aware and agentic evaluation expose behaviors hidden by standard metrics, while RTriever-4B substantially improves over its base model.
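The abstract does not specify the training objective used with RTriever-Synth's positives and hard negatives, but retriever fine-tuning of this kind typically uses a contrastive (InfoNCE-style) loss that scores each query against one positive and its mined hard negatives. A minimal sketch, with toy hand-written embeddings in place of real model outputs (all names and values here are illustrative assumptions, not the paper's code):

```python
import math

def dot(u, v):
    # Inner-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Contrastive loss for one query: the positive passage competes
    against hard negatives mined conditioned on it. Lower is better."""
    sims = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    # Cross-entropy with the positive at index 0.
    return -math.log(exps[0] / sum(exps))

# Toy 3-d embeddings: one query, a complementary positive covering one
# aspect of the query, and two hypothetical hard negatives.
query = [0.9, 0.1, 0.0]
positive = [0.8, 0.2, 0.1]
negatives = [[0.7, 0.0, 0.5], [0.1, 0.9, 0.2]]

loss = info_nce_loss(query, positive, negatives)
```

In practice each aspect of a decomposed query would contribute its own (query, positive, negatives) triple, and the loss would be backpropagated through LoRA adapters on the embedding model rather than computed on fixed vectors as here.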
Similar Articles
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG is a novel framework that shifts reasoning and retrieval for agentic RAG into continuous latent space, reducing inference latency by approximately 90% while maintaining performance comparable to explicit methods.
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval, addressing limitations in semantic similarity by incorporating LLM knowledge distillation for better temporal and causal reasoning.
@dbreunig: Reasoning models are great at understanding nuance and natural language. This nuance hasn't trickled down to retrieval …
A tweet notes that while reasoning models excel at nuance and natural-language understanding, this capability has not yet carried over to retrieval systems, pointing to a key bottleneck for agentic search.
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
The paper introduces Direct Corpus Interaction (DCI), a novel approach allowing AI agents to query raw text directly using standard terminal tools instead of traditional embedding-based retrieval. By bypassing fixed similarity interfaces and offline indexing, DCI significantly outperforms conventional sparse, dense, and reranking baselines across multiple IR and agentic search benchmarks.
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing
The paper introduces CreativityBench, a benchmark for evaluating large language models' ability to creatively repurpose tools based on affordance reasoning. It highlights that current models struggle with creative problem-solving despite strong general reasoning capabilities.