Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback
Summary
Critic-R introduces a framework in which a critic model passes introspective feedback from the reasoning agent to the retriever, improving agentic search performance at both inference and training time without retraining the agent.
Cached at: 06/08/26, 07:14 AM
Source: https://huggingface.co/papers/2606.00590
What’s the real bottleneck in your search agent? Often it’s the retriever, and you don’t need to retrain your agent to fix it.
🗞️ Most existing agentic search approaches (like Search-R1) primarily optimize the reasoning agent while treating the retrieval model as a frozen black-box component. This design implicitly assumes that a sufficiently capable reasoning model can compensate for retrieval failures through improved query reformulation alone. We challenge this assumption by arguing that sub-optimal retrieval can be a bottleneck in agentic search performance. There have been some attempts (Agentic-R, CoSearch) to address this issue by jointly optimizing retrievers and reasoning agents. In practice, however, these methods are difficult to apply when the reasoning model cannot be further trained, the retriever is externally provided, or gold-passage supervision is unavailable.
♦️ To address this, we propose Critic-R, a framework that closes the feedback loop between the reasoning agent and the retriever, at both inference and training time. Instead of blindly accepting whatever the retriever returns, Critic-R uses a separate critic model that reads the agent’s introspective reasoning trace after it consumes the retrieved documents, and decides whether that evidence is actually sufficient to support the next reasoning step.
This verification signal powers two complementary mechanisms:
🔹 Critic-R-Zero (inference-time): when the critic finds the evidence insufficient, it rewrites the retrieval query and instruction based on the reasoning agent’s own introspective feedback and tries again, until the agent is satisfied or a refinement budget runs out. There are no gradient updates anywhere, the agent is untouched, and it works on top of any retriever, including those from Agentic-R or CoSearch.
🔹 Critic-Embed (training-time): to amortize the cost of refinement, we turn Critic-R-Zero’s own trajectories into supervision. Documents that satisfy the agent become positives; documents rejected during failed refinement become hard intra-trajectory negatives. The retriever is fine-tuned with this signal, with no gold-passage annotations required.
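The two mechanisms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Verdict`, `Attempt`, `critic_r_zero_step`, and `to_training_examples` are hypothetical, the agent, critic, and retriever are plain callables, and the toy stubs at the bottom stand in for real models. Pairing the positives with the accepted (rewritten) query is also an assumption; the post does not say which query the training examples use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    sufficient: bool   # critic's judgment on the retrieved evidence
    query: str         # rewritten query (used only when insufficient)
    instruction: str   # rewritten retriever instruction


@dataclass
class Attempt:
    query: str
    instruction: str
    docs: List[str]
    accepted: bool


Retriever = Callable[[str, str], List[str]]   # (query, instruction) -> docs
Agent = Callable[[str, List[str]], str]       # (query, docs) -> introspective trace
Critic = Callable[[str, str, str], Verdict]   # (query, instruction, trace) -> verdict


def critic_r_zero_step(query: str, instruction: str, retrieve: Retriever,
                       introspect: Agent, critique: Critic,
                       budget: int = 3) -> List[Attempt]:
    """Retrieve, let the critic read the agent's trace, and retry on failure."""
    attempts: List[Attempt] = []
    for _ in range(budget + 1):  # one initial retrieval plus up to `budget` refinements
        docs = retrieve(query, instruction)
        trace = introspect(query, docs)
        verdict = critique(query, instruction, trace)
        attempts.append(Attempt(query, instruction, docs, verdict.sufficient))
        if verdict.sufficient:
            break
        query, instruction = verdict.query, verdict.instruction
    return attempts


def to_training_examples(attempts: List[Attempt]) -> List[Tuple[str, str, List[str]]]:
    """Turn one trajectory into (query, positive, hard_negatives) triples."""
    if not attempts or not attempts[-1].accepted:
        return []  # budget ran out without accepted evidence: no positives
    final = attempts[-1]
    negatives = [d for a in attempts[:-1] for d in a.docs if d not in final.docs]
    # Assumption: positives are paired with the accepted query.
    return [(final.query, pos, negatives) for pos in final.docs]


# Toy stand-ins for the retriever, agent, and critic (illustration only).
def toy_retrieve(query: str, instruction: str) -> List[str]:
    if "Paris" in query:
        return ["The Seine flows through Paris."]
    return ["France is a country in Europe."]


def toy_introspect(query: str, docs: List[str]) -> str:
    return "enough" if any("Seine" in d for d in docs) else "missing: the river"


def toy_critique(query: str, instruction: str, trace: str) -> Verdict:
    if trace == "enough":
        return Verdict(True, query, instruction)
    return Verdict(False, "river flowing through Paris",
                   "Find passages that name the river.")


attempts = critic_r_zero_step(
    "Which river flows through the capital of France?",
    "Find relevant passages.", toy_retrieve, toy_introspect, toy_critique)
examples = to_training_examples(attempts)
```

In this toy run the first retrieval is rejected and the second is accepted, so the rejected passage becomes the hard negative for the accepted one. Returning the full list of attempts, rather than only the final documents, is what lets the same trajectory serve both inference and data collection.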
Across HotpotQA, 2Wiki, MuSiQue, and Bamboogle:
✅ Critic-R-Zero yields a +12.4% relative improvement at inference time alone
✅ Critic-Embed gives a +7.5% improvement when only the retriever is replaced, beating both off-the-shelf and co-trained retrievers
One interesting finding is that removing the agent’s introspective feedback when collecting training data makes the retriever consistently worse. The agent’s own sense of what’s missing isn’t a minor input to the critic; it’s the primary supervisory signal Critic-Embed inherits.
Check out the paper for more details.
Similar Articles
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
The paper introduces BRIGHT-Pro, a new benchmark for reasoning-intensive retrieval, and RTriever-Synth, a synthetic corpus used to fine-tune RTriever-4B for improved performance in agentic search systems.
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning
This paper introduces ICRL, a framework that jointly trains a solver and critic with reinforcement learning to internalize critique guidance, enabling the solver to improve without external critique. It uses distribution calibration and role-wise group advantage estimation, achieving 6-7 point gains over GRPO on agentic and mathematical reasoning tasks.
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval
MemReranker is a reasoning-aware reranking model family (0.6B/4B) designed for agent memory retrieval, addressing limitations in semantic similarity by incorporating LLM knowledge distillation for better temporal and causal reasoning.
QueryAgent-R1: Bridging Query Generation and Product Retrieval for E-Commerce Query Recommendation
QueryAgent-R1 is an agentic framework that bridges query generation and product retrieval in e-commerce using reinforcement learning and memory abstraction, improving query CTR by 2.9% and CVR by 3.1% in online tests.
RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents
RICE-PO is a critic-free policy optimization framework that turns retrieval interactions into localized credit signals for training reasoning agents, outperforming prompt-based and group-based RL baselines on BRIGHT and BEIR benchmarks.