Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Summary
INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.
Cached at: 05/14/26, 04:19 PM
Source: https://huggingface.co/papers/2605.05806
Abstract
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
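The scoring idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the single-head scaled dot-product scoring, the rule of summing softmax attention mass per chunk, and the names `chunk_attention_mass`, `retrieve`, `CHUNK_KEYS`, and `QUERY` are all assumptions made for illustration. It only shows how one decoder query vector could rank chunks whose key vectors were encoded once and stored.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def chunk_attention_mass(query, chunk_keys):
    """Share of softmax attention that each chunk receives from one query.

    query: one decoder query vector (hypothetical, single head).
    chunk_keys: list of chunks, each a list of precomputed key vectors.
    Assumption: a chunk's score is the sum of attention over its tokens.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [[dot(query, k) * scale for k in keys] for keys in chunk_keys]
    # Subtract the global maximum before exponentiating for numerical stability.
    peak = max(value for chunk in logits for value in chunk)
    weights = [[math.exp(value - peak) for value in chunk] for chunk in logits]
    total = sum(w for chunk in weights for w in chunk)
    return [sum(chunk) / total for chunk in weights]


def retrieve(query, chunk_keys, top_k=1):
    """Return the indices of the top_k chunks by attention mass, plus all masses."""
    mass = chunk_attention_mass(query, chunk_keys)
    ranked = sorted(range(len(mass)), key=lambda i: mass[i], reverse=True)
    return ranked[:top_k], mass


# Toy stand-in for precomputed encoder states: three chunks, two tokens each.
# These are encoded once and reused for every query (the amortization step).
CHUNK_KEYS = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [0.0, -1.0]],
]

QUERY = [0.0, 2.0]  # hypothetical decoder query vector
selected, mass = retrieve(QUERY, CHUNK_KEYS, top_k=1)
print(selected, [round(m, 3) for m in mass])
```

In this sketch the selected chunks would then be passed to the decoder as context without re-encoding. A real system would use the model's own multi-head attention and learned projections, so the aggregation rule here should be read as a placeholder.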
Similar Articles
Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems
The paper introduces BRIGHT-Pro, a new benchmark for reasoning-intensive retrieval, and RTriever-Synth, a synthetic corpus used to fine-tune RTriever-4B for improved performance in agentic search systems.
The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context
Proposes Computational Reality Monitoring to detect when language models rely on pretraining memory rather than retrieved context, addressing the attribution blind spot in retrieval-augmented generation.
Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback
Critic-R introduces a framework using a critic model to provide introspective feedback between the reasoning agent and retriever, improving agentic search performance at both inference and training time without requiring retraining the agent.
Xetrieval: Mechanistically Explaining Dense Retrieval
Xetrieval is a mechanistic framework that explains dense retrieval by enhancing sentence embeddings with reasoning information and decomposing them into interpretable sparse features, providing feature-level explanations for retrieval decisions without expensive autoregressive generation.
Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents
This paper proposes MERIT, a dynamic multi-horizon memory retrieval framework for interactive text-to-SQL agents that uses episode-level and turn-level memory with learned retrieval policies optimized via reinforcement learning and a process reward model for dense rewards. Experiments on BIRD-Interact and Spider2-Snow show that MERIT outperforms static and single-horizon dynamic baselines in success rate while requiring fewer interaction turns.