Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Hugging Face Daily Papers

Summary

INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.

Cached at: 05/14/26, 04:19 PM

Source: https://huggingface.co/papers/2605.05806

Abstract

INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
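
The scoring step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify how attention scores are aggregated per chunk, so the max-over-tokens aggregation, the single query vector, and the names `score_chunks` and `retrieve` are all assumptions made for illustration. The sketch only shows the general idea that cached encoder states can serve both as retrieval keys and as the context handed to generation.

```python
import numpy as np

def score_chunks(query, chunk_states):
    """Score each pre-encoded chunk with scaled dot-product attention logits.

    query: (d,) vector standing in for a decoder attention query (assumption).
    chunk_states: list of (n_tokens_i, d) arrays of cached encoder states.
    Aggregation by max over tokens is an assumption, not the paper's rule.
    """
    d = query.shape[0]
    scores = []
    for states in chunk_states:
        logits = states @ query / np.sqrt(d)  # one logit per token in the chunk
        scores.append(logits.max())
    return np.array(scores)

def retrieve(query, chunk_states, k=2):
    """Pick the top-k chunks and reuse their cached states as context."""
    scores = score_chunks(query, chunk_states)
    top = np.argsort(-scores)[:k]  # indices of the highest-scoring chunks
    # The same encoder states that were scored are concatenated as context,
    # so nothing is re-encoded at query time.
    context = np.concatenate([chunk_states[i] for i in top], axis=0)
    return top, context

# Toy encoder states, computed once and reused across queries.
chunks = [
    np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]),
    np.array([[1.0, 0.0, 1.0, 0.0]]),
]
query = np.array([0.0, 0.0, 1.0, 1.0])

scores = score_chunks(query, chunks)
top, context = retrieve(query, chunks, k=2)
```

In this toy setup the second chunk contains a token aligned with the query and ranks first, followed by the third chunk, which overlaps only partially. Because `chunks` is built once, later queries reuse it unchanged, which is the amortization the abstract refers to.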

Similar Articles

Xetrieval: Mechanistically Explaining Dense Retrieval

Hugging Face Daily Papers

Xetrieval is a mechanistic framework that explains dense retrieval by enhancing sentence embeddings with reasoning information and decomposing them into interpretable sparse features, providing feature-level explanations for retrieval decisions without expensive autoregressive generation.

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

arXiv cs.CL

This paper proposes MERIT, a dynamic multi-horizon memory retrieval framework for interactive text-to-SQL agents that uses episode-level and turn-level memory with learned retrieval policies optimized via reinforcement learning and a process reward model for dense rewards. Experiments on BIRD-Interact and Spider2-Snow show that MERIT outperforms static and single-horizon dynamic baselines in success rate while requiring fewer interaction turns.