Retrieval from Within: An Intrinsic Capability of Attention-Based Models

Hugging Face Daily Papers 05/08/26, 12:00 AM Papers

Summary

INTRA demonstrates that attention-based models can perform retrieval directly from internal representations, unifying retrieval and generation while improving evidence recall and answer quality.

Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
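The mechanism the abstract describes (a decoder attention query scores pre-encoded evidence chunks, and the selected chunks' encoder states are reused as generation context) can be pictured with a toy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `score_chunks`, `retrieve`, and `build_context`, the max-over-tokens aggregation, the scaled dot-product score, and the tiny 2-dimensional vectors are all hypothetical choices made for the example. The abstract does not specify how per-token attention is pooled into a chunk score.

```python
import math
from typing import List

Vector = List[float]
Chunk = List[Vector]  # one cached encoder state per token in the chunk


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_chunks(query: Vector, store: List[Chunk]) -> List[float]:
    # Assumption: a chunk's score is the largest scaled dot-product between
    # the decoder query and any of the chunk's cached encoder states.
    scale = math.sqrt(len(query))
    return [max(dot(query, state) for state in chunk) / scale for chunk in store]


def retrieve(query: Vector, store: List[Chunk], k: int) -> List[int]:
    # Rank chunk indices by score, highest first, and keep the top k.
    scores = score_chunks(query, store)
    ranked = sorted(range(len(store)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def build_context(store: List[Chunk], selected: List[int]) -> List[Vector]:
    # Reuse the cached states as-is; nothing is re-encoded per query.
    return [state for i in selected for state in store[i]]


# Hypothetical pre-encoded store: three chunks, encoded once and kept around.
chunk_store: List[Chunk] = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0]],
    [[0.7, 0.7], [0.6, 0.8]],
]

query: Vector = [0.0, 1.0]  # stand-in for a decoder attention query
selected = retrieve(query, chunk_store, k=2)
context = build_context(chunk_store, selected)
print(selected, len(context))
```

Because `build_context` returns the stored vectors themselves rather than copies or re-encodings, the same `chunk_store` can serve many queries, which is the amortization the abstract refers to. A real encoder-decoder would use learned query and key projections across multiple heads and layers, which this sketch leaves out.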

Original Article

Cached at: 05/14/26, 04:19 PM

Source: https://huggingface.co/papers/2605.05806

Get this paper in your agent:

hf papers read 2605.05806

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Rethinking Reasoning-Intensive Retrieval: Evaluating and Advancing Retrievers in Agentic Search Systems

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

REAL: REtrieval-reAsoning and Logic-constructed Attention Behaviors for Long-Context KV Cache Compression
