@zhuofengli96475: DCI just hit #1 on Hugging Face Daily Papers! Try it Now! @HuggingPapers https://huggingface.co/papers/2605.05242…

X AI KOLs Following 05/09/26, 01:57 PM Papers

agentic-search retrieval direct-corpus-interaction grep bash hugging-face daily-papers

Summary

DCI (Direct Corpus Interaction) has agents search the raw corpus with simple terminal tools such as grep and bash, using no embeddings or vector indexes. The paper reports that this setup outperforms traditional retrieval methods on agentic search tasks.

🚀 DCI just hit #1 on Hugging Face Daily Papers! Try it Now! @HuggingPapers https://t.co/h1CWuCtuQz https://t.co/8K2O7zZ7vq

Original Article

Cached at: 05/16/26, 09:24 PM

Paper page - Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Source: https://huggingface.co/papers/2605.05242 Published on May 3

#2 Paper of the day

Abstract

Direct corpus interaction enables more effective agentic search by allowing agents to query raw text directly, outperforming traditional retrieval methods in complex tasks.

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To tackle the limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with general-purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus, with which DCI opens a broader interface-design space for agentic search.
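To make two of the operations named in the abstract concrete (sparse clue conjunctions and local context checks), here is a minimal Python sketch of grep-style access to a raw corpus. It is an illustration only, not the paper's implementation: the function names `grep_corpus` and `files_matching_all`, the assumption that the corpus is a directory of `.txt` files, and the one-line default context window are all hypothetical choices made for this example.

```python
import re
from pathlib import Path


def grep_corpus(root, pattern, context=1, ignore_case=True):
    """Yield (path, line_number, snippet) for each line matching a regex.

    Roughly what `grep -rn -C <context>` does; assumes a corpus of .txt files.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Local context check: return neighbouring lines with the hit.
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                yield path, i + 1, "\n".join(lines[lo:hi])


def files_matching_all(root, clues):
    """Return the files that contain every clue (a conjunction of sparse clues)."""
    hits = None
    for clue in clues:
        # Treat each clue as a literal string, not a regex.
        found = {path for path, _, _ in grep_corpus(root, re.escape(clue), context=0)}
        hits = found if hits is None else hits & found
    return sorted(hits or [])
```

An agent could call `files_matching_all(corpus_dir, ["Nobel Prize", "Warsaw"])` to narrow the corpus to files that satisfy both clues, then call `grep_corpus` on a follow-up pattern to read the surrounding lines before revising its hypothesis. Nothing is indexed ahead of time, so edits to the files are visible on the next call.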

Zhuofeng Li (@zhuofengli96475): 🔥 Introducing Direct Corpus Interaction (DCI)! The best retriever for agentic search is no retriever.

🚀 We replaced the entire agentic search pipeline — embedding model, vector index, top-k retrieval — with only grep and bash. 🔧

📄 Paper: https://t.co/h1CWuCtuQz

Similar Articles

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

@johnsonshi86: https://x.com/johnsonshi86/status/2072112215097024961

@dair_ai: https://x.com/dair_ai/status/2056018543850754283

Reviving PapersWithCode (by Hugging Face) [P]
