GrepSeek trains LLM search agents to interact directly with a text corpus through shell commands such as grep. Its two-stage training pipeline first constructs a cold-start dataset and then refines the agent with GRPO, achieving strong F1 and Exact Match scores on open-domain QA benchmarks.
DCI (Direct Corpus Interaction) proposes using simple terminal tools like grep and bash for agentic search. Without relying on embeddings or vector indexes, it outperforms traditional retrieval methods.
The paper introduces Direct Corpus Interaction (DCI), a novel approach allowing AI agents to query raw text directly using standard terminal tools instead of traditional embedding-based retrieval. By bypassing fixed similarity interfaces and offline indexing, DCI significantly outperforms conventional sparse, dense, and reranking baselines across multiple IR and agentic search benchmarks.
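To make the idea of querying raw text concrete, here is a minimal sketch of a grep-style lookup over an in-memory corpus, with no embeddings or index involved. It is an illustration only, not the paper's implementation: the `grep_corpus` function, the `Match` type, and the toy corpus are all hypothetical names introduced for this example, and a real agent would presumably issue actual shell commands against files on disk.

```python
import re
from typing import Iterator, NamedTuple


class Match(NamedTuple):
    """One matching line, similar to a line of `grep -n` output (hypothetical type)."""
    doc_id: str
    line_no: int
    line: str


def grep_corpus(corpus: dict[str, str], pattern: str,
                ignore_case: bool = True) -> Iterator[Match]:
    """Yield every corpus line that matches the regex `pattern`.

    Assumption: the corpus is a mapping from document name to raw text.
    Nothing is indexed ahead of time; each query scans the text directly.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for doc_id, text in corpus.items():
        # Line numbers start at 1, matching grep's convention.
        for line_no, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield Match(doc_id, line_no, line)


# Toy corpus, invented for illustration.
corpus = {
    "doc1.txt": "Paris is the capital of France.\nIt lies on the Seine.",
    "doc2.txt": "Berlin is the capital of Germany.",
}

hits = list(grep_corpus(corpus, r"capital of france"))
for hit in hits:
    print(f"{hit.doc_id}:{hit.line_no}:{hit.line}")
```

The point of the sketch is the interface: the query is an arbitrary pattern chosen at search time, so an agent can refine it step by step instead of being limited to a fixed similarity function computed offline.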