Tag
MOTHRAG is a graph-free multi-hop RAG framework that matches the accuracy of graph-based systems like GraphRAG and HippoRAG on benchmarks, while avoiding costly graph rebuilds by using a dense index and query-time orchestration.
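A minimal sketch of graph-free multi-hop retrieval over a dense index. It assumes a simple orchestration policy: each hop re-queries the index with the question plus the passage found at the previous hop. Passage vectors here are normalized term counts over the corpus vocabulary, standing in for a learned encoder. `DenseIndex` and `multi_hop_retrieve` are hypothetical names, and this is not MOTHRAG's actual pipeline.

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class DenseIndex:
    """Toy dense index: one fixed-length, L2-normalized vector per passage.
    Term counts over the corpus vocabulary stand in for a learned encoder (assumption)."""

    def __init__(self, passages):
        self.passages = passages
        self.vocab = sorted({tok for p in passages for tok in tokenize(p)})
        self.pos = {tok: i for i, tok in enumerate(self.vocab)}
        self.vectors = [self.encode(p) for p in passages]

    def encode(self, text):
        vec = [0.0] * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.pos:  # out-of-vocabulary query terms are ignored
                vec[self.pos[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    def search(self, query, k=1, exclude=()):
        # Cosine similarity = dot product of normalized vectors.
        q = self.encode(query)
        scored = [(sum(a * b for a, b in zip(q, v)), i)
                  for i, v in enumerate(self.vectors) if i not in exclude]
        scored.sort(reverse=True)
        return [i for _, i in scored[:k]]

def multi_hop_retrieve(index, question, hops=2):
    """Hypothetical query-time orchestration: the next hop's query is the question
    plus the passage just retrieved, so it can follow the bridge entity."""
    query, chosen = question, []
    for _ in range(hops):
        hit = index.search(query, k=1, exclude=set(chosen))
        if not hit:
            break
        chosen.append(hit[0])
        query = question + " " + index.passages[hit[0]]
    return [index.passages[i] for i in chosen]

corpus = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris is the capital of France.",
]
question = "Where was Marie Curie born, and what country is that city in?"
hops = multi_hop_retrieve(DenseIndex(corpus), question)
# hops == ['Marie Curie was born in Warsaw.', 'Warsaw is the capital of Poland.']
```

The first hop finds the birthplace passage, and the second hop reaches the passage naming the country through the shared entity "Warsaw". Only vectors are stored, so with a learned encoder, adding documents would mean encoding the new passages rather than rebuilding a graph.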
This paper introduces answer-in-context, a diagnostic metric for budget-constrained multi-hop RAG that measures whether the gold answer survives in the packed reader context, and proposes a submodular evidence packing method that improves over heuristic packing baselines under specific conditions.
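A minimal sketch of both ideas under stated assumptions. Answer-in-context is taken here to be a normalized, whole-token substring match of the gold answer against the packed context. The packing objective is assumed to be a coverage function over query terms, which is monotone submodular, maximized greedily by marginal gain per token under a token budget. The normalization, objective, cost model, and function names are illustrative assumptions, not the paper's definitions.

```python
import re

def normalize(text):
    # SQuAD-style normalization (assumption): lowercase, drop punctuation, collapse spaces.
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())

def answer_in_context(gold, context):
    """1 if the normalized gold answer appears as whole tokens in the packed context."""
    return int(f" {normalize(gold)} " in f" {normalize(context)} ")

def coverage(passages, terms):
    # Number of distinct query terms covered: a monotone submodular set function.
    covered = set()
    for p in passages:
        covered |= terms & set(normalize(p).split())
    return len(covered)

def pack(passages, query, budget):
    """Cost-benefit greedy under a token budget (cost = whitespace tokens, an assumption)."""
    terms = set(normalize(query).split())
    cost = [max(1, len(p.split())) for p in passages]
    chosen, used = [], 0
    while True:
        base = coverage([passages[j] for j in chosen], terms)
        best, best_ratio = None, 0.0
        for i in range(len(passages)):
            if i in chosen or used + cost[i] > budget:
                continue
            gain = coverage([passages[j] for j in chosen + [i]], terms) - base
            if gain / cost[i] > best_ratio:
                best, best_ratio = i, gain / cost[i]
        if best is None:  # nothing affordable adds coverage
            break
        chosen.append(best)
        used += cost[best]
    # Compare against the best affordable single passage, the standard fix that
    # gives cost-ratio greedy a constant-factor guarantee under a knapsack budget.
    singles = [i for i in range(len(passages)) if cost[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: coverage([passages[i]], terms))
        if coverage([passages[top]], terms) > coverage([passages[j] for j in chosen], terms):
            chosen = [top]
    return [passages[j] for j in chosen]

docs = [
    "Warsaw is the capital of Poland.",
    "The Vistula river flows through Warsaw.",
    "Poland has a long and complex history spanning many centuries of change across Europe.",
]
query = "Which river flows through the capital of Poland?"
print(answer_in_context("Vistula", " ".join(pack(docs, query, budget=12))))  # → 1
print(answer_in_context("Vistula", " ".join(pack(docs, query, budget=6))))   # → 0
```

With a 12-token budget both short passages fit and the answer survives. With 6 tokens only the first passage is packed and the metric drops to 0, the kind of silent answer loss a retrieval-recall metric alone would not show.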
Proposes OPI, an ontology-guided framework for multi-hop knowledge graph question answering that leverages a relation-centric ontology graph for bidirectional retrieval and iterative refinement, achieving state-of-the-art results on multiple benchmarks.
FlowRAG proposes a novel semantic-aware retrieval framework that constructs a quad-level heterogeneous graph and uses frequency-aware weighted flow to extract explicit reasoning paths, achieving state-of-the-art performance on complex reasoning benchmarks.
The author describes using a knowledge graph extractor built with a Qwen model to generate challenging multi-hop QA pairs for evaluating agentic search systems.
This paper introduces EvoBrowseComp, a dynamic benchmark of 400 English and 400 Chinese complex questions synthesized via live-web traversal, which evaluates search agents without test-set contamination and is robust to parametric memorization.
A comprehensive survey analyzing over 300 papers on LLM reasoning, presenting a taxonomy of reasoning paradigms including Chain-of-Thought, Multi-Hop, Mathematical, Commonsense, and others, along with common failure modes and research gaps.
Introduces a benchmark to evaluate how knowledge editing methods handle the logical consequences of fact edits, revealing that existing approaches such as ROME and fine-tuning (FT) accurately insert direct assertions but fail to propagate entailed knowledge, with a performance gap of up to 24%.
AdaGATE is a training-free evidence controller for multi-hop RAG that uses entity-centric gap tracking, micro-query generation, and utility-based selection to improve robustness under noisy retrieval, achieving state-of-the-art evidence F1 with fewer input tokens.
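A minimal sketch of entity-centric gap tracking with utility-based selection. It assumes entities are matched against a small hypothetical gazetteer, and that a passage's utility is the number of still-missing entities it mentions minus a per-token penalty `lam`. Micro-query generation, which would re-retrieve for the missing entities, is omitted. The names, the utility form, and the stopping rule are assumptions, not AdaGATE's actual formulation.

```python
import re

def mentioned(text, vocab):
    """Entities from a hypothetical gazetteer `vocab` that occur in `text`."""
    return vocab & set(re.findall(r"\w+", text))

def select_evidence(passages, required, vocab, lam=0.05, budget=100):
    """Repeatedly admit the passage with the highest utility
    (missing entities it covers minus lam * token cost); stop when the gap
    is closed, no passage has positive utility, or the budget is exhausted."""
    covered, chosen, used = set(), [], 0
    while True:
        gap = required - covered  # entities the evidence still fails to mention
        if not gap:
            break
        best, best_u = None, 0.0
        for i, p in enumerate(passages):
            cost = len(p.split())
            if i in chosen or used + cost > budget:
                continue
            u = len(mentioned(p, vocab) & gap) - lam * cost
            if u > best_u:
                best, best_u = i, u
        if best is None:
            break
        chosen.append(best)
        used += len(passages[best].split())
        covered |= mentioned(passages[best], vocab)
    return [passages[i] for i in chosen], required - covered

vocab = {"Curie", "Warsaw", "Poland", "Paris", "France"}
required = {"Curie", "Warsaw", "Poland"}
retrieved = [
    "Marie Curie was born in Warsaw.",
    "Paris is the capital of France.",  # noisy distractor
    "Warsaw is the capital of Poland.",
]
selected, missing = select_evidence(retrieved, required, vocab)
# selected == [retrieved[0], retrieved[2]], missing == set()
```

The distractor mentions none of the missing entities, so its utility is negative and it is never admitted. Selection also stops as soon as the gap closes, which is one plausible way a controller of this kind keeps input tokens low.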