Tag
DiscoLoop introduces a looping architecture that carries both discrete embedding and continuous hidden-state channels to improve multi-hop reasoning in transformers, achieving near-perfect accuracy on synthetic tasks and stronger performance on real-world language modeling.
The Looped Transformer builds recursion directly into the architecture so that reasoning happens internally, avoiding the inefficiency of chain-of-thought, which has to simulate iteration by generating discrete tokens. Recent research reports strong performance on multi-hop reasoning, with further gains from stabilization techniques and adaptive recursion.
This paper explores grounding multi-hop textual-spatial stories into geometry-aware modalities like grids, showing a 42% performance improvement when switching from language-only to grid-based reasoning, and introduces a switching metric for modality selection in LLMs.
This paper frames regulatory document review as an LLM-guided planning problem, using a vectorless document tree with browse, read, and search tools and a dynamic knowledge graph as state. On a 200-question benchmark over NuScale FSAR documents, the system achieves 81.5% accuracy with 0.93 RAGAS Faithfulness, significantly outperforming existing RAG methods.
Introduces SAG (SQL-Retrieval Augmented Generation), a retrieval-augmented generation architecture built on dynamic SQL hyperedges. The authors present it as more efficient and cheaper than traditional RAG and GraphRAG for multi-hop reasoning, and report favorable evaluation results. The code is open-sourced on GitHub.
This paper proposes HyGRAG, a hierarchical graph RAG framework that integrates contextual and relational information for multi-hop reasoning, achieving a 9.7% average accuracy improvement over existing methods.
An open-source full-stack pipeline that constructs a Knowledge Graph from raw text, uses hybrid search (dense + sparse + graph traversal) to solve multi-hop reasoning problems in LLMs, and re-ranks results with Reciprocal Rank Fusion and a Cross-Encoder.
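As a rough illustration of the fusion step mentioned above, the following is a minimal sketch of Reciprocal Rank Fusion over three ranked lists. The function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration, not details taken from the project itself.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is an assumed default, not the project's.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the three retrievers.
dense = ["d1", "d2", "d3"]
sparse = ["d2", "d4", "d1"]
graph = ["d2", "d1", "d5"]

fused = reciprocal_rank_fusion([dense, sparse, graph])
print(fused)
```

In a pipeline like the one described, the fused list would then be passed to the Cross-Encoder for final re-ranking. RRF only needs ranks, so the dense, sparse, and graph scores never have to be calibrated against each other.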
The article argues that knowledge graphs and vector databases serve different purposes in enterprise AI and should be used together rather than as alternatives. It recommends hybrid architectures or managed solutions like 60x to handle both semantic recall and structural reasoning.
This paper investigates how TMK-based question generation strategies affect dataset quality for procedural and multi-hop reasoning in AI learning systems, comparing strict TMK generation, transcript-first generation, and TMK-aware generation, and introduces a grounding validation framework.
The paper proposes SVoT, a reinforcement learning framework that generates interleaved, verifiable intermediate states and visualizations for multi-hop spatial reasoning in MLLMs, achieving significant accuracy gains on new benchmarks involving multi-object interactions and numerical reasoning.
This paper identifies a 'concept bottleneck' in the CoCoNuT latent reasoning paradigm where hidden states are overwritten across passes, and proposes AGCLR, which adds a gated persistent memory stream to retain intermediate facts. Evaluations on GSM8K, HotpotQA, and ProsQA using GPT-2 show consistent improvements, especially on multi-hop tasks.
This paper investigates whether direct activation transfer between language models can improve reasoning, using a linear translation layer from Pythia-160M to Pythia-410M. Despite achieving high representational alignment, the transferred activations do not improve multi-hop question answering, yielding a negative result.
OCC-RAG introduces a family of compact small language models optimized for faithful question answering, using a novel pipeline to synthesize multi-context multi-hop QA data. The models demonstrate competitive performance against larger models on reasoning and faithfulness benchmarks.
This paper introduces 'composition collapse', a phenomenon where language models with stable factual knowledge still fail to compose that knowledge into correct multi-hop reasoning, and proposes a double-gate protocol to isolate composition failure from atomic knowledge instability.
Proposes Decompose-and-Refine (DaR), a framework for statute-grounded legal question answering that decomposes complex questions into atomic sub-questions and generates parametric queries for precise statutory retrieval, showing improvements on the KoBLEX benchmark.
A new Stanford paper shows that under equal reasoning token budgets, single LLMs typically outperform multi-agent systems on multi-hop reasoning tasks, with gains from multi-agent setups often stemming from additional compute rather than architectural superiority. The paper uses the Data Processing Inequality to explain why information loss in handoffs harms multi-agent performance, and identifies context quality as the key factor where multi-agent systems can provide benefits.
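For reference, the Data Processing Inequality can be stated as follows. The mapping of the symbols onto a multi-agent handoff (X as the original context, Y as the message one agent passes on, Z as what the next agent produces from that message alone) is an illustrative reading, not notation taken from the paper.

```latex
% If X -> Y -> Z forms a Markov chain (Z depends on X only through Y), then
\[
  I(X; Z) \le I(X; Y).
\]
% No processing of Y can recover information about X that Y did not carry.
```

Under this reading, each handoff can only preserve or lose information about the original context, which is consistent with the summary's point that context quality is where multi-agent setups can still help.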
This paper introduces PyRAG, a framework that reformulates multi-hop retrieval-augmented generation as program synthesis and execution, using executable Python code to represent reasoning steps and enable deterministic feedback and adaptive retrieval.
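To make the idea concrete, here is a toy sketch of a two-hop question expressed as a short program whose execution either yields an answer or a deterministic error that could drive another retrieval round. The `retrieve` helper, the in-memory `FACTS` table, and the `run` function are all hypothetical stand-ins; none of them is PyRAG's actual interface.

```python
# Toy corpus standing in for a real retriever (illustrative only).
FACTS = {
    ("Inception", "director"): "Christopher Nolan",
    ("Christopher Nolan", "birthplace"): "London",
}

def retrieve(entity, relation):
    """Hypothetical retrieval step: return a fact or fail loudly."""
    key = (entity, relation)
    if key not in FACTS:
        raise LookupError(f"no evidence for ({entity!r}, {relation!r})")
    return FACTS[key]

# "Where was the director of Inception born?" written as a program.
# In a real system this text would be produced by the language model.
PROGRAM = """
director = retrieve("Inception", "director")
answer = retrieve(director, "birthplace")
"""

def run(program):
    """Execute a reasoning program; return (answer, error_message)."""
    env = {"retrieve": retrieve}
    try:
        exec(program, env)
        return env["answer"], None
    except LookupError as err:
        # Deterministic feedback: the failing hop is named explicitly.
        return None, str(err)

print(run(PROGRAM))
print(run('answer = retrieve("Inception", "composer")'))
```

Because each hop is an explicit call, a failed lookup pinpoints which step lacked evidence, which is the kind of signal that adaptive retrieval could act on.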
A new open-source memory layer called Memvid claims to outperform all existing RAG systems, reporting gains of +35% over the prior state of the art on LoCoMo and +76% on multi-hop reasoning. It is packaged as a single .mv2 file.
This paper introduces the Context Gathering Decision Process (CGDP), a POMDP framework to model LLM agent search behavior, proposing interventions that improve multi-hop reasoning and reduce token usage without performance degradation.
This paper introduces TGS-RAG, a bidirectional verification and completion framework that synergizes text-based and graph-based Retrieval-Augmented Generation to improve multi-hop reasoning accuracy.