SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG
Summary
SproutRAG is a hierarchical RAG framework that uses attention-guided tree search and progressive embeddings to retrieve at multiple granularities from long documents, improving information efficiency (IE) by 6.1% on average over the strongest baseline.
Cached at: 06/18/26, 05:44 AM
# SproutRAG: Attention-Guided Tree Search with Progressive Embeddings for Long-Document RAG
Source: [https://arxiv.org/html/2606.18381](https://arxiv.org/html/2606.18381)
Amirhossein Abaskohi¹, Issam H. Laradji¹,², Peter West¹, Giuseppe Carenini¹
¹University of British Columbia, ²ServiceNow Research
Corresponding author: Amirhossein Abaskohi (aabaskoh@cs.ubc.ca)
###### Abstract
Retrieval-augmented generation (RAG) systems must balance retrieval granularity with contextual coherence, a challenge that existing methods address through LLM-guided chunking, single-level context expansion, or hierarchical summarization. These approaches variously depend on costly LLM calls during indexing or retrieval, limit context aggregation to a single granularity level, or introduce information loss through summarization. We present SproutRAG, an attention-guided hierarchical RAG framework that addresses this trade-off by organizing sentence-level chunks into progressively larger but semantically coherent units, using learned inter-sentence attention to construct a binary chunking tree. Unlike prior approaches that rely on external LLMs, fixed context expansion, or lossy summarization, SproutRAG learns which attention heads and layers best capture semantic document structure, enabling multi-granularity retrieval without additional LLM calls or compressed summaries. At retrieval time, SproutRAG uses hierarchical beam search to retrieve candidates at multiple granularities, capturing multi-sentence relevance beyond flat retrieval. The framework is trained end-to-end with a joint objective that improves both embeddings and tree structure. Experiments across four benchmarks spanning scientific, legal, and open-domain settings demonstrate that SproutRAG improves information efficiency (IE) by 6.1% on average over the strongest baseline. (Code is available on [GitHub](https://github.com/AmirAbaskohi/SproutRAG).)
## 1 Introduction
Retrieval-augmented generation (RAG) has become the dominant paradigm for grounding large language models (LLMs) in external knowledge, helping reduce hallucinations, support domain-specific reasoning, and improve performance on knowledge-intensive tasks (Lewis et al., [2020](https://arxiv.org/html/2606.18381#bib.bib1); Augenstein et al., [2024](https://arxiv.org/html/2606.18381#bib.bib2)). As LLMs are increasingly applied to complex tasks involving long documents (Jin et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib22)), directly providing entire documents as input becomes impractical due to context-length constraints and degraded attention over extended sequences (Jin et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib3); Liu et al., [2024](https://arxiv.org/html/2606.18381#bib.bib4)). Consequently, RAG frameworks segment documents into chunks and retrieve the most relevant pieces to construct focused, high-quality evidence for generation.
The effectiveness of this retrieval step hinges critically on how documents are segmented. Large chunks preserve contextual coherence but introduce redundant noise that dilutes key information, while fine-grained chunks offer precision but suffer from semantic fragmentation and broken inter-chunk relationships (Tao et al., [2025](https://arxiv.org/html/2606.18381#bib.bib5); Zhao et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib23)). This problem is particularly acute for cross-paragraph retrieval, where answering a query requires synthesizing information scattered across multiple document sections, as in multi-hop reasoning and summarization tasks (Liu et al., [2025](https://arxiv.org/html/2606.18381#bib.bib24)).
Recent work addresses this challenge from several directions. SAKI-RAG (Tao et al., [2025](https://arxiv.org/html/2606.18381#bib.bib5)) uses a sentence-level LLM (SLLM) (An et al., [2024](https://arxiv.org/html/2606.18381#bib.bib12)) to merge semantically related sentence pairs and relies on an external LLM to filter retrieval candidates. However, extending this pairwise expansion to multi-chunk relevance greatly increases the candidate space and makes LLM filtering expensive. LLM-guided chunking methods such as Meta-Chunking (Zhao et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib7)) and MoC (Zhao et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib23)) improve segmentation quality but discard cross-chunk dependencies after chunking. Hierarchical methods such as RAPTOR (Sarthi et al., [2024](https://arxiv.org/html/2606.18381#bib.bib9)) support multi-granularity retrieval through clustering and summarization, yet clustering treats chunks within a group as interchangeable, and summaries can lose evidence. Graph-based methods such as GraphRAG (Edge et al., [2025](https://arxiv.org/html/2606.18381#bib.bib10)) model entity relations but are less effective when fine-grained chunks contain sparse entity information.
Figure 1: SproutRAG segments a long document into sentence-level chunks, uses SLLM attention to identify semantically related sentences, and organizes them into an attention-guided binary tree. Retrieval then selects evidence across fine-grained leaves, mid-level nodes, and broader subtrees, preserving precision while recovering coherence.

In this paper, we present SproutRAG, an attention-guided hierarchical RAG framework that organizes sentence-level chunks into a learned document structure, preserving cross-chunk dependencies while avoiding LLM inference overhead during retrieval. As illustrated in Figure [1](https://arxiv.org/html/2606.18381#S1.F1), this structure enables retrieval at multiple semantic granularities. SproutRAG encodes documents at sentence granularity using an SLLM and constructs a binary tree bottom-up, where the merge order is determined by a learned weighted aggregation of inter-sentence attention across transformer heads and layers. This aggregation replaces naive uniform averaging, which we show introduces a proximity bias that weakens the global tree index; instead, learnable scalar weights allow the model to discover which head types best reflect semantic co-relevance for document structure. Each internal node stores a progressive embedding that compositionally represents its subtree, enabling multi-granularity retrieval via hierarchical beam search that collects candidates across all tree levels. SproutRAG is trained end-to-end to jointly optimize retrieval quality and tree structure, requiring no external LLM calls at any stage. It captures emergent multi-sentence relevance that pairwise or flat retrieval methods cannot, while remaining efficient enough for deployment. We evaluate SproutRAG on four benchmarks spanning scientific, legal, and open-domain settings. On average, SproutRAG improves information efficiency (IE) by 6.1% over the strongest baseline, particularly when evidence is dispersed across paragraphs.
In summary, the contributions of this paper are as follows. (1) We introduce SproutRAG, an attention-guided hierarchical RAG framework that constructs a binary tree over sentence-level chunks using learned inter-sentence attention, enabling multi-granularity retrieval without any LLM calls at inference time. (2) We identify and address the proximity bias introduced by uniform attention averaging in sentence-level transformers, replacing it with a learned weighted aggregation that allows the model to discover which attention heads best reflect semantic co-relevance for document structure. (3) We introduce a joint training objective that improves both retrieval quality and tree structure, eliminating the need for external LLM filtering or lossy summarization at any pipeline stage.
## 2 Related Work
#### Chunking and adaptive retrieval.
The effectiveness of RAG depends strongly on how documents are segmented into retrievable units. Standard RAG pipelines often rely on rule-based splitters, such as fixed-length or delimiter-based chunking, which are efficient but insensitive to semantic boundaries (Team, [2024](https://arxiv.org/html/2606.18381#bib.bib13)). Recent methods aim to improve this granularity choice. Late-Chunking (Günther et al., [2025](https://arxiv.org/html/2606.18381#bib.bib6)) contextualizes token representations before forming chunk embeddings, while Meta-Chunking (Zhao et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib7)) and MoC (Zhao et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib23)) use LLM-based signals or routing mechanisms to produce more adaptive chunk boundaries. Dense X Retrieval (Chen et al., [2024](https://arxiv.org/html/2606.18381#bib.bib8)) moves toward finer granularity by decomposing text into atomic propositions, improving precision but weakening broader contextual continuity. Other methods adapt retrieval after chunks have been formed. ReflectiveRAG (Verma et al., [2026](https://arxiv.org/html/2606.18381#bib.bib33)) introduces a self-reflective retrieval loop that evaluates evidence sufficiency and reformulates queries to improve factual grounding, but it does not change the underlying flat organization of retrieval units. Most related to SproutRAG, SAKI-RAG (Tao et al., [2025](https://arxiv.org/html/2606.18381#bib.bib5)) uses an SLLM to estimate inter-sentence attention and expand retrieved chunks with related sentences. Unlike SAKI-RAG's pairwise expansion with LLM filtering, SproutRAG builds a global sentence-level hierarchy that supports multi-granularity retrieval without inference-time LLM calls.
#### Structured and hierarchical retrieval.
Beyond flat chunk retrieval, structured RAG methods organize document content into higher-level representations. RAPTOR (Sarthi et al., [2024](https://arxiv.org/html/2606.18381#bib.bib9)) recursively clusters chunks and summarizes each cluster into a tree, enabling retrieval from multiple levels; however, its structure is based on embedding-space clustering and relies on LLM-generated summaries, which can discard fine-grained details. Graph-based approaches such as GraphRAG (Edge et al., [2025](https://arxiv.org/html/2606.18381#bib.bib10)) and LightRAG (Guo et al., [2025](https://arxiv.org/html/2606.18381#bib.bib11)) represent documents through entities and relations, supporting traversal-based retrieval but depending on successful entity extraction and relation construction. PropRAG (Wang and Han, [2025](https://arxiv.org/html/2606.18381#bib.bib14)) replaces entity triples with propositions and performs LLM-free beam search over proposition paths, while Beam Retrieval (Zhang et al., [2024](https://arxiv.org/html/2606.18381#bib.bib20)) shows the benefit of maintaining multiple retrieval hypotheses for multi-hop passage retrieval. PageIndex (Zhang et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib29)) similarly explores reasoning-based, vectorless retrieval over document tree structures, but relies on document-level structural organization rather than learned sentence-level attention. SproutRAG instead builds an attention-guided binary tree over sentence-level chunks, with compositional internal nodes and joint retrieval over all nodes. This preserves cross-sentence dependencies without lossy summarization, entity-centric structures, or external LLM calls.
## 3 SproutRAG
As illustrated in Figure [2](https://arxiv.org/html/2606.18381#S3.F2), SproutRAG replaces flat chunk retrieval with a trained attention-guided hierarchy over sentence-level chunks. During offline indexing, an SLLM encodes the document and provides both sentence embeddings and inter-sentence attention signals. These signals are aggregated with learnable head–layer weights and used to build a binary tree, where leaves represent fine-grained chunks and internal nodes store progressive embeddings of merged sentence groups. During online retrieval, SproutRAG encodes the query and performs hierarchical beam search, collecting candidates from leaves, internal nodes, and subtrees. As described in Section [3.3](https://arxiv.org/html/2606.18381#S3.SS3), the framework is trained with a joint objective that improves both retrieval quality and tree structure, enabling multi-granularity retrieval without external LLM calls during retrieval.
Figure 2: Overview of SproutRAG. In the offline indexing phase (Phase 1), documents are split into sentence-level chunks and encoded with an SLLM to obtain sentence embeddings and inter-sentence attention. Learned aggregation over attention heads and layers guides bottom-up tree construction, producing an attention tree with sentence embeddings at the leaves and progressive embeddings at internal nodes. In the online retrieval phase (Phase 2), a query is encoded and hierarchical beam search traverses the tree, collecting candidates from multiple levels before similarity reranking selects the top-$k$ chunks for answer generation.

### 3.1 Attention-Guided Indexing
Given a document $D$, we first split it into sentence-level chunks $S = \{s_1, \ldots, s_n\}$. We encode the full sequence with an SLLM, obtaining contextualized sentence embeddings $\{e(s_i)\}_{i=1}^{n}$ and attention matrices from all $L$ layers and $H$ heads per layer. For layer $l$ and head $h$, we denote the corresponding attention matrix as $\mathrm{Attn}^{(l,h)} \in \mathbb{R}^{n \times n}$.
A uniform average over all heads and layers can overemphasize local sentence proximity, since some attention heads primarily capture sequential patterns (Voita et al., [2019](https://arxiv.org/html/2606.18381#bib.bib31)). To reduce this proximity bias, SproutRAG learns a weighted aggregation over heads and layers:
$$\mathbf{A}_{ij} = \sum_{l=1}^{L} \sum_{h=1}^{H} w_{l,h}\, \mathrm{Attn}^{(l,h)}_{ij}, \tag{1}$$

where $w_{l,h}$ is defined as:

$$w_{l,h} = \frac{\exp(\alpha_{l,h})}{\sum_{l'=1}^{L} \sum_{h'=1}^{H} \exp(\alpha_{l',h'})}. \tag{2}$$

The learnable scalars $\alpha_{l,h}$ allow the model to emphasize attention heads that better capture semantic co-relevance. We then symmetrize the aggregated attention to obtain a mutual relation score:

$$\mathbf{M}_{ij} = \frac{\mathbf{A}_{ij} + \mathbf{A}_{ji}}{2}. \tag{3}$$
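The aggregation in Eqs. (1) to (3) can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation: it assumes the SLLM exposes one $n \times n$ sentence-level attention matrix per (layer, head) as nested lists, and the function names are hypothetical. Subtracting the largest logit before exponentiating leaves Eq. (2) unchanged but avoids overflow.

```python
import math
from typing import List

# Hypothetical sketch of Eqs. (1)-(3); not taken from the SproutRAG codebase.
Matrix = List[List[float]]


def softmax_weights(alpha: List[List[float]]) -> List[List[float]]:
    """Eq. (2): a single softmax over all (layer, head) logits jointly."""
    m = max(a for row in alpha for a in row)  # shift by the max logit for numerical stability
    exps = [[math.exp(a - m) for a in row] for row in alpha]
    z = sum(e for row in exps for e in row)
    return [[e / z for e in row] for row in exps]


def aggregate_attention(attn: List[List[Matrix]], alpha: List[List[float]]) -> Matrix:
    """Eq. (1): A_ij = sum over (l, h) of w_{l,h} * Attn^{(l,h)}_ij.

    attn[l][h] is an n x n sentence-level attention matrix; alpha[l][h] is its logit.
    """
    w = softmax_weights(alpha)
    n = len(attn[0][0])
    A = [[0.0] * n for _ in range(n)]
    for l, heads in enumerate(attn):
        for h, mat in enumerate(heads):
            for i in range(n):
                for j in range(n):
                    A[i][j] += w[l][h] * mat[i][j]
    return A


def symmetrize(A: Matrix) -> Matrix:
    """Eq. (3): M_ij = (A_ij + A_ji) / 2."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
```

With all $\alpha_{l,h}$ equal, Eq. (2) gives $w_{l,h} = 1/(LH)$ and Eq. (1) reduces to the uniform average that the text identifies as the source of proximity bias; learning the logits is what lets the model move away from that special case.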
The tree is built bottom-up. Initially, each sentence chunk is a leaf node. At each step, we merge the pair of active nodes with the highest mutual attention score. The parent embedding is computed as a progressive embedding of its children:

$$e(p) = \frac{e(u) + e(v)}{2}, \tag{4}$$

where $u$ and $v$ are the merged child nodes. After merging, the new parent inherits its strongest relation to each remaining node $r$:

$$\mathbf{M}_{pr} = \max(\mathbf{M}_{ur}, \mathbf{M}_{vr}). \tag{5}$$

This single-linkage update preserves long-range semantic connections as the hierarchy grows. The result is an attention tree $\mathcal{T}$ whose leaves retain sentence-level precision and whose internal nodes represent broader semantic units.
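The greedy construction with Eqs. (4) and (5) can be sketched as follows, assuming $\mathbf{M}$ is the symmetrized matrix of Eq. (3) and leaf embeddings are plain lists. The `Node` class and `build_attention_tree` are hypothetical names. Each step rescans every active pair, which costs $O(n^3)$ overall and is chosen for readability; a priority queue would be the natural optimization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sketch of bottom-up tree construction; not the authors' implementation.


@dataclass
class Node:
    node_id: int
    emb: List[float]
    span: List[int]  # indices of the sentence leaves this node covers
    children: List["Node"] = field(default_factory=list)


def build_attention_tree(embs: List[List[float]], M: List[List[float]]) -> Node:
    """Greedy merging with Eq. (4) embeddings and the Eq. (5) max-linkage update.

    Requires at least one sentence.
    """
    n = len(embs)
    active: Dict[int, Node] = {i: Node(i, list(embs[i]), [i]) for i in range(n)}
    # score[(a, b)], always with a < b, is the current mutual relation of active nodes a and b
    score: Dict[Tuple[int, int], float] = {
        (i, j): M[i][j] for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(active) > 1:
        (u, v), _ = max(score.items(), key=lambda kv: kv[1])  # highest mutual attention
        del score[(u, v)]
        nu, nv = active.pop(u), active.pop(v)
        emb = [(a + b) / 2 for a, b in zip(nu.emb, nv.emb)]  # Eq. (4): progressive embedding
        parent = Node(next_id, emb, nu.span + nv.span, [nu, nv])
        for r in active:  # Eq. (5): parent keeps the stronger link to each remaining node
            s_ur = score.pop((min(u, r), max(u, r)))
            s_vr = score.pop((min(v, r), max(v, r)))
            score[(r, next_id)] = max(s_ur, s_vr)  # r < next_id, so the key order holds
        active[next_id] = parent
        next_id += 1
    return next(iter(active.values()))
```

For three sentences with $\mathbf{M}_{01} = 0.9$, $\mathbf{M}_{02} = 0.1$, and $\mathbf{M}_{12} = 0.3$, sentences 0 and 1 merge first and their parent inherits $\max(0.1, 0.3) = 0.3$ as its link to sentence 2. Replacing `max` with `min` or an average in the update would give complete or average linkage instead.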
### 3.2 Hierarchical Retrieval
Given a query $q$, SproutRAG first encodes it with the same SLLM used during indexing to obtain a query embedding $e(q)$. Retrieval is then performed over the attention-guided binary tree, where each node $v$ represents a document span at a specific granularity. Leaf nodes correspond to sentence-level chunks, while internal nodes represent progressively larger groups of semantically related sentences. This allows SproutRAG to retrieve evidence at the level most appropriate for the query, rather than relying on a fixed chunk size.
Each candidate node is scored by cosine similarity between the query embedding and the node representation:
$$\mathrm{sim}(q, v) = \frac{e(q)^{\top} e(v)}{\|e(q)\|\,\|e(v)\|}. \tag{6}$$
Starting from the root node, retrieval proceeds via hierarchical beam search. Let $\mathcal{B}_t$ denote the active beam at depth $t$, with $\mathcal{B}_0 = \{v_{\mathrm{root}}\}$. At each step, SproutRAG expands the children of the current beam nodes and retains the top-$b$ most relevant nodes:

$$\mathcal{B}_{t+1} = \operatorname{Top}_{b}\Big(\bigcup_{v \in \mathcal{B}_t} \mathrm{Child}(v),\ \mathrm{sim}(q, \cdot)\Big), \tag{7}$$

where $b$ is the beam width. This search strategy focuses computation on the most promising branches of the tree while still allowing the retriever to explore multiple semantically relevant regions of the document.
In parallel, SproutRAG collects relevant nodes encountered during traversal. Let $\mathcal{V}_{\mathrm{visit}}$ denote the set of all nodes scored during beam search:

$$\mathcal{V}_{\mathrm{visit}} = \bigcup_{t} \bigcup_{v \in \mathcal{B}_t} \mathrm{Child}(v). \tag{8}$$

The retrieval candidate set is then defined as all visited nodes whose similarity reaches a learned threshold $\delta$:

$$\mathcal{C} = \{ v \in \mathcal{V}_{\mathrm{visit}} : \mathrm{sim}(q, v) \geq \delta \}. \tag{9}$$
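The traversal in Eqs. (6) to (9) can be sketched as below under stated assumptions: nodes are plain objects holding an embedding and a child list, embeddings are non-zero, and the final ranking uses cosine similarity in place of the learned reranker (the text allows either). All names are hypothetical. Following Eq. (8), only children of beam nodes are collected, so the root itself is never a candidate, whereas Algorithm 1 also tests the root against the threshold.

```python
import math
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of hierarchical beam search with threshold collection.


@dataclass
class Node:
    name: str
    emb: List[float]
    children: List["Node"] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    """Eq. (6): cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hierarchical_beam_search(root: Node, q: List[float], b: int,
                             delta: float, k: int) -> List[Node]:
    beam = [root]                     # B_0 = {v_root}
    visited: List[Node] = []          # V_visit, Eq. (8)
    while beam:
        frontier = [c for v in beam for c in v.children]  # union of Child(v) over the beam
        visited.extend(frontier)
        frontier.sort(key=lambda c: cosine(q, c.emb), reverse=True)
        beam = frontier[:b]           # Eq. (7): keep the top-b children
    # Eq. (9): keep visited nodes with sim >= delta, then rank them by similarity
    cands = [v for v in visited if cosine(q, v.emb) >= delta]
    cands.sort(key=lambda v: cosine(q, v.emb), reverse=True)
    return cands[:k]
```

For example, if the root's children are an internal node `ab` (over leaves `a` and `b`) and a leaf `c`, a query aligned with `a` and `b=1` returns both the leaf `a` and its parent `ab`, so one traversal yields evidence at two granularities. Setting `b=1` corresponds to the greedy search discussed in the ablations.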
The candidate set $\mathcal{C}$ contains evidence at multiple granularities, from sentence-level leaves to larger subtrees, allowing SproutRAG to retrieve either precise facts or broader multi-sentence context as needed. The collected candidates are reranked by similarity or a lightweight reranker, and the top-$k$ chunks are passed to the answer generator. The complete indexing and retrieval procedure is summarized in Algorithm [1](https://arxiv.org/html/2606.18381#alg1).
Algorithm 1: SproutRAG indexing and retrieval

Input: document $D$, query $q$, beam width $b$, threshold $\delta$, top-$k$
Output: retrieved evidence $F$

Offline indexing:

1. $S \leftarrow \mathrm{SplitSentences}(D)$
2. $E, \mathcal{A} \leftarrow \mathrm{SLLM}(S)$
3. $\mathbf{A} \leftarrow \sum_{l,h} w_{l,h}\, \mathcal{A}^{(l,h)}$
4. $\mathbf{M} \leftarrow (\mathbf{A} + \mathbf{A}^{\top})/2$
5. $\mathcal{N} \leftarrow \{\mathrm{Leaf}(s_i, e_i) \mid s_i \in S, e_i \in E\}$
6. while $|\mathcal{N}| > 1$ do
   - $(u, v) \leftarrow \arg\max_{u \neq v;\, u, v \in \mathcal{N}} \mathbf{M}_{uv}$
   - $p \leftarrow \mathrm{Node}(u, v)$
   - $e_p \leftarrow (e_u + e_v)/2$
   - for $r \in \mathcal{N} \setminus \{u, v\}$ do
     - $\mathbf{M}_{pr} \leftarrow \max(\mathbf{M}_{ur}, \mathbf{M}_{vr})$
     - $\mathbf{M}_{rp} \leftarrow \mathbf{M}_{pr}$
   - end for
   - $\mathcal{N} \leftarrow (\mathcal{N} \setminus \{u, v\}) \cup \{p\}$
7. end while
8. $\mathcal{T} \leftarrow \mathrm{root}(\mathcal{N})$

Online retrieval:

9. $e_q \leftarrow \mathrm{SLLM}(q)$
10. $\mathcal{C} \leftarrow \emptyset$, $\mathcal{B} \leftarrow \{\mathcal{T}\}$
11. while $\mathcal{B} \neq \emptyset$ do
    - $\mathcal{C} \leftarrow \mathcal{C} \cup \{v \in \mathcal{B} \mid \mathrm{sim}(e_q, e_v) \geq \delta\}$
    - $\mathcal{X} \leftarrow \bigcup_{v \in \mathcal{B}} \mathrm{Children}(v)$
    - $\mathcal{B} \leftarrow \mathrm{TopB}(\mathcal{X}, b, \mathrm{sim}(e_q, \cdot))$
12. end while
13. $F \leftarrow \mathrm{TopK}(\mathrm{Rerank}(\mathcal{C}, q), k)$
14. return $F$
### 3.3 Joint Training
The pretrained SLLM is not optimized for retrieval or for constructing retrieval-oriented document structures. We therefore fine-tune SproutRAG with a joint objective that improves both the embedding space and the attention tree.
#### Retrieval objective.
We train the SLLM embeddings with contrastive learning over query–passage pairs. Given a query $q$, a positive passage $p^{+}$, and hard negatives $\{p_j\}$, each passage is represented by mean-pooling its sentence embeddings. We optimize:

$$\mathcal{L}_{\mathrm{ret}} = -\log \frac{\exp(\mathrm{sim}(q, p^{+})/\tau)}{\sum_{j} \exp(\mathrm{sim}(q, p_{j})/\tau)}, \tag{10}$$

where $\tau$ is a temperature parameter. This objective aligns queries with relevant passages and separates them from hard negatives.
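A minimal sketch of Eq. (10) for a single query follows. Two assumptions are worth flagging: the denominator includes the positive as well as the hard negatives, as in standard InfoNCE (the text writes the sum over $\{p_j\}$ only), and the query is a single embedding vector. Passages are mean-pooled from their sentence embeddings as described above. The function names are hypothetical, and a real training loop would compute this inside an autodiff framework rather than in plain Python.

```python
import math
from typing import List

# Hypothetical sketch of the contrastive retrieval loss; values only, no gradients.


def mean_pool(sent_embs: List[List[float]]) -> List[float]:
    """Passage embedding = mean of its sentence embeddings."""
    n = len(sent_embs)
    return [sum(col) / n for col in zip(*sent_embs)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(q: List[float], pos_sents: List[List[float]],
                     neg_passages: List[List[List[float]]], tau: float = 0.05) -> float:
    """Eq. (10), assuming the positive also appears in the denominator (standard InfoNCE)."""
    logits = [cosine(q, mean_pool(pos_sents)) / tau]
    logits += [cosine(q, mean_pool(p)) / tau for p in neg_passages]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    return log_z - logits[0]  # -log of the softmax probability of the positive
```

With the positive aligned to the query and one orthogonal negative, the loss is $\log(1 + e^{-1/\tau})$, so the default $\tau = 0.05$ makes it far smaller than $\tau = 1$ does for the same embeddings.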
#### Structure objective.
Good embeddings alone do not guarantee a useful hierarchy. Since the tree depends on the learned attention matrix, we add an attention regularizer that encourages co-relevant sentence pairs to receive high mutual attention. Let $\mathcal{G}$ be the set of sentence pairs within a positive passage that jointly support the query. We define:

$$\mathcal{L}_{\mathrm{attn}} = -\frac{1}{|\mathcal{G}|} \sum_{(s_i, s_j) \in \mathcal{G}} \log\left(\frac{\mathbf{A}_{ij} + \mathbf{A}_{ji}}{2}\right). \tag{11}$$

The argument of the logarithm is exactly the mutual relation score $\mathbf{M}_{ij}$ of Eq. (3), so this objective directly shapes the learned head–layer aggregation, encouraging the induced tree to group semantically related evidence into coherent and retrievable subtrees.
#### Final objective.
The final training loss is:

$$\mathcal{L} = \mathcal{L}_{\mathrm{ret}} + \lambda \mathcal{L}_{\mathrm{attn}}, \tag{12}$$

where $\lambda$ controls the strength of structure regularization. After training, the learned aggregation weights are used during offline indexing, and retrieval requires only query encoding, tree traversal, and reranking. Thus, SproutRAG avoids external LLM filtering and lossy LLM-based summarization while enabling efficient multi-granularity retrieval.
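Eqs. (11) and (12) can be sketched as below. This computes loss values only (no gradients), assumes $\mathbf{A}$ is the aggregated matrix of Eq. (1) given as nested lists and $\mathcal{G}$ is given as index pairs, and adds a small epsilon inside the logarithm, which is our assumption rather than something the text specifies. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of the structure regularizer and the joint loss.


def attention_structure_loss(A: List[List[float]], pairs: Sequence[Tuple[int, int]],
                             eps: float = 1e-12) -> float:
    """Eq. (11): mean of -log((A_ij + A_ji) / 2) over the co-relevant pairs G."""
    total = 0.0
    for i, j in pairs:
        m_ij = (A[i][j] + A[j][i]) / 2     # mutual relation score, as in Eq. (3)
        total -= math.log(max(m_ij, eps))  # eps guards log(0); an assumed safeguard
    return total / len(pairs)


def joint_loss(l_ret: float, l_attn: float, lam: float = 0.1) -> float:
    """Eq. (12): L = L_ret + lambda * L_attn (lambda = 0.1 is the reported default)."""
    return l_ret + lam * l_attn
```

If each head's attention rows are probability distributions, every $\mathbf{A}_{ij}$ lies in $[0, 1]$ because the weights $w_{l,h}$ sum to one, so $\mathcal{L}_{\mathrm{attn}} \geq 0$, and it reaches zero only when every pair in $\mathcal{G}$ has a mutual score of 1.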
## 4 Experiments and Results
### 4.1 Experimental Setup
Benchmarks. We evaluate SproutRAG on four retrieval benchmarks spanning scientific, legal, and open-domain settings: SCI-DOCS (Cohan et al., [2020](https://arxiv.org/html/2606.18381#bib.bib25)), LegalBench-RAG (Pipitone and Alami, [2024](https://arxiv.org/html/2606.18381#bib.bib27)), Dragonball (Zhu et al., [2025](https://arxiv.org/html/2606.18381#bib.bib28)), and MS MARCO (Nguyen et al., [2016](https://arxiv.org/html/2606.18381#bib.bib21)). For end-to-end answer generation, we further evaluate on HotpotQA (Yang et al., [2018](https://arxiv.org/html/2606.18381#bib.bib34)), WebQuestions (Berant et al., [2013](https://arxiv.org/html/2606.18381#bib.bib35)), and Dragonball. See Appendix [A.1](https://arxiv.org/html/2606.18381#A1.SS1) for more details.
Baselines. We compare against representative chunking and structured retrieval methods, including Dense X Retrieval (Chen et al., [2024](https://arxiv.org/html/2606.18381#bib.bib8)), Meta-Chunking (Zhao et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib7)), MoC (Zhao et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib23)), RAPTOR (Sarthi et al., [2024](https://arxiv.org/html/2606.18381#bib.bib9)), LightRAG (Guo et al., [2025](https://arxiv.org/html/2606.18381#bib.bib11)), PropRAG (Wang and Han, [2025](https://arxiv.org/html/2606.18381#bib.bib14)), and SAKI-RAG (Tao et al., [2025](https://arxiv.org/html/2606.18381#bib.bib5)). GraphRAG (Edge et al., [2025](https://arxiv.org/html/2606.18381#bib.bib10)), ReflectiveRAG (Verma et al., [2026](https://arxiv.org/html/2606.18381#bib.bib33)), PageIndex (Zhang et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib29)), and REFRAG (Lin et al., [2025](https://arxiv.org/html/2606.18381#bib.bib30)) are reported only for final task performance, as they primarily involve LLM-heavy reasoning, generation, summarization, or decoding-time optimization rather than efficient retrieval. For fair comparison, methods requiring an LLM or reranker use the same Qwen3-8B (Team, [2025](https://arxiv.org/html/2606.18381#bib.bib36)) generator and Qwen3-Reranker-4B (Zhang et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib37)) reranker. Other than unifying the generator and reranker, we follow the settings recommended in the original papers for each baseline. See Appendix [A.2](https://arxiv.org/html/2606.18381#A1.SS2) for more details.
Metrics. For retrieval evaluation, we report Recall, Precision, and Information Efficiency (IE), where $\mathrm{IE} = \mathrm{Recall} \times \mathrm{Precision}$. To keep the evidence budget comparable across methods, $k$ denotes the number of underlying evidence units used for evaluation rather than the number of retrieved tree nodes. For hierarchical outputs, retrieved internal nodes are expanded into their underlying evidence units, with each contained unit counted toward the same top-$k$ budget. We compute each metric at $k \in \{1, 3, 5\}$ and report the average across these three cutoffs. For end-to-end generation, we report F1 on HotpotQA and WebQuestions, and ROUGE-L (Lin, [2004](https://arxiv.org/html/2606.18381#bib.bib38)), METEOR (Banerjee and Lavie, [2005](https://arxiv.org/html/2606.18381#bib.bib39)), and BERTScore (Zhang\* et al., [2020](https://arxiv.org/html/2606.18381#bib.bib40)) on Dragonball. We also report online efficiency using Tok/Q and latency, where Tok/Q counts online model-token usage per query, excluding offline indexing and output tokens. All reported results are averaged over three independent runs.
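The budgeted metric can be sketched as below. Several details are our assumptions, since the text does not pin them down: a node that would overflow the budget is truncated in its stored unit order, units already counted are skipped, and precision divides by the number of units actually returned. The nested-list input format and the function names are hypothetical. Following the Table 1 caption, IE is computed at each cutoff before averaging.

```python
from typing import List, Sequence, Set

# Hypothetical sketch of budgeted Recall/Precision/IE; not the paper's evaluation script.


def expand_to_budget(ranked_nodes: Sequence[Sequence[str]], k: int) -> List[str]:
    """Expand ranked nodes (each a list of evidence-unit ids) into at most k distinct units."""
    units: List[str] = []
    for node_units in ranked_nodes:
        for u in node_units:
            if len(units) == k:
                return units
            if u not in units:
                units.append(u)
    return units


def average_ie(ranked_nodes: Sequence[Sequence[str]], gold: Set[str],
               ks: Sequence[int] = (1, 3, 5)) -> float:
    """IE = Recall x Precision at each cutoff k, then averaged over the cutoffs."""
    ies = []
    for k in ks:
        units = expand_to_budget(ranked_nodes, k)
        hits = sum(1 for u in units if u in gold)
        recall = hits / len(gold)
        precision = hits / len(units) if units else 0.0
        ies.append(recall * precision)
    return sum(ies) / len(ies)
```

For example, with ranked nodes `[s1, s2]`, `[s3]`, `[s4]`, `[s5, s6]` and gold units `{s1, s3}`, the per-cutoff IE values are 1/2, 2/3, and 2/5, averaging to 47/90 ≈ 0.522. A two-sentence internal node consumes two units of budget, which is what keeps hierarchical outputs comparable with flat retrieval.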
Implementation and Training Details. We use the 1.3B-parameter SLLM of An et al. ([2024](https://arxiv.org/html/2606.18381#bib.bib12)) (code: [https://github.com/cavedweller509/SentenceVAE](https://github.com/cavedweller509/SentenceVAE)) as the sentence encoder and split documents into chunks of up to two sentences. SproutRAG is trained on 30K query–passage examples sampled from CLaRa (He et al., [2025](https://arxiv.org/html/2606.18381#bib.bib32)), fine-tuning the SLLM and the learned head–layer aggregation weights with the joint objective in Eq. [12](https://arxiv.org/html/2606.18381#S3.E12). Unless otherwise stated, we use the following default hyperparameters: 3 training epochs with AdamW, learning rates of $2 \times 10^{-5}$ for the SLLM and $1 \times 10^{-3}$ for the aggregation scalars, batch size 32, temperature $\tau = 0.05$, attention weight $\lambda = 0.1$, and 5% linear warmup. At retrieval time, the default setting uses beam width $b = 5$, collects candidates from all tree levels, and reranks them with Qwen3-Reranker-4B. Final answers are generated with Qwen3-8B. For fair comparison, baselines that require an LLM or reranker use the same models; otherwise, we follow their original settings. All experiments use 8 NVIDIA A100 80GB GPUs, and results are averaged over three runs.
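The default settings above can be collected into a small configuration object. This is a hypothetical sketch: the field names are ours, and the schedule after the 5% warmup is not stated in the text, so the linear decay to zero below is an assumption (a common pairing with linear warmup). Each parameter group (SLLM weights and aggregation scalars) would scale its own base learning rate by the same schedule.

```python
import math
from dataclasses import dataclass

# Hypothetical config sketch; values come from the text, field names and schedule shape do not.


@dataclass(frozen=True)
class SproutRAGConfig:
    # Training defaults reported in the text
    train_examples: int = 30_000   # CLaRa subset
    epochs: int = 3
    batch_size: int = 32
    lr_sllm: float = 2e-5          # SLLM parameters (AdamW)
    lr_agg: float = 1e-3           # head-layer aggregation scalars alpha_{l,h}
    tau: float = 0.05              # contrastive temperature, Eq. (10)
    lam: float = 0.1               # structure-loss weight lambda, Eq. (12)
    warmup_frac: float = 0.05      # 5% linear warmup
    # Retrieval defaults
    beam_width: int = 5
    max_sentences_per_chunk: int = 2

    def total_steps(self) -> int:
        # Assumes the last partial batch is kept (no drop_last)
        return math.ceil(self.train_examples / self.batch_size) * self.epochs


def lr_at(step: int, total_steps: int, base_lr: float, warmup_frac: float) -> float:
    """Linear warmup to base_lr, then (assumed) linear decay to zero."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return base_lr * ((step + 1) / warmup)
    remaining = max(1, total_steps - warmup)
    return base_lr * max(0.0, (total_steps - step) / remaining)
```

Under these assumptions the defaults give $\lceil 30{,}000 / 32 \rceil \times 3 = 2{,}814$ optimizer steps, of which the first 140 are warmup.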
Table 1: Retrieval performance across four benchmarks. Recall, Precision, and IE are averaged over @1, @3, and @5, with IE computed at each cutoff before averaging. Values report the mean over three independent runs, and the red ± values indicate the corresponding standard deviation. The shaded row marks SproutRAG. Refer to Appendix [B](https://arxiv.org/html/2606.18381#A2) for the results @1, @3, and @5.

Table 2: End-to-end answer quality and online efficiency. HotpotQA and WebQuestions are evaluated with F1, while Dragonball uses ROUGE-L (R-L), METEOR (MTR), and BERTScore (BRT). Tok/Q counts online model input tokens per query, excluding offline training, indexing, and output tokens; Lat. reports online per-query latency. All methods use the same generator and reranker when applicable.
### 4.2 Retrieval Quality
Table [1](https://arxiv.org/html/2606.18381#S4.T1) reports retrieval performance across four benchmarks. SproutRAG achieves the highest IE on all datasets, improving over the strongest baseline by 8.06 points on Dragonball, 4.65 on SCI-DOCS, 4.90 on LegalBench-RAG, and 6.83 on MS MARCO. These improvements are not driven by recall alone: SproutRAG also obtains the best precision on every benchmark. This suggests that the attention-guided hierarchy helps retrieve broader supporting context while avoiding the noise introduced by overly large or weakly related chunks.
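As a quick consistency check, averaging the four per-dataset IE gains reproduces the headline figure, which suggests that the 6.1% reported in the abstract is an average absolute gain in IE points rather than a relative improvement:

$$\frac{8.06 + 4.65 + 4.90 + 6.83}{4} = \frac{24.44}{4} = 6.11$$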
The comparison with SAKI-RAG is especially informative. While SAKI-RAG achieves strong precision, particularly on Dragonball and SCI-DOCS, its pairwise expansion limits evidence aggregation, reducing recall and IE. In contrast, SproutRAG converts sentence-level attention into a global tree structure, enabling retrieval over individual chunks, internal nodes, and subtrees. This preserves SAKI-RAG's precision benefits while improving IE across all datasets. Structured baselines such as RAPTOR, LightRAG, PropRAG, and MoC improve recall over flat or boundary-based chunking, but their clustering, graph, proposition, or routing structures do not explicitly model learned multi-sentence composition. SproutRAG bridges this gap: leaves retain fine-grained evidence, while internal nodes recover coherent context, yielding the strongest recall–precision tradeoff. Appendix [C](https://arxiv.org/html/2606.18381#A3) provides a qualitative example.
### 4.3 End-to-End Performance and Efficiency
We next evaluate whether the retrieval improvements translate into stronger final answers. Table [2](https://arxiv.org/html/2606.18381#S4.T2) compares SproutRAG with system-level RAG methods on HotpotQA, WebQuestions, and Dragonball. PageIndex achieves the highest final answer scores, but it requires substantially more online computation due to its reasoning-based search and evidence construction. REFRAG improves efficiency compared with reflection-heavy or reasoning-heavy systems, but SproutRAG still provides the strongest performance–efficiency tradeoff: it outperforms GraphRAG, ReflectiveRAG, and REFRAG across all final-performance metrics, while using only 4.38K online tokens per query and 193 ms latency. The reported cost measures online per-query inference and excludes offline training and indexing. SproutRAG does require an upfront training stage: we fine-tune the SLLM and attention aggregation weights on a 30K-example subset of CLaRa. However, this cost is paid once and reused across datasets, similar to other systems with offline preparation or model adaptation costs. The cross-dataset results show that the learned attention-guided hierarchy generalizes without retraining for each benchmark, so the training cost is amortized rather than incurred at query time.
Table 3: Ablation study on retrieval performance. Metrics are averaged over @1, @3, and @5. The blue row is the default SproutRAG setting ($b = 5$, $\lambda = 0.1$). The three groups evaluate training objectives, tree/retrieval design, and sensitivity to $b$ and $\lambda$.

### 4.4 Ablation Study
Training objectives. The first group of Table 3 evaluates the role of the training objectives. The "Not trained" variant performs worst across all datasets, showing that the pretrained SLLM attention and embeddings are not sufficient for retrieval-oriented tree construction. Removing $\mathcal{L}_{\mathrm{ret}}$ substantially reduces both recall and IE, since the query and evidence embeddings are no longer explicitly aligned. Removing $\mathcal{L}_{\mathrm{attn}}$ is less damaging than removing the retrieval loss, but still causes a consistent drop, especially in IE. This confirms that embedding quality and tree quality require complementary supervision: the retrieval loss aligns query–passage representations, while the attention-structure loss shapes the hierarchy used for multi-granularity retrieval.
**Tree and retrieval design.** The second group in Table [3](https://arxiv.org/html/2606.18381#S4.SS3) examines whether the attention-guided hierarchy is necessary. Uniform attention aggregation reduces performance, indicating that averaging all heads and layers introduces proximity bias and weakens the tree. An embedding-similarity tree also underperforms, showing that SLLM attention encodes structural information beyond what embeddings capture. Leaf-only retrieval maintains high precision but lowers recall and IE, while greedy search suffers from early path commitment. These results show that learned attention aggregation, internal-node retrieval, and beam search are all needed to balance precise evidence with broader contextual coverage.
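The contrast between learned and uniform aggregation can be illustrated with a small sketch. It assumes the SLLM exposes one sentence-to-sentence attention matrix per (layer, head). These matrices are combined with softmax-normalized learnable logits, so equal logits recover uniform averaging. The binary tree is then built by greedily merging the adjacent pair of spans with the highest symmetrized mean attention. Node embeddings use mean pooling, which the conclusion names as the current node composition function. Pooling over all sentences in a span is our assumption. The merging rule, the contiguity constraint, and the names `aggregate_attention`, `build_tree`, and `Node` are illustrative assumptions, not the paper's exact procedure.

```python
import math
from dataclasses import dataclass


def aggregate_attention(attn, logits):
    """Weighted sum of per-(layer, head) n x n sentence attention maps.

    attn[l][h] is an n x n matrix; logits[l][h] is a learnable score.
    Equal logits give uniform averaging over all heads and layers.
    """
    pairs = [(l, h) for l in range(len(attn)) for h in range(len(attn[l]))]
    m = max(logits[l][h] for l, h in pairs)
    exps = {(l, h): math.exp(logits[l][h] - m) for l, h in pairs}
    z = sum(exps.values())
    n = len(attn[0][0])
    agg = [[0.0] * n for _ in range(n)]
    for (l, h), e in exps.items():
        w = e / z  # softmax weight of this (layer, head)
        for i in range(n):
            for j in range(n):
                agg[i][j] += w * attn[l][h][i][j]
    return agg


@dataclass
class Node:
    start: int          # first sentence index (inclusive)
    end: int            # last sentence index (exclusive)
    emb: list           # mean-pooled embedding of sentences in [start, end)
    children: tuple = ()


def build_tree(affinity, sent_embs):
    """Greedy bottom-up binary tree over contiguous sentence spans."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def link(a, b):
        # Symmetrized mean attention between every sentence pair across spans.
        total = sum(affinity[i][j] + affinity[j][i]
                    for i in range(a.start, a.end) for j in range(b.start, b.end))
        return total / (2 * (a.end - a.start) * (b.end - b.start))

    nodes = [Node(i, i + 1, list(e)) for i, e in enumerate(sent_embs)]
    while len(nodes) > 1:
        k = max(range(len(nodes) - 1), key=lambda i: link(nodes[i], nodes[i + 1]))
        a, b = nodes[k], nodes[k + 1]
        parent = Node(a.start, b.end, mean(sent_embs[a.start:b.end]), (a, b))
        nodes[k:k + 2] = [parent]  # replace the merged pair with its parent
    return nodes[0]
```

The loop recomputes every adjacent link score after each merge. This is fine for illustration, but long documents would need cached scores.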
**Hyperparameter sensitivity.** The final group in Table [3](https://arxiv.org/html/2606.18381#S4.SS3) studies the beam width $b$ and the attention regularization weight $\lambda$. Reducing the beam width to $b=3$ slightly lowers IE because fewer semantic paths are explored, while increasing it to $b=10$ improves recall but slightly reduces precision, yielding no overall advantage over the default $b=5$. Similarly, both $\lambda=0.05$ and $\lambda=0.20$ underperform the default $\lambda=0.1$: a weaker structure loss provides insufficient guidance for tree construction, while a stronger one can overemphasize attention alignment at the expense of retrieval precision. Overall, SproutRAG is stable across reasonable settings, with $b=5$ and $\lambda=0.1$ providing the best recall–precision tradeoff.
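Beam width controls how many partial paths survive at each depth of the tree. Below is a minimal sketch of hierarchical beam search under several assumptions: expansion proceeds top-down from the root, each child's embedding is scored against the query by cosine similarity, and the final ranking pools every scored node (leaf or internal) so that multi-sentence units can be returned. `TreeNode` and `beam_search` are hypothetical names. Setting `b=1` gives the greedy variant from the ablation.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class TreeNode:
    span: tuple          # (start, end) sentence indices, end exclusive
    emb: list            # node embedding, e.g. mean-pooled sentences
    children: tuple = ()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def beam_search(root, query, b=5, k=3):
    """Top-down beam search returning the k best nodes found at any depth.

    Every expanded child is kept as a candidate (leaf or internal node);
    only the b best children at each depth are expanded further.
    With b=1 this reduces to greedy descent along a single path.
    """
    beam, candidates = [root], []
    while beam:
        frontier = [child for node in beam for child in node.children]
        scored = [(cosine(query, c.emb), c) for c in frontier]
        candidates.extend(scored)
        beam = [c for _, c in heapq.nlargest(b, scored, key=lambda t: t[0])]
    best = heapq.nlargest(k, candidates, key=lambda t: t[0])
    return [(round(score, 3), node.span) for score, node in best]
```

Consider a root whose left child scores slightly higher than its right child, while the best-matching sentence sits under the right child. With `b=1` the search never reaches that sentence, whereas `b=2` does. This is the early-path-commitment failure described above. Larger `b` explores more paths at a higher scoring cost.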
## 5 Conclusion and Future Work
We introduced SproutRAG, an attention-guided hierarchical RAG framework that organizes sentence-level chunks into a learned tree for multi-granularity retrieval. Rather than relying on fixed chunk boundaries, pairwise context expansion, lossy summarization, or inference-time LLM filtering, SproutRAG uses learned SLLM attention aggregation to construct a retrieval-oriented hierarchy. At inference time, hierarchical beam search selects evidence from sentence leaves, internal nodes, and broader subtrees, allowing the retriever to balance fine-grained precision with contextual coherence. Across benchmarks, SproutRAG improves retrieval information efficiency by 6.1% on average and offers a strong performance–efficiency tradeoff, approaching the final-answer quality of LLM-heavy systems while requiring far fewer online tokens and lower latency. While SproutRAG generalizes well after one-time training, several directions remain open. Future work can explore richer node composition functions beyond mean pooling, such as gated or attention-based composition, as well as dynamic tree adaptation or query-dependent traversal policies for complex multi-hop retrieval.
## Limitations
While SproutRAG improves multi-granularity retrieval without inference-time LLM filtering, it has several limitations. First, the hierarchy is currently built as a binary tree, which may be restrictive when several sentences jointly form a coherent semantic unit and should be grouped simultaneously. Multi-branch trees could better capture such many-to-many dependencies. Second, SproutRAG requires an upfront training stage for the SLLM and attention aggregation weights. Although this is a one-time cost that transferred across datasets in our experiments, it is still more expensive than using an off-the-shelf retriever without adaptation. Finally, tree construction is performed offline and remains fixed during retrieval. This keeps inference efficient and avoids rebuilding the index per query, but it may be less flexible when queries require evidence reorganized by query-specific relevance.
## References
- H. An, Y. Chen, Z. Sun, and X. Li (2024). SentenceVAE: enable next-sentence prediction for large language models with faster speed, higher accuracy and longer context. arXiv:2408.00655. [Link](https://arxiv.org/abs/2408.00655)
- I. Augenstein, T. Baldwin, M. Cha, T. Chakraborty, G. L. Ciampaglia, D. Corney, R. DiResta, E. Ferrara, S. Hale, A. Halevy, et al. (2024). Factuality challenges in the era of large language models and opportunities for fact-checking. Nature Machine Intelligence 6(8), pp. 852–863. [Link](https://doi.org/10.1038/s42256-024-00881-z)
- S. Banerjee and A. Lavie (2005). METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, J. Goldstein, A. Lavie, C. Lin, and C. Voss (Eds.), Ann Arbor, Michigan, pp. 65–72. [Link](https://aclanthology.org/W05-0909/)
- J. Berant, A. Chou, R. Frostig, and P. Liang (2013). Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1533–1544. [Link](https://www.aclweb.org/anthology/D13-1160)
- T. Chen, H. Wang, S. Chen, W. Yu, K. Ma, X. Zhao, H. Zhang, and D. Yu (2024). Dense X retrieval: what retrieval granularity should we use? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y. Al-Onaizan, M. Bansal, and Y. Chen (Eds.), Miami, Florida, USA, pp. 15159–15177. [Link](https://aclanthology.org/2024.emnlp-main.845/), [Document](https://dx.doi.org/10.18653/v1/2024.emnlp-main.845)
- A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. Weld (2020). SPECTER: document-level representation learning using citation-informed transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault (Eds.), Online, pp. 2270–2282. [Link](https://aclanthology.org/2020.acl-main.207/), [Document](https://dx.doi.org/10.18653/v1/2020.acl-main.207)
- D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson (2025). From local to global: a graph RAG approach to query-focused summarization. arXiv:2404.16130. [Link](https://arxiv.org/abs/2404.16130)
- M. Günther, I. Mohr, D. J. Williams, B. Wang, and H. Xiao (2025). Late chunking: contextual chunk embeddings using long-context embedding models. arXiv:2409.04701. [Link](https://arxiv.org/abs/2409.04701)
- Z. Guo, L. Xia, Y. Yu, T. Ao, and C. Huang (2025). LightRAG: simple and fast retrieval-augmented generation. In Findings of the Association for Computational Linguistics: EMNLP 2025, C. Christodoulopoulos, T. Chakraborty, C. Rose, and V. Peng (Eds.), Suzhou, China, pp. 10746–10761. ISBN 979-8-89176-335-7. [Link](https://aclanthology.org/2025.findings-emnlp.568/), [Document](https://dx.doi.org/10.18653/v1/2025.findings-emnlp.568)
- J. He, R. H. Bai, S. Williamson, J. Z. Pan, N. Jaitly, and Y. Zhang (2025). CLaRa: bridging retrieval and generation with continuous latent reasoning. arXiv:2511.18659. [Link](https://arxiv.org/abs/2511.18659)
- B. Jin, J. Yoon, J. Han, and S. O. Arik (2025a). Long-context LLMs meet RAG: overcoming challenges for long inputs in RAG. In The Thirteenth International Conference on Learning Representations. [Link](https://openreview.net/forum?id=oU3tpaR8fm)
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33, pp. 9459–9474. [Link](https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf)
- C. Lin (2004). ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out, Barcelona, Spain, pp. 74–81. [Link](https://aclanthology.org/W04-1013/)
- X. Lin, A. Ghosh, B. K. H. Low, A. Shrivastava, and V. Mohan (2025). REFRAG: rethinking RAG based decoding. arXiv:2509.01092. [Link](https://arxiv.org/abs/2509.01092)
- H. Liu, Z. Wang, X. Chen, Z. Li, F. Xiong, Q. Yu, and W. Zhang (2025). HopRAG: multi-hop reasoning for logic-aware retrieval-augmented generation. In Findings of the Association for Computational Linguistics: ACL 2025, W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar (Eds.), Vienna, Austria, pp. 1897–1913. ISBN 979-8-89176-256-5. [Link](https://aclanthology.org/2025.findings-acl.97/), [Document](https://dx.doi.org/10.18653/v1/2025.findings-acl.97)
- N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang (2024). Lost in the middle: how language models use long contexts. Transactions of the Association for Computational Linguistics 12, pp. 157–173. [Link](https://aclanthology.org/2024.tacl-1.9/), [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00638)
- T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng (2016). MS MARCO: A human generated machine reading comprehension dataset. CoRR abs/1611.09268. [Link](http://arxiv.org/abs/1611.09268)
- N. Pipitone and G. H. Alami (2024). LegalBench-RAG: a benchmark for retrieval-augmented generation in the legal domain. arXiv:2408.10343. [Link](https://arxiv.org/abs/2408.10343)
- P. Sarthi, S. Abdullah, A. Tuli, S. Khanna, A. Goldie, and C. D. Manning (2024). RAPTOR: recursive abstractive processing for tree-organized retrieval. In The Twelfth International Conference on Learning Representations. [Link](https://openreview.net/forum?id=GN921JHCRw)
- W. Tao, X. Xing, Z. Li, and X. Xu (2025). SAKI-RAG: mitigating context fragmentation in long-document RAG via sentence-level attention knowledge integration. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, C. Christodoulopoulos, T. Chakraborty, C. Rose, and V. Peng (Eds.), Suzhou, China, pp. 1195–1213. ISBN 979-8-89176-332-6. [Link](https://aclanthology.org/2025.emnlp-main.63/), [Document](https://dx.doi.org/10.18653/v1/2025.emnlp-main.63)
- LangChain Team (2024). LangChain: a framework for developing applications powered by language models.
- Qwen Team (2025). Qwen3 technical report. arXiv:2505.09388. [Link](https://arxiv.org/abs/2505.09388)
- A. Verma, S. Gupta, S. Pillai, P. Sircar, and D. Gupta (2026). ReflectiveRAG: rethinking adaptivity in retrieval-augmented generation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), Y. Matusevych, G. Eryiğit, and N. Aletras (Eds.), Rabat, Morocco, pp. 377–384. ISBN 979-8-89176-384-5. [Link](https://aclanthology.org/2026.eacl-industry.27/), [Document](https://dx.doi.org/10.18653/v1/2026.eacl-industry.27)
- E. Voita, D. Talbot, F. Moiseev, R. Sennrich, and I. Titov (2019). Analyzing multi-head self-attention: specialized heads do the heavy lifting, the rest can be pruned. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, A. Korhonen, D. Traum, and L. Màrquez (Eds.), Florence, Italy, pp. 5797–5808. [Link](https://aclanthology.org/P19-1580/), [Document](https://dx.doi.org/10.18653/v1/P19-1580)
- J. Wang and J. Han (2025). PropRAG: guiding retrieval with beam search over proposition paths. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, C. Christodoulopoulos, T. Chakraborty, C. Rose, and V. Peng (Eds.), Suzhou, China, pp. 6212–6227. ISBN 979-8-89176-332-6. [Link](https://aclanthology.org/2025.emnlp-main.317/), [Document](https://dx.doi.org/10.18653/v1/2025.emnlp-main.317)
- Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning (2018). HotpotQA: a dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, and J. Tsujii (Eds.), Brussels, Belgium, pp. 2369–2380. [Link](https://aclanthology.org/D18-1259/), [Document](https://dx.doi.org/10.18653/v1/D18-1259)
- J. Zhang, H. Zhang, D. Zhang, L. Yong, and S. Huang (2024). End-to-end beam retrieval for multi-hop question answering. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gomez, and S. Bethard (Eds.), Mexico City, Mexico, pp. 1718–1731. [Link](https://aclanthology.org/2024.naacl-long.96/), [Document](https://dx.doi.org/10.18653/v1/2024.naacl-long.96)
- M. Zhang, Y. Tang, and PageIndex Team (2025a). PageIndex: next-generation vectorless, reasoning-based RAG. PageIndex Blog. [Link](https://pageindex.ai/blog/pageindex-intro)
- Y. Zhang, M. Li, D. Long, X. Zhang, H. Lin, B. Yang, P. Xie, A. Yang, D. Liu, J. Lin, F. Huang, and J. Zhou (2025b). Qwen3 embedding: advancing text embedding and reranking through foundation models. arXiv:2506.05176. [Link](https://arxiv.org/abs/2506.05176)
- T. Zhang\*, V. Kishore\*, F. Wu\*, K. Q. Weinberger, and Y. Artzi (2020). BERTScore: evaluating text generation with BERT. In International Conference on Learning Representations. [Link](https://openreview.net/forum?id=SkeHuCVFDr)
- J. Zhao, Z. Ji, Z. Fan, H. Wang, S. Niu, B. Tang, F. Xiong, and Z. Li (2025a). MoC: mixtures of text chunking learners for retrieval-augmented generation system. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar (Eds.), Vienna, Austria, pp. 5172–5189. ISBN 979-8-89176-251-0. [Link](https://aclanthology.org/2025.acl-long.258/), [Document](https://dx.doi.org/10.18653/v1/2025.acl-long.258)
- J. Zhao, Z. Ji, Y. Feng, P. Qi, S. Niu, B. Tang, F. Xiong, and Z. Li (2025b). Meta-chunking: learning text segmentation and semantic completion via logical perception. arXiv:2410.12788. [Link](https://arxiv.org/abs/2410.12788)
- K. Zhu, Y. Luo, D. Xu, Y. Yan, Z. Liu, S. Yu, R. Wang, S. Wang, Y. Li, N. Zhang, X. Han, Z. Liu, and M. Sun (2025). RAGEval: scenario specific RAG evaluation dataset generation framework. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar (Eds.), Vienna, Austria, pp. 8520–8544. ISBN 979-8-89176-251-0. [Link](https://aclanthology.org/2025.acl-long.418/), [Document](https://dx.doi.org/10.18653/v1/2025.acl-long.418)
## Appendix A Benchmark and Baseline Details
### A.1 Benchmark Dataset Details
We use four retrieval-focused benchmarks (SCI-DOCS, LegalBench-RAG, Dragonball, and MS MARCO) and three end-to-end generation benchmarks (HotpotQA, WebQuestions, and Dragonball). Together, these datasets cover scientific retrieval, legal retrieval, open-domain passage retrieval, multi-hop question answering, short-form factual QA, and multi-domain RAG evaluation.
#### SCI-DOCS (Cohan et al., [2020](https://arxiv.org/html/2606.18381#bib.bib25))
SCI-DOCS is a scientific document representation benchmark introduced with SPECTER. It contains multiple document-level tasks, including citation prediction, document classification, and recommendation. We use SCI-DOCS as a scientific retrieval benchmark because scientific abstracts are dense, terminology-heavy, and often contain multiple related concepts within a short span. This makes the dataset useful for testing whether SproutRAG can construct coherent sentence-level hierarchies in technical domains.
#### LegalBench-RAG (Pipitone and Alami, [2024](https://arxiv.org/html/2606.18381#bib.bib27))
LegalBench-RAG is designed specifically to evaluate retrieval in legal RAG pipelines. It contains 6,858 query-answer pairs over legal documents such as NDAs, M&A agreements, commercial contracts, and privacy policies. Unlike broad document retrieval, LegalBench-RAG emphasizes precise snippet retrieval: the model must identify minimal legal evidence rather than simply retrieve a generally relevant document. This makes it a strong test of fine-grained precision.
#### Dragonball (Zhu et al., [2025](https://arxiv.org/html/2606.18381#bib.bib28))
Dragonball is a multi-domain, multilingual RAG benchmark released as part of RAGEval. It contains questions across finance, legal, and medical scenarios in English and Chinese. We use Dragonball for both retrieval and end-to-end generation because it combines heterogeneous domains, long evidence contexts, and domain-specific terminology. This setting tests whether retrieval methods can recover relevant evidence without introducing excessive distractor context.
#### MS MARCO (Nguyen et al., [2016](https://arxiv.org/html/2606.18381#bib.bib21))
MS MARCO is a large-scale open-domain passage retrieval benchmark built from real web search queries. Its corpus consists of millions of short passages, and the task requires identifying passages that answer natural language questions. Compared with SCI-DOCS and LegalBench-RAG, MS MARCO has shorter retrieval units and more direct query-passage matching, providing a complementary test of retrieval effectiveness when evidence is already compact.
#### HotpotQA (Yang et al., [2018](https://arxiv.org/html/2606.18381#bib.bib34))
HotpotQA is an open-domain multi-hop QA benchmark built from Wikipedia. Its questions require reasoning over multiple supporting documents or facts, and the dataset provides sentence-level supporting-fact annotations. We use HotpotQA for end-to-end answer generation because it directly tests whether retrieved evidence supports multi-step reasoning, which aligns with SproutRAG's goal of retrieving evidence across multiple granularities.
#### WebQuestions (Berant et al., [2013](https://arxiv.org/html/2606.18381#bib.bib35))
WebQuestions is an open-domain factual QA benchmark built from natural language questions collected from web search logs. The answers are typically short entities or phrases, making token-level F1 a suitable evaluation metric. We include WebQuestions to evaluate whether SproutRAG also improves short-form factual QA, where retrieval must remain precise and avoid adding unnecessary context.
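Token-level F1 compares the bag of normalized tokens in a predicted answer with the bag of tokens in the gold answer. Below is a minimal sketch that follows the widely used SQuAD-style normalization: lowercasing, stripping punctuation, and removing the articles a/an/the. This section does not specify the paper's exact normalization, so treat it as an assumption. `normalize` and `token_f1` are hypothetical names.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall with multiset overlap."""
    pred = normalize(prediction).split()
    ref = normalize(gold).split()
    common = Counter(pred) & Counter(ref)  # per-token minimum counts
    overlap = sum(common.values())
    if overlap == 0:  # also covers empty prediction or gold
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

When a question has several acceptable gold answers, a common convention is to take the maximum over them, e.g. `max(token_f1(pred, g) for g in golds)`. Because extra tokens lower precision, this metric rewards concise answers. That property matches the motivation above for keeping retrieved context tight.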
### A.2 Baseline Details
We compare SproutRAG against two groups of baselines: efficient retrieval-oriented methods used in the retrieval evaluation, and system-level RAG methods used in the end-to-end generation comparison. Unless otherwise stated, we follow the configurations recommended in the original papers. For methods requiring an LLM generator or reranker, we use the same Qwen3-8B generator and Qwen3-Reranker-4B reranker for a fair comparison.
#### Dense X Retrieval (Chen et al., [2024](https://arxiv.org/html/2606.18381#bib.bib8))
Dense X Retrieval decomposes documents into fine-grained propositions and uses these propositions as retrieval units. This improves precision by making each unit more atomic and self-contained. However, proposition-level retrieval can weaken broader contextual continuity, since related facts are retrieved independently rather than as coherent multi-sentence evidence.
#### Meta-Chunking (Zhao et al., [2025b](https://arxiv.org/html/2606.18381#bib.bib7))
Meta-Chunking uses LLM-based signals to identify semantically meaningful chunk boundaries instead of relying on fixed-size segmentation. We evaluate both variants: Meta-Chunking-PPL, which uses perplexity changes to detect boundaries, and Meta-Chunking-MSP, which uses margin-sampling-based boundary decisions. These methods improve chunk coherence, but once chunks are formed, cross-chunk semantic dependencies are not explicitly modeled.
#### MoC (Zhao et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib23))
MoC improves over single-strategy chunking by dynamically routing text to different chunking strategies or granularity choices. It is a strong adaptive chunking baseline because it can better match the segmentation strategy to the local document structure. However, MoC still operates primarily at the chunk-construction stage and does not build a retrieval-time hierarchy over sentence-level evidence.
#### RAPTOR (Sarthi et al., [2024](https://arxiv.org/html/2606.18381#bib.bib9))
RAPTOR recursively clusters chunks and summarizes each cluster to build a hierarchical tree. Retrieval can then operate over both lower-level chunks and higher-level summaries. This provides a natural multi-granularity baseline, but its hierarchy is based on embedding-space clustering and LLM-generated summaries, which can introduce information loss and additional indexing cost.
#### LightRAG (Guo et al., [2025](https://arxiv.org/html/2606.18381#bib.bib11))
LightRAG augments retrieval with graph-structured knowledge and combines local and global retrieval signals. It is designed to improve retrieval over connected evidence by exploiting entity and relation structure. We include it as a structured retrieval baseline, especially for settings where graph-style evidence organization can improve coverage.
#### PropRAG (Wang and Han, [2025](https://arxiv.org/html/2606.18381#bib.bib14))
PropRAG represents documents using propositions and performs beam-style traversal over proposition paths. It is closely related to our use of beam search but differs in its underlying structure: PropRAG searches over a proposition graph, whereas SproutRAG searches over an attention-guided sentence hierarchy. PropRAG is therefore a strong baseline for testing whether hierarchical sentence-level structure provides benefits beyond proposition-path retrieval.
#### SAKI-RAG (Tao et al., [2025](https://arxiv.org/html/2606.18381#bib.bib5))
SAKI-RAG uses a Sentence-Level Large Language Model (SLLM) to estimate inter-sentence attention and expand retrieved chunks with related sentences. It is the closest baseline to SproutRAG because it also uses sentence-level attention signals. However, SAKI-RAG performs pairwise expansion and relies on LLM filtering during retrieval, whereas SproutRAG converts learned attention into a global tree and retrieves across multiple granularities without inference-time LLM filtering.
#### GraphRAG (Edge et al., [2025](https://arxiv.org/html/2606.18381#bib.bib10))
GraphRAG constructs an entity-relation graph from the corpus and uses graph structure to support retrieval and generation. It is effective for queries that align well with entity-centric evidence, but it depends on reliable entity extraction and relation construction. We include GraphRAG in the end-to-end comparison because it is a system-level RAG method with substantial LLM-based preprocessing and reasoning.
#### ReflectiveRAG (Verma et al., [2026](https://arxiv.org/html/2606.18381#bib.bib33))
ReflectiveRAG introduces adaptive retrieval and generation through self-reflection. Instead of using a fixed retrieval budget, it evaluates whether retrieved evidence is sufficient and can reformulate or expand retrieval when needed. This makes it a strong final-performance baseline, but its main contribution lies in adaptive evidence use and generation rather than efficient retrieval structure.
#### PageIndex (Zhang et al., [2025a](https://arxiv.org/html/2606.18381#bib.bib29))
PageIndex replaces conventional vector retrieval with a reasoning-based hierarchical page index. An LLM navigates the index and selects evidence through multi-step reasoning, which can improve final answer quality. However, because it performs LLM-heavy online search and evidence construction, we include it only in end-to-end performance comparisons rather than as an efficient retrieval baseline.
#### REFRAG (Lin et al., [2025](https://arxiv.org/html/2606.18381#bib.bib30))
REFRAG targets generation-side efficiency for RAG by exploiting sparsity in retrieved contexts. It compresses retrieved chunks and selectively expands them during decoding, reducing the effective context processed by the generator. Since its main contribution is decoding-time optimization rather than retrieval or indexing, we include it in the final-performance and efficiency comparison.
## Appendix B Retrieval Performance at Different Cutoffs
Tables [4](https://arxiv.org/html/2606.18381#A2.T4), [5](https://arxiv.org/html/2606.18381#A2.T5), and [6](https://arxiv.org/html/2606.18381#A2.T6) report retrieval performance at cutoffs $k=1$, $k=3$, and $k=5$, respectively, across the four retrieval benchmarks. As expected, both precision and recall increase monotonically with $k$ for all methods, since retrieving more units provides greater coverage of relevant passages. The relative ordering of methods remains consistent across all cutoffs: SproutRAG outperforms all non-oracle baselines at every depth. This consistency indicates that the gains from attention-guided retrieval are not specific to any particular cutoff, but reflect a robust improvement in retrieval quality across the evaluation settings reported here.
Table 4: Retrieval performance at depth $k=1$ across four benchmarks (IE ↑, Precision ↑, Recall ↑). Bold = best; underline = second-best. SproutRAG rows are shaded.

Table 5: Retrieval performance at depth $k=3$ across four benchmarks (IE ↑, Precision ↑, Recall ↑). Bold = best; underline = second-best. SproutRAG rows are shaded.

Table 6: Retrieval performance at depth $k=5$ across four benchmarks (IE ↑, Precision ↑, Recall ↑). Bold = best; underline = second-best. SproutRAG rows are shaded.
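For reference, here is a minimal sketch of precision@k and recall@k with binary relevance, averaged over the cutoffs $k \in \{1, 3, 5\}$ as in the ablation table. It uses textbook definitions over a ranked list of unit identifiers. Information efficiency (IE) is not defined in this section and is omitted. The paper's metrics over multi-granularity units may also be computed differently. The function names are hypothetical.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Textbook precision@k and recall@k for one query (binary relevance)."""
    hits = sum(1 for unit in ranked[:k] if unit in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def averaged_over_cutoffs(ranked, relevant, cutoffs=(1, 3, 5)):
    """Mean precision and recall across the given cutoffs for one query."""
    pairs = [precision_recall_at_k(ranked, relevant, k) for k in cutoffs]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r
```

A benchmark-level score would then average these per-query values over all queries.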
## Appendix C Qualitative Analysis: Recovering Multi-Sentence Legal Evidence
Table [7](https://arxiv.org/html/2606.18381#A3) presents a qualitative example that illustrates why retrieving internal tree nodes is useful. The query asks whether a software services agreement limits the provider's liability and what exceptions apply. A complete answer requires more than a single sentence or a pair of related sentences: the model must recover the excluded damages, the aggregate liability cap, the scope of the limitation across legal theories, and the carve-outs for indemnification, confidentiality breaches, gross negligence, and willful misconduct. These pieces form a coherent clause-level unit, but they are distributed across four sentences. MoC retrieves a locally coherent chunk containing the liability cap, but this evidence is too narrow to answer the full query. SAKI-RAG improves over local chunking by linking the damage-exclusion sentence with the liability-cap sentence; however, its pairwise expansion still misses the later exception sentence, which is essential for a legally complete answer. In contrast, SproutRAG retrieves the internal node $v_{1:4}$, which groups all four relevant sentences into a single clause-level unit. This allows the generator to answer both parts of the query: the agreement limits liability through damage exclusions and a monetary cap, but the limitation does not apply to the specified carve-outs.
**Query:** Does the agreement limit the provider’s liability, and what exceptions or exclusions apply?

| Method / Unit | Retrieved Evidence | Analysis |
|---|---|---|
| MoC | Retrieved chunk (liability cap): "Provider’s aggregate liability under this Agreement shall not exceed the fees paid by Client during the twelve (12) months preceding the event giving rise to the claim." | MoC identifies a locally coherent chunk around the monetary cap, but the retrieved unit is too narrow for the query. It answers *how much* liability is capped, but misses the excluded damages, the scope across legal theories, and the exceptions. |
| SAKI-RAG | Retrieved pairwise expansion (damage exclusion + liability cap): (1) "In no event shall Provider be liable for any indirect, incidental, special, consequential, exemplary, or punitive damages." (2) "Provider’s aggregate liability shall not exceed the fees paid during the prior twelve months." | SAKI-RAG improves over a single chunk by linking two related sentences. However, the evidence remains pairwise, so it captures the main limitation but misses the later sentence listing exceptions such as indemnification, confidentiality breach, gross negligence, and willful misconduct. |
| Leaf $s_1$ | "In no event shall Provider be liable for any indirect, incidental, special, consequential, exemplary, or punitive damages arising out of or relating to this Agreement." | Identifies excluded damages, but does not provide the monetary cap or exceptions. |
| Leaf $s_2$ | "Provider’s aggregate liability under this Agreement shall not exceed the fees paid by Client during the twelve (12) months preceding the event giving rise to the claim." | Provides the liability cap, but not the scope or carve-outs. |
| Leaf $s_3$ | "The foregoing limitation shall apply regardless of the form of action, whether in contract, tort, strict liability, or otherwise." | Clarifies that the limitation applies across legal theories. |
| Leaf $s_4$ | "The limitations in this Section shall not apply to Provider’s indemnification obligations, breach of confidentiality, gross negligence, or willful misconduct." | Provides the exceptions required for a complete legal answer. |
| SproutRAG internal node $v_{1:4}$ (shaded) | Retrieved clause-level node containing $s_1$–$s_4$: (1) excluded damages, (2) aggregate liability cap, (3) scope across legal theories, (4) exceptions and carve-outs. | SproutRAG retrieves the internal node containing more than two sentences. This gives the generator the full limitation-of-liability clause, enabling a complete answer that includes both the limitation and the exceptions. |
Table 7: Qualitative comparison on a limitation-of-liability query. MoC retrieves a locally coherent but incomplete chunk, and SAKI-RAG retrieves a related sentence pair that captures the main limitation but misses the exception sentence. SproutRAG retrieves an internal clause-level node containing four sentences, allowing the answer to include excluded damages, the liability cap, legal-theory scope, and carve-outs.
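The coverage argument in Table 7 can be made concrete with a toy sketch. This is not SproutRAG's implementation: it assumes a hand-written merge order in place of the learned, attention-guided tree construction, tags each sentence with the query aspect it supplies (taken from Table 7), and adds two hypothetical unrelated sentences $s_5$ and $s_6$ after the clause. Retrieval here is a simple oracle that returns the smallest tree node covering every required aspect, whereas SproutRAG scores nodes with learned embeddings and hierarchical beam search. The helpers `build_tree`, `covered`, and `smallest_complete_node` are illustrative names only.

```python
Span = tuple[int, int]  # (first sentence, last sentence), 1-based, inclusive

# Query aspect supplied by each sentence of the clause (from Table 7).
# s5 and s6 are hypothetical unrelated sentences that follow the clause.
SENTENCE_ASPECT = {
    1: "excluded damages",
    2: "liability cap",
    3: "scope across legal theories",
    4: "exceptions",
    5: None,
    6: None,
}
REQUIRED = {a for a in SENTENCE_ASPECT.values() if a is not None}


def build_tree(n_sentences: int, merges: list[tuple[Span, Span]]) -> list[Span]:
    """Merge adjacent spans bottom-up and return every node (leaves and internal).

    Hypothetical: the merge order is hard-coded here; SproutRAG instead derives
    the binary tree from learned inter-sentence attention.
    """
    active = [(i, i) for i in range(1, n_sentences + 1)]
    nodes = list(active)
    for left, right in merges:
        i = active.index(left)
        if i + 1 >= len(active) or active[i + 1] != right:
            raise ValueError(f"{left} and {right} are not adjacent")
        parent = (left[0], right[1])
        active[i:i + 2] = [parent]  # replace the two children with their parent
        nodes.append(parent)
    return nodes


def covered(span: Span) -> set[str]:
    """Query aspects supplied by the sentences inside a span."""
    start, end = span
    return {SENTENCE_ASPECT[i] for i in range(start, end + 1)} - {None}


def smallest_complete_node(nodes: list[Span]) -> Span:
    """Toy oracle: the narrowest node whose sentences cover every required aspect."""
    complete = [s for s in nodes if covered(s) >= REQUIRED]
    return min(complete, key=lambda s: s[1] - s[0])


merges = [((1, 1), (2, 2)), ((3, 3), (4, 4)), ((1, 2), (3, 4)),
          ((5, 5), (6, 6)), ((1, 4), (5, 6))]
nodes = build_tree(6, merges)
best = smallest_complete_node(nodes)
print(best)  # (1, 4), i.e. the internal node v_{1:4}

# Best coverage achievable by any node of at most two sentences.
print(max(len(covered(s)) for s in nodes if s[1] - s[0] <= 1))  # 2
```

Because this toy tree groups $s_1$ through $s_4$ into one internal node before merging it with the unrelated sentences, the oracle returns the full clause without also pulling in $s_5$ and $s_6$, while no leaf or two-sentence node covers more than two of the four aspects.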