Structures Facilitate Retrieve, Rerank, and Generate
Summary
The paper proposes SF-Re2G, a method that improves document-grounded dialogue systems by leveraging document structure to enhance retrieval, reranking, and generation. Experiments on two DGDS datasets validate the method for both Chinese and English.
# Structures Facilitate Retrieve, Rerank, and Generate

Source: [https://arxiv.org/abs/2606.03247](https://arxiv.org/abs/2606.03247) [View PDF](https://arxiv.org/pdf/2606.03247)

> Abstract: Document-grounded dialogue systems (DGDS) utilize knowledge from external documents to answer domain-specific user questions. Existing solutions typically divide documents into independent passages for retrieval and response generation. This approach, however, neither makes good use of structural information within documents nor provides enough (document) context for knowledge selection and responses. This paper proposes SF-Re2G to address such issues systematically. Firstly, we seek to improve a passage representation by contrasting it with others of the same section, thus improving the retrieval performance. Secondly, a structure-enhanced reranker is built, leveraging the fact that multiple grounding passages of one dialog turn tend to be in the same neighborhood. Specifically, candidates from the retrieval are grouped into subgraphs according to the document structure. The reranker will rescore the candidate integrating its group information. Finally, the chosen passages are used for responses, taking into account the subgraph context for better generation. Experimental results on two DGDS datasets validate our method for both Chinese and English.

## Submission history

From: Xujie Zhang [[view email](https://arxiv.org/show-email/1a0e14e6/2606.03247)]

**[v1]** Tue, 2 Jun 2026 07:09:41 UTC (8,382 KB)
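The reranking idea in the abstract (group retrieval candidates by document structure, then rescore each candidate using its group information) can be illustrated with a small sketch. This is not the paper's reranker: the abstract does not specify the scoring function, so the version below assumes a section identifier stands in for a subgraph, uses the group's mean retrieval score as the "group information", and blends it with a hypothetical mixing weight `alpha`. The names `Candidate`, `group_by_section`, and `rescore_with_group` are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    passage_id: str
    section_id: str  # assumption: section membership approximates a subgraph
    score: float     # first-stage retrieval score


def group_by_section(candidates):
    """Group retrieved candidates by the section they belong to."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand.section_id].append(cand)
    return dict(groups)


def rescore_with_group(candidates, alpha=0.7):
    """Blend each candidate's score with the mean score of its group.

    alpha is an assumed mixing weight, not a value taken from the paper.
    Returns (passage_id, new_score) pairs sorted from best to worst.
    """
    groups = group_by_section(candidates)
    group_mean = {
        section: sum(c.score for c in members) / len(members)
        for section, members in groups.items()
    }
    rescored = [
        (c.passage_id, alpha * c.score + (1 - alpha) * group_mean[c.section_id])
        for c in candidates
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy retrieval output: p1 and p3 share a section, p2 and p4 stand alone.
candidates = [
    Candidate("p1", "sec-A", 0.90),
    Candidate("p2", "sec-B", 0.78),
    Candidate("p3", "sec-A", 0.77),
    Candidate("p4", "sec-C", 0.30),
]
ranking = rescore_with_group(candidates)
print([passage_id for passage_id, _ in ranking])
```

In this toy input, `p3` starts slightly below `p2` but shares a section with the strong candidate `p1`, so the group term lifts it above `p2`. That mirrors the abstract's observation that grounding passages for one dialog turn tend to sit in the same neighborhood; a learned reranker would replace the fixed blend used here.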
Similar Articles
Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents
Proposes Structure-Aware RAG (SA-RAG), which uses tables as an intermediate structured representation to reduce noise in retrieval-augmented generation for conversational agents, with quality-aware metadata generation and two table generation methods, outperforming existing baselines on noisy real-world datasets.
LFRAG: Layout-oriented Fine-grained Retrieval-Augmented Generation on Multimodal Document Understanding
LFRAG proposes a layout-oriented fine-grained retrieval-augmented generation framework that moves from page-level to block-level retrieval in multimodal documents, achieving state-of-the-art performance and 73% token reduction on the new LFDocQA benchmark.
MM-BizRAG: Rethinking Multimodal Retrieval-Augmented Generation for General Purpose Enterprise Q&A
MM-BizRAG is a multimodal retrieval-augmented generation system for enterprise Q&A that uses document structure-aware splitting and layout-aware parsing to outperform vision-centric baselines by up to 32% on heterogeneous enterprise documents. The paper also introduces FastRAGEval, a cost-efficient LLM-based evaluation metric with stronger human alignment than RAGChecker.
Fine-grained Fragment Retrieval in Multi-modal Long-form Dialogues
This paper introduces Fine-grained Fragment Retrieval (FFR), a new task for locating semantically coherent multi-modal fragments (text and images) within long-form dialogues. The authors propose F2RVLM, a generation-based retrieval model trained with reinforcement learning, and FFRS, a two-stage retrieval system, along with a new dataset MLDR for evaluation.
Disco-RAG: Discourse-Aware Retrieval-Augmented Generation
Disco-RAG proposes a discourse-aware retrieval-augmented generation framework that integrates discourse signals through intra-chunk discourse trees and inter-chunk rhetorical graphs to improve knowledge synthesis in LLMs. The method achieves state-of-the-art results on QA and summarization benchmarks without fine-tuning.