Graph-Augmented Retrieval for Cross-Entity Financial Sentiment Analysis: A Comparative Study

arXiv cs.CL Papers

Summary

This paper presents a comparative study of Graph-RAG versus standard vector-only RAG for cross-entity financial sentiment analysis, finding statistically significant improvements in entity recall and answer relevancy at modest latency cost.

arXiv:2606.00062v1 Announce Type: new

# Graph-Augmented Retrieval for Cross-Entity Financial Sentiment Analysis: A Comparative Study
Source: [https://arxiv.org/html/2606.00062](https://arxiv.org/html/2606.00062)
###### Abstract

Retrieval\-Augmented Generation \(RAG\) has become foundational for grounding large language models in domain\-specific corpora, yet conventional vector\-based RAG systems are fundamentally limited in their ability to capture the structured, multi\-entity relationships that underpin financial market analysis\. This paper presents a comprehensive comparative study of a novel two\-hop Graph\-RAG architecture versus a standard vector\-only baseline for cross\-entity financial sentiment analysis\. Our system constructs a sentiment\-weighted knowledge graph of 59 equity entities from 255 news articles covering 10 major technology stocks, then augments dense retrieval with intensity\-filtered graph traversal over INFLUENCES edges to surface relational evidence inaccessible to vector search alone\. We evaluate both architectures on 100 grounded queries \(30 Direct, 70 Relational\) using semantic similarity, entity recall, RAGAS metrics, latency benchmarks, and ablation studies\. Graph\-RAG achieves a statistically significant improvement in entity recall \(\+6\.4%, $p < 0.001$, Wilcoxon signed\-rank\) and delivers substantially more relevant answers for complex multi\-entity queries \(\+11\.7% Answer Relevancy\), with gains concentrating in relational question types \(\+16\.1%\)\. Critically, these improvements come at no measurable cost to answer quality \($\Delta = +0.001$ semantic similarity, Cohen’s $d = 0.078$\), with a modest 22\.6% increase in mean latency offset by an 80% reduction in latency variance\. An ablation study on the graph traversal intensity threshold reveals an inverted\-U relationship with answer quality, identifying $\tau = 0.5$ as optimal over the production default of $\tau = 0.7$\. These findings characterize a precision\-for\-coverage trade\-off inherent to graph\-augmented retrieval and provide actionable architectural guidance for practitioners building RAG systems for multi\-entity financial analysis\.

## I Introduction

The rapid digitization of global financial markets has resulted in an unprecedented volume of unstructured textual data, making real\-time equity synthesis a critical challenge for financial analysts\. To navigate this complexity, Large Language Models \(LLMs\) have been increasingly deployed within Retrieval\-Augmented Generation \(RAG\) frameworks to ground generative outputs in authoritative external corpora\[[1](https://arxiv.org/html/2606.00062#bib.bib1)\]\. However, as the domain shifts toward specialized financial market analysis, traditional “vector\-only” RAG architectures are encountering a significant “contextual plateau\.” While effective for localized fact\-retrieval, these systems remain fundamentally blind to the structured interdependencies—such as supply\-chain cascades and multi\-hop influence networks—that define systemic market risk\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\]\.

Standard vector\-based retrieval operates on the assumption that semantic proximity in a latent vector space is a sufficient proxy for informational relevance\. In the financial sector, this paradigm faces three critical technical limitations\. First, traditional RAG treats text as isolated chunks, neglecting the explicit structured relationships required to navigate multi\-hop logic\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\]\. This results in an inability to perform “global sensemaking” or query\-focused summarization across a corpus to identify high\-level market trends\[[3](https://arxiv.org/html/2606.00062#bib.bib3)\]\. Second, the concatenation of multiple semantically similar text snippets often leads to the “lost in the middle” phenomenon, where crucial financial details are obscured within redundant, high\-token\-count contexts\[[2](https://arxiv.org/html/2606.00062#bib.bib2),[4](https://arxiv.org/html/2606.00062#bib.bib4)\]\. Finally, financial terminology is highly susceptible to “semantic noise”; without a structured graph to logically separate entities, semantic overlap in dense retrieval can lead to critical factual errors in analysis\[[5](https://arxiv.org/html/2606.00062#bib.bib5)\]\.

Furthermore, recent research into intertextual connections has highlighted the importance of sentiment “contagion,” the propagation of sentiment across documents and entities\[[6](https://arxiv.org/html/2606.00062#bib.bib6)\]\. While purely lexical analysis often overlooks the discursive signals of how sentiment spreads from one market participant to another, a graph\-augmented approach can model these “ripples\.” Unlike “Hybrid RAG,” which often switches between keyword and vector search, a true “Graph\-RAG” architecture traverses a pre\-constructed knowledge graph, using retrieved chunks as “seeds” to explore adjacent entities and relationship triples\[[2](https://arxiv.org/html/2606.00062#bib.bib2),[3](https://arxiv.org/html/2606.00062#bib.bib3)\]\. This not only improves context precision but also condenses the context substantially, with reported reductions of 9x to 43x in token consumption compared to traditional map\-reduce approaches\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\]\.

To address these limitations, this paper proposes and evaluates a Graph\-Augmented RAG framework utilizing a high\-intensity network of 59 distinct equity entities\. By mapping news vectors onto a multi\-layered Neo4j knowledge graph, our architecture enables systemic traversals that reconcile narrative\-driven news headlines with quantitative influence and sentiment intensity scores\. We evaluate this framework against a baseline General RAG system using the RAGAS \(Retrieval\-Augmented Generation Assessment\) framework, focusing on Faithfulness and Context Relevancy\[[4](https://arxiv.org/html/2606.00062#bib.bib4)\]\. Our research demonstrates that by integrating a specialized 59\-entity influence network, the system can effectively identify “market dissonance”—instances where the textual narrative of news is corrected by the underlying metadata of the equity graph—providing a more robust risk profile for automated financial market analysis\.

## II Related Work

The evolution of Retrieval\-Augmented Generation \(RAG\) has been primarily driven by the need to ground Large Language Models \(LLMs\) in verifiable facts\[[1](https://arxiv.org/html/2606.00062#bib.bib1)\]\. However, as implementations move into specialized domains like finance, the limitations of traditional vector\-based retrieval have become a critical focus of academic inquiry\.

### II\-A The Semantic Search Bottleneck in Traditional RAG

Recent research identifies a fundamental “bottleneck” in standard vector RAG when tasked with complex, cross\-document reasoning\. Edge et al\.\[[3](https://arxiv.org/html/2606.00062#bib.bib3)\] highlight that conventional dense semantic retrieval is optimized for finding localized, explicitly stated facts but fails on “sensemaking” queries that require a global understanding of an entire dataset\. Barry et al\.\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\] argue that because vector RAG treats text as isolated chunks, it inherently neglects the explicit structured relationships \(such as supply chains or regulatory dependencies\) that are critical for multi\-hop logic\. Furthermore, when tasked with cross\-document comparisons, vector\-only systems suffer from severe computational complexity, often requiring expensive pairwise comparisons that result in redundant, excessively lengthy context windows\. This leads to the “lost in the middle” phenomenon, where LLMs lose focus on crucial details buried within long, concatenated passages\[[2](https://arxiv.org/html/2606.00062#bib.bib2),[4](https://arxiv.org/html/2606.00062#bib.bib4)\]\.

### II\-B Structural Knowledge in Financial Informatics

The justification for utilizing structured Knowledge Graphs \(KG\) over raw text is particularly strong in financial report generation\. Chen et al\.\[[5](https://arxiv.org/html/2606.00062#bib.bib5)\] note that expert\-level financial data is often sparsely distributed across free\-form text, making it highly susceptible to “semantic noise\.” Structured knowledge allows for the filtering of this noise by explicitly mapping sequential, causal, and hypernym\-hyponym relations between entities\[[5](https://arxiv.org/html/2606.00062#bib.bib5)\]\. Zehra et al\.\[[7](https://arxiv.org/html/2606.00062#bib.bib7)\] further emphasize that annual reports lack standardization in format and vocabulary, which hinders automated extraction from raw text\. In contrast, a Financial Knowledge Graph \(FKG\) provides a logical map that reduces redundancy and token overhead\. Additionally, Nasiopoulos et al\.\[[6](https://arxiv.org/html/2606.00062#bib.bib6)\] introduce the concept of sentiment “contagion,” demonstrating that analyzing intertextual connections considerably improves predictive accuracy compared to purely lexical analysis\. This underscores the necessity of a networked approach to capture how sentiment cascades across an equity ecosystem\.

### II\-C Architectural Paradigms: Implicit vs\. Explicit Retrieval

The literature distinguishes between “Implicit” \(sub\-symbolic\) and “Explicit” \(symbolic\) retrieval within modern Hybrid RAG architectures\. Implicit retrieval utilizes conventional vector embeddings to find $k$\-nearest neighbors, while Explicit retrieval translates natural language into structured queries, such as text\-to\-Cypher, to navigate a KG\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\]\. To bridge these methods, researchers have proposed the use of “seed” nodes\. In this logic, traditional semantic search identifies a subset of relevant text chunks which then act as anchors to explore the graph, retrieving adjacent entities and triples that a vector search alone would miss\[[2](https://arxiv.org/html/2606.00062#bib.bib2)\]\. For global summarization, Edge et al\.\[[3](https://arxiv.org/html/2606.00062#bib.bib3)\] propose partitioning the KG into a hierarchy of “communities\.” By pre\-generating summaries for these modular groups, the system can perform a map\-reduce process to aggregate a holistic response from an entire corpus, significantly reducing token consumption\.

### II\-D Reference\-Free Evaluation Frameworks

A significant challenge in RAG development is the lack of human\-annotated ground\-truth datasets for real\-time evaluation\. Es et al\.\[[4](https://arxiv.org/html/2606.00062#bib.bib4)\] argue that traditional NLP metrics like BLEU or ROUGE are insufficient, as they rely on reference answers and often fail to predict downstream performance in long\-form generation\. To address this, the RAGAS framework advocates for LLM\-based, “reference\-free” metrics\. By utilizing an LLM to judge specific dimensions, namely Faithfulness \(groundedness\), Context Relevance \(signal\-to\-noise ratio\), and Answer Relevance, developers can estimate system correctness without the bottleneck of human annotation, allowing for faster iterative cycles in RAG architecture design\[[2](https://arxiv.org/html/2606.00062#bib.bib2),[4](https://arxiv.org/html/2606.00062#bib.bib4)\]\.

## III Methodology

The research utilizes a dual\-architecture approach to quantify the impact of graph\-based relational context on financial synthesis\. The methodology transitions from a high\-concurrency data ingestion phase to a comparative retrieval study between a vector\-only baseline and the proposed 2\-hop graph framework\. The overall system architecture is illustrated in Fig\.[1](https://arxiv.org/html/2606.00062#S3.F1)\.

![Refer to caption](https://arxiv.org/html/2606.00062v1/methodology_pictures/figure1.png)

Figure 1: Overall system architecture for Graph\-RAG financial synthesis\. The pipeline spans from automated corpus construction through knowledge graph ingestion to dual\-path retrieval and LLM generation\.

### III\-A Automated Corpus Construction and Pre\-processing

The news corpus was generated using a high\-concurrency extraction engine targeting ten high\-liquidity primary tickers \($T_{primary} = \{\text{AAPL, GOOGL, MSFT, AMZN, TSLA, NVDA, META, NFLX, PLTR, DIS}\}$\)\. Using Playwright and BeautifulSoup, the engine performed iterative browser scrolling to capture real\-time disclosures, resulting in 300\+ articles with associated metadata \(URLs and publication dates\)\.

The raw text was fragmented into 1,477 distinct news chunks\. Each chunk was encoded into a 1,024\-dimensional latent space using the BAAI/bge\-large\-en\-v1\.5 transformer model\. This model was selected for its state\-of\-the\-art performance in retrieval tasks on the Massive Text Embedding Benchmark \(MTEB\)\.

### III\-B Relational Impact Modeling and Graph Topology

To build the knowledge graph, unstructured text was transformed into structured triples through a dual\-model pipeline:

1. Entity Extraction \(GLiNER\): We utilized the knowledgator/gliner\-bi\-large\-v2\.0 model for zero\-shot recognition of custom labels: “Stock Ticker,” “Equity Index,” “Investment Bank,” and “Organization\.”
2. Sentiment Weighting \(FinBERT\): Sentiment polarity was calculated via ProsusAI/finbert\. The relational Intensity \($I$\) between entities was defined as the absolute difference between the positive and negative scores:

   $$I = \left| \mathrm{Score}_{pos} - \mathrm{Score}_{neg} \right| \tag{1}$$
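
As a minimal sketch of Eq. (1), assume the classifier's three-class output has already been converted into a dictionary of probabilities. The `scores` layout, the label names, and the function name `relational_intensity` are illustrative assumptions, not details given in the paper.

```python
def relational_intensity(scores: dict[str, float]) -> float:
    """Return I = |Score_pos - Score_neg| for one sentiment prediction.

    `scores` is assumed to map "positive", "negative" and "neutral" to
    probabilities; the neutral mass does not enter Eq. (1).
    """
    return abs(scores["positive"] - scores["negative"])


# Hypothetical predictions, not values from the paper's corpus.
strongly_negative = {"positive": 0.05, "negative": 0.90, "neutral": 0.05}
mostly_neutral = {"positive": 0.15, "negative": 0.10, "neutral": 0.75}

print(round(relational_intensity(strongly_negative), 2))  # 0.85
print(round(relational_intensity(mostly_neutral), 2))     # 0.05
```

Because the absolute value discards the sign, intensity measures how polarized a relationship is, not its direction. In this toy case only the first prediction would clear a threshold of 0.7.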

The normalized entities resulted in a Neo4j graph topology containing 59 canonical equity entities\. The schema maps Ticker nodes to Chunk nodes via HAS\_NEWS\_CHUNK edges, while inter\-ticker relationships are defined by directed INFLUENCES edges weighted by sentiment and intensity\. The knowledge graph schema is shown in Fig\.[2](https://arxiv.org/html/2606.00062#S3.F2)\.

![Refer to caption](https://arxiv.org/html/2606.00062v1/methodology_pictures/knowledge_graph_schema_for_equity_entites_influence_netwok.png)

Figure 2: Knowledge graph schema for the equity\-entity influence network\. Ticker nodes are connected via sentiment\-weighted INFLUENCES edges, with news chunks attached as evidence\.

### III\-C Baseline Architecture: General RAG \(Vector\-Only\)

The General RAG system serves as the control baseline, representing a standard industry implementation of dense semantic retrieval\.

- Indexing: All 1,477 chunk embeddings were indexed using FAISS \(Facebook AI Similarity Search\) with an IndexFlatIP \(Inner Product\) configuration for rapid exact $k$\-nearest neighbor search\.
- Retrieval: For a given query $q$, the system retrieves the top $k = 5$ chunks based on maximum cosine similarity in the latent vector space\.
- Contextualization: The retrieved segments are concatenated into a flat context window: “— SOURCE ARTICLE — TEXT: \{chunk\_text\}”\. No relational metadata or secondary entity information is included in this baseline\.
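
An inner-product index ranks by cosine similarity only when the vectors are L2-normalized, which the baseline description implies but does not state. The sketch below is a pure-Python stand-in for exhaustive inner-product search, assuming normalized embeddings. It does not use the FAISS API, and the function names and toy vectors are hypothetical.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_inner_product(query, chunks, k=5):
    """Score every chunk exhaustively and return the k best (id, score) pairs."""
    q = l2_normalize(query)
    scored = [
        (chunk_id, sum(a * b for a, b in zip(q, l2_normalize(vec))))
        for chunk_id, vec in chunks.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional embeddings (the paper's embeddings have 1,024 dimensions).
toy_chunks = {
    "c1": [1.0, 0.0, 0.0],
    "c2": [0.6, 0.8, 0.0],
    "c3": [0.0, 0.0, 5.0],  # long vector, but orthogonal to the query
}
hits = top_k_inner_product([2.0, 0.0, 0.0], toy_chunks, k=2)
print([chunk_id for chunk_id, _ in hits])  # ['c1', 'c2']
```

Without normalization, a long vector pointing in a slightly different direction could outrank a short vector pointing exactly along the query, so the ranking would no longer match cosine similarity.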

### III\-D Proposed Architecture: 2\-Hop Graph\-RAG

The Graph\-RAG framework leverages the Neo4j structure to perform systemic traversals that capture “market ripples\.” The comparative logic between both architectures is illustrated in Fig\.[3](https://arxiv.org/html/2606.00062#S3.F3)\.

- Hop 1 \(Semantic Anchor\): The system performs a vector search within the Neo4j news\_vectors index to identify the top $k = 5$ “seed” chunks\.
- Hop 2 \(Structural Expansion\): From the ticker associated with each seed chunk, the system traverses INFLUENCES edges where $\text{Intensity} > 0.7$ to find neighboring entities\.
- Neighbor Re\-ranking: For each neighbor ticker, the system retrieves its associated news chunks and re\-ranks them against the query using the BGE model\. Only the top 2 most relevant chunks per neighbor with a similarity score $> 0.2$ are added as “Network Evidence\.”
- Synthesis: The LLM receives a graph\-augmented context containing primary news, connected entity names, their numerical sentiment scores, and the re\-ranked evidence chunks\.
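
The retrieval steps above can be sketched with in-memory data structures in place of Neo4j. Three things here are assumptions for illustration and do not come from the paper: query-chunk similarities are precomputed, only outgoing INFLUENCES edges are followed, and chunks already chosen as seeds are not repeated as evidence. The names `Chunk` and `two_hop_retrieve` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    ticker: str
    similarity: float  # query-chunk similarity, assumed precomputed


def two_hop_retrieve(chunks, influences, k=5, tau=0.7,
                     per_neighbor=2, min_similarity=0.2):
    """Return (seed_chunks, network_evidence) for one query."""
    ranked = sorted(chunks, key=lambda c: c.similarity, reverse=True)
    seeds = ranked[:k]  # Hop 1: top-k semantic anchors
    seed_ids = {c.chunk_id for c in seeds}
    seed_tickers = {c.ticker for c in seeds}

    # Hop 2: follow INFLUENCES edges whose intensity exceeds tau.
    neighbors = {
        dst
        for src in seed_tickers
        for dst, intensity in influences.get(src, [])
        if intensity > tau
    }

    evidence = []
    for ticker in sorted(neighbors):
        # `ranked` is already sorted, so slicing keeps the best chunks.
        candidates = [
            c for c in ranked
            if c.ticker == ticker
            and c.chunk_id not in seed_ids
            and c.similarity > min_similarity
        ]
        evidence.extend(candidates[:per_neighbor])
    return seeds, evidence


# Hypothetical toy data.
chunks = [
    Chunk("n1", "NVDA", 0.82), Chunk("n2", "NVDA", 0.74),
    Chunk("a1", "AAPL", 0.41), Chunk("a2", "AAPL", 0.15),
    Chunk("m1", "MSFT", 0.35),
]
influences = {"NVDA": [("AAPL", 0.9), ("MSFT", 0.5)]}

seeds, evidence = two_hop_retrieve(chunks, influences, k=2)
print([c.chunk_id for c in evidence])  # ['a1']
```

In this toy graph, lowering `tau` to 0.4 would also admit the MSFT edge and add `m1` as evidence. That is the breadth-versus-precision effect the threshold controls.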

![Refer to caption](https://arxiv.org/html/2606.00062v1/methodology_pictures/figure2.png)

Figure 3: General RAG vs\. Graph RAG logic diagram\. The baseline retrieves only semantically similar chunks, while Graph\-RAG performs structural expansion via intensity\-filtered graph traversal\.

### III\-E Generation and Evaluation Framework

Both systems utilize GPT\-4o\-mini \(Azure OpenAI\) for generation, employing identical parameters: a temperature of 0\.2 and a unified system prompt enforcing strict context grounding\.

Performance was evaluated through a multi\-tiered approach:

- Statistical Significance: Semantic similarity \(cosine similarity between answer and ground truth\) and Entity Recall were calculated for the full dataset of 100 queries\. Statistical significance was validated through Wilcoxon signed\-rank tests performed on this comprehensive set\.
- RAGAS Metrics: Due to high API consumption, the RAGAS framework\[[4](https://arxiv.org/html/2606.00062#bib.bib4)\] \(Faithfulness, Answer Relevancy, Context Precision, and Context Recall\) was performed on a stratified sample of 25 queries, ensuring proportional representation of “Direct” and “Relational” question types\.
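
The paper does not spell out how Entity Recall is computed. One plausible reading, sketched below as an assumption, is the share of expected entities that appear in the generated answer. The function name and the whole-word matching rule are illustrative choices.

```python
import re


def entity_recall(answer: str, expected_entities: list[str]) -> float:
    """Share of expected entities mentioned in the answer (0.0 to 1.0)."""
    if not expected_entities:
        return 1.0  # nothing to recall; a convention chosen for this sketch
    found = 0
    for entity in expected_entities:
        # Whole-word, case-insensitive match so "META" does not hit "metadata".
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, answer, flags=re.IGNORECASE):
            found += 1
    return found / len(expected_entities)


# Hypothetical answer text and expected entities.
answer = "NVDA's export curbs weighed on AAPL suppliers, while MSFT was flat."
print(entity_recall(answer, ["NVDA", "AAPL", "TSLA"]))  # 2 of 3 entities found
```

A production metric would probably also need alias handling (for example, company names versus tickers), which this sketch leaves out.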

## IV Results

This section presents the empirical findings from our comparative evaluation of the General RAG baseline and the proposed 2\-hop Graph\-RAG system across 100 evaluation queries\.

### IV\-A Overall Performance Comparison

Table [I](https://arxiv.org/html/2606.00062#S4.T1) summarizes the aggregate performance of both retrieval architectures across all evaluation dimensions\.

TABLE I: Aggregate Evaluation Results \($n = 100$ queries\)

The primary finding is that Graph\-RAG achieves a statistically significant improvement in entity recall \(\+6\.4%, $p = 0.000043$, Wilcoxon signed\-rank\), confirming that structural graph traversal surfaces entities that pure vector similarity misses\. Semantic similarity between the two systems is statistically indistinguishable \($\Delta = +0.001$, $p = 0.281$, Cohen’s $d = 0.078$\), confirming that the additional graph context neither improves nor degrades generation quality\.

Among the RAGAS metrics\[[4](https://arxiv.org/html/2606.00062#bib.bib4)\], the most notable result is the \+11\.7% improvement in Answer Relevancy, suggesting that graph\-augmented context helps the LLM produce more topically focused responses\. Context Precision also favors Graph\-RAG \(\+6\.1%\), indicating that the re\-ranking step effectively filters irrelevant neighbor evidence\.
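
To put the reported effect size in context: the paper does not say which variant of Cohen's d it uses. The sketch below assumes the paired-samples form (mean of the per-query differences divided by their standard deviation) and runs on made-up scores, so the output is not the paper's value.

```python
import statistics


def paired_cohens_d(scores_a: list[float], scores_b: list[float]) -> float:
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Hypothetical per-query similarity scores, not the paper's data.
graph_rag = [0.86, 0.84, 0.88, 0.85, 0.83]
baseline = [0.85, 0.85, 0.87, 0.84, 0.84]

print(round(paired_cohens_d(graph_rag, baseline), 3))
```

By the usual rule of thumb, values below about 0.2 count as a small or negligible effect, which is consistent with reading $d = 0.078$ as no practical difference in semantic similarity.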

### IV\-B Stratified Analysis by Question Type

To isolate the effect of query complexity, we stratified the RAGAS evaluation into Direct \(single\-entity factual\) and Relational \(cross\-entity impact\) question types\. Results are presented in Table [II](https://arxiv.org/html/2606.00062#S4.T2)\.

TABLE II: RAGAS Metrics Stratified by Question Type

Graph\-RAG’s advantage concentrates heavily in Relational queries, where Answer Relevancy improves by \+16\.1%, as shown in Fig\.[4](https://arxiv.org/html/2606.00062#S4.F4)\. This is the strongest per\-metric result in the study and demonstrates that the 2\-hop expansion provides substantive value for multi\-entity reasoning tasks\. For Direct queries, both systems perform comparably, with Graph\-RAG marginally improving Context Precision \(\+8\.6%\) at the cost of lower Context Recall \(−11\.9%\), an expected trade\-off when additional entities dilute single\-article evidence\.

![Refer to caption](https://arxiv.org/html/2606.00062v1/result_figures/ragas_by_question_type.png)

Figure 4: RAGAS scores stratified by question type\. Graph\-RAG’s advantage is pronounced for Relational queries, particularly in Answer Relevancy \(\+16\.1%\)\.

### IV\-C Ablation Study: Intensity Threshold $\tau$

The intensity threshold $\tau$ governs which INFLUENCES edges are traversed during the second hop\. We evaluated three threshold values on a 20\-question stratified subsample to characterize the sensitivity of the system\. Results are shown in Table [III](https://arxiv.org/html/2606.00062#S4.T3)\.

TABLE III: Ablation: Intensity Threshold vs\. Semantic Similarity

| Threshold \($\tau$\) | Mean Similarity |
| --- | --- |
| 0\.3 | 0\.8388 |
| 0\.5 | 0\.8550 |
| 0\.7 | 0\.8501 |

Evaluated on 20\-question subsample\.

The results exhibit an inverted\-U pattern, as illustrated in Fig\.[5](https://arxiv.org/html/2606.00062#S4.F5)\. At $\tau = 0.3$, the system traverses too many weak edges, introducing topically irrelevant evidence and degrading answer quality\. At $\tau = 0.7$, the system is overly restrictive, missing moderately informative relationships\. The optimal value of $\tau = 0.5$ balances breadth and precision, outperforming the production default of $\tau = 0.7$ by \+0\.5% in semantic similarity\.
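
The ablation figures can be checked with a few lines of arithmetic. The snippet below is a worked calculation on the three values in Table III; the variable names are illustrative.

```python
# Mean semantic similarity per threshold, copied from Table III.
ablation = {0.3: 0.8388, 0.5: 0.8550, 0.7: 0.8501}

best_tau = max(ablation, key=ablation.get)
default_tau = 0.7

abs_gain = ablation[best_tau] - ablation[default_tau]
rel_gain = abs_gain / ablation[default_tau]

# Inverted-U check: the middle threshold beats both extremes.
is_inverted_u = ablation[0.3] < ablation[0.5] > ablation[0.7]

print(best_tau)                  # 0.5
print(round(abs_gain, 4))        # 0.0049
print(round(rel_gain * 100, 2))  # 0.58
print(is_inverted_u)             # True
```

The gain over the default is about 0.005 in absolute similarity, or roughly 0.6% in relative terms. Both are close to the reported +0.5%, and with a 20-question subsample the difference is best read as directional.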

![Refer to caption](https://arxiv.org/html/2606.00062v1/result_figures/ablation_threshold_curve.png)

Figure 5: Inverted\-U relationship between intensity threshold $\tau$ and semantic similarity\. Optimal performance at $\tau = 0.5$; the General RAG baseline is shown as a dashed line\.

### IV\-D Latency and Operational Efficiency

Graph\-RAG incurs a mean latency of 9\.03 s per query compared to 7\.37 s for General RAG, a 22\.6% overhead attributable to the Neo4j AuraDB network round\-trip and the neighbor chunk re\-ranking computation\. However, Graph\-RAG exhibits substantially lower latency variability \($\sigma = 0.96$ s vs\. $\sigma = 4.82$ s, roughly an 80% reduction in standard deviation\), yielding more consistent response times\. The high variability in the General RAG pipeline is driven by fluctuations in Azure OpenAI API response latency, which dominates the end\-to\-end time when retrieval is nearly instantaneous \(FAISS in\-memory\)\.
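
The latency percentages can be recomputed from the rounded figures quoted above. This is a worked calculation, not part of the authors' pipeline, and the variable names are illustrative.

```python
graph_mean, base_mean = 9.03, 7.37  # mean seconds per query
graph_sd, base_sd = 0.96, 4.82      # standard deviations in seconds

overhead = (graph_mean - base_mean) / base_mean
sd_reduction = 1 - graph_sd / base_sd
variance_reduction = 1 - (graph_sd / base_sd) ** 2

print(f"{overhead:.1%}")            # 22.5%
print(f"{sd_reduction:.1%}")        # 80.1%
print(f"{variance_reduction:.1%}")  # 96.0%
```

Two things follow. The rounded means give 22.5%, so the reported 22.6% presumably comes from unrounded measurements. And the 80% figure matches the drop in standard deviation; measured as variance (standard deviation squared), the reduction would be closer to 96%.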

In terms of context window utilization, Graph\-RAG consumes 79% more tokens on average \(1,711 vs\. 955 tokens\), resulting in a lower signal density \(1\.689 vs\. 2\.696 relevant entities per 1,000 tokens\)\. This indicates that while Graph\-RAG retrieves more total information, a portion of graph\-traversed evidence is tangentially related rather than directly answering the query\.
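
A short calculation shows how the token and density figures relate. Multiplying mean density by mean token count only approximates the mean number of relevant entities per context, since the paper does not say whether density was averaged per query. Treat the entity counts below as rough estimates.

```python
graph_tokens, base_tokens = 1711, 955
graph_density, base_density = 1.689, 2.696  # relevant entities per 1,000 tokens

token_increase = graph_tokens / base_tokens - 1
graph_entities = graph_density * graph_tokens / 1000
base_entities = base_density * base_tokens / 1000

print(f"{token_increase:.0%}")   # 79%
print(round(graph_entities, 2))  # 2.89
print(round(base_entities, 2))   # 2.57
```

Under that approximation, Graph-RAG contexts hold slightly more relevant entities in absolute terms (about 2.9 versus 2.6) but spread them over many more tokens, which is why density falls.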

### IV\-E Failure Case Analysis

Of the 100 evaluation queries, Graph\-RAG scored higher on 52, tied \(within $\pm 0.01$ similarity\) on 25, and was outperformed by General RAG on 23\. Qualitative examination of the failure cases reveals two dominant patterns:

1. Single\-entity factual queries: Questions requiring precise information from one article, where additional cross\-entity context dilutes the signal\. Example: “What was AAPL’s Q3 revenue guidance?”
2. Low\-connectivity entities: Tickers with sparse or weak INFLUENCES edges in the graph, where the second hop produces minimal useful expansion\.

Conversely, Graph\-RAG achieves its largest gains on queries involving supply chain dependencies, competitive dynamics, and sector\-wide sentiment shifts—precisely the scenarios where relational context provides additive information unavailable through semantic similarity alone\.
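
The win/tie/loss breakdown at the start of this subsection can be computed from per-query similarity scores as sketched below. The function name and the sample scores are hypothetical; only the 0.01 tie tolerance comes from the text.

```python
def tally_outcomes(graph_scores, base_scores, tolerance=0.01):
    """Count Graph-RAG wins, ties, and losses; |diff| <= tolerance is a tie."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for g, b in zip(graph_scores, base_scores):
        diff = g - b
        if abs(diff) <= tolerance:
            counts["tie"] += 1
        elif diff > 0:
            counts["win"] += 1
        else:
            counts["loss"] += 1
    return counts


# Hypothetical scores for four queries, not the paper's data.
graph_scores = [0.90, 0.80, 0.852, 0.70]
base_scores = [0.85, 0.86, 0.850, 0.70]

print(tally_outcomes(graph_scores, base_scores))  # {'win': 1, 'tie': 2, 'loss': 1}
```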

## V Discussion

### V\-A The Precision\-Coverage Trade\-off

Our results demonstrate that Graph\-RAG does not uniformly outperform vector\-only retrieval\. Instead, it shifts the operating point on a precision\-coverage trade\-off curve\. For queries requiring cross\-entity reasoning—which constitute the majority of real\-world financial analyst questions—the 6\.4% improvement in entity recall provides meaningful value\. For simple factual queries, the additional context introduces marginal noise without proportional benefit\.

This finding aligns with the broader observation that RAG system design should be query\-type\-aware\. A production system could route queries to the appropriate retrieval pipeline based on detected query complexity\.
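
One way such routing might look is a lightweight heuristic that sends a query to Graph-RAG when it names two or more tickers or contains a relational cue word. The paper does not implement this: the cue list, the function name, and the pipeline labels are assumptions for illustration. Only the ticker set comes from the corpus description.

```python
import re

KNOWN_TICKERS = {"AAPL", "GOOGL", "MSFT", "AMZN", "TSLA",
                 "NVDA", "META", "NFLX", "PLTR", "DIS"}
# Illustrative cue words; a real system would likely use a trained classifier.
RELATIONAL_CUES = ("impact", "affect", "influence", "supplier",
                   "competitor", "sector", "versus")


def route_query(query: str) -> str:
    """Return "graph_rag" for likely relational queries, else "vector_rag"."""
    tokens = set(re.findall(r"[A-Za-z]+", query.upper()))
    mentioned = tokens & KNOWN_TICKERS
    has_cue = any(cue in query.lower() for cue in RELATIONAL_CUES)
    return "graph_rag" if len(mentioned) >= 2 or has_cue else "vector_rag"


print(route_query("What was AAPL's Q3 revenue guidance?"))      # vector_rag
print(route_query("How does NVDA's export news affect AAPL?"))  # graph_rag
```

Keyword rules like these are brittle: for example, the ordinary word "meta" would match the META ticker. They are best treated as a baseline to compare against a learned query classifier.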

### V\-B Ablation as Architectural Guidance

The inverted\-U relationship between the intensity threshold and answer quality is a practical contribution for practitioners\. It suggests that:

1. Graph traversal filters should not be set at extreme values\.
2. The optimal threshold depends on graph density and edge quality; our finding of $\tau = 0.5$ is specific to our graph construction pipeline\.
3. Threshold tuning should be part of the standard Graph\-RAG development workflow\.

### V\-C Limitations

LLM\-generated ground truth\. Our evaluation set uses GPT\-4o\-mini\-generated ideal answers, introducing a ceiling effect where both systems are evaluated against LLM output\. While this limits absolute interpretability, relative comparisons between systems remain valid\. Future work should incorporate human expert annotations\.

Scale\. Our evaluation uses 100 questions over 10 stocks\. While sufficient for primary hypothesis testing \(entity recall $p < 0.001$\), subgroup analyses \(e\.g\., per\-stock breakdowns\) lack statistical power\. The ablation study on 20 questions should be interpreted as directional\.

Cloud latency\. Graph\-RAG latency is inflated by Neo4j AuraDB network round\-trips\. An on\-premises deployment would reduce this overhead, narrowing the latency gap\.

Single model\. All experiments use GPT\-4o\-mini\. Results may vary with different LLMs, particularly those with larger context windows or stronger instruction\-following capabilities\.

Domain specificity\. Our findings apply to technology sector financial news\. Generalization to other domains \(biomedical, legal\) requires additional validation, though the architecture itself is domain\-agnostic\.

## VI Conclusion

We presented a two\-hop Graph\-RAG architecture for financial sentiment analysis and provided a rigorous comparative evaluation against vector\-only retrieval\. Our key findings are:

1. Graph\-RAG achieves a statistically significant improvement in entity recall \(\+6\.4%, $p < 0.001$\), demonstrating its ability to surface structurally connected entities that vector retrieval misses\.
2. Graph\-RAG improves Answer Relevancy by \+11\.7% overall, with the gain concentrating in relational queries \(\+16\.1%\), indicating that graph\-augmented context enables more topically focused multi\-entity synthesis\.
3. This improvement comes at no measurable cost to answer quality \($\Delta = +0.001$ semantic similarity, $p = 0.281$, Cohen’s $d = 0.078$\), but with a 22\.6% increase in mean latency\.
4. The intensity threshold exhibits an inverted\-U relationship with answer quality, with $\tau = 0.5$ outperforming the default $\tau = 0.7$ by a small margin\.
5. Graph\-RAG’s advantage concentrates in relational, multi\-entity queries, while vector\-only retrieval is sufficient for simple factual lookups\.

These results characterize a precision\-for\-coverage trade\-off and provide actionable guidance for building retrieval systems in the financial domain\. Future work will explore human evaluation, larger entity graphs, dynamic threshold selection, and multi\-model generalization\.

## References

- \[1\] P\. Lewis et al\., “Retrieval\-augmented generation for knowledge\-intensive NLP tasks,” in Advances in Neural Information Processing Systems, vol\. 33, 2020\.
- \[2\] M\. Barry et al\., “GraphRAG: Leveraging graph\-based efficiency to minimize hallucinations in LLM\-driven RAG for finance data,” Pre\-print, 2025\.
- \[3\] D\. Edge et al\., “From local to global: A GraphRAG approach to query\-focused summarization,” Microsoft Research, arXiv preprint arXiv:2404\.16130, 2024\.
- \[4\] S\. Es, J\. James, L\. Espinosa\-Anke, and S\. Schockaert, “RAGAS: Automated evaluation of retrieval augmented generation,” arXiv preprint arXiv:2309\.15217, 2023\.
- \[5\] Y\. Chen et al\., “Knowledge\-augmented financial market analysis and report generation,” Tongji University / Ant Group, 2024\.
- \[6\] D\. K\. Nasiopoulos, K\. I\. Roumeliotis, D\. P\. Sakas, K\. Toudas, and P\. Reklitis, “Financial sentiment analysis and classification: A comparative study of fine\-tuned deep learning models,” Int\. J\. Financial Stud\., vol\. 13, no\. 2, p\. 75, 2025\.
- \[7\] S\. Zehra et al\., “Financial knowledge graph based financial report query system,” IEEE Access, 2021\.
