I built an open-source Knowledge Graph pipeline with hybrid retrieval to improve LLM multi-hop reasoning [P]

Reddit r/MachineLearning 06/14/26, 10:38 PM Tools

open-source knowledge-graph hybrid-retrieval llm multi-hop-reasoning rag graphrag

Summary

An open-source full-stack pipeline that constructs a Knowledge Graph from raw text, uses hybrid search (dense + sparse + graph traversal) to solve multi-hop reasoning problems in LLMs, and re-ranks results with Reciprocal Rank Fusion and a Cross-Encoder.

Hey everyone, I built an open-source full-stack pipeline (Django + React) that constructs a Knowledge Graph from raw text, detects thematic communities, and uses hybrid search to solve the "lost in the middle" problem in standard vector retrieval.

**The Pipeline:**

1. **Ingestion & Chunking:** Raw text is cleaned, parsed, and split into overlapping chunks to preserve local context.
2. **Graph Construction:** `spaCy` extracts named entities from each chunk. A weighted co-occurrence graph is built using `NetworkX`, mapping which entities appear together and linking them to their source chunks.
3. **Community Detection:** The graph is partitioned into thematic clusters using `greedy_modularity_communities`. For each cluster, random text chunks are sampled and sent to an LLM to generate a high-level summary (preventing "hub node" bias).
4. **Indexing:** All chunks are embedded into a dense vector store, and a sparse BM25 index is built over the same corpus.
5. **Hybrid Retrieval:** On query, the system performs a dual search (Dense Vector + BM25). Simultaneously, it extracts entities from the prompt, traverses the graph for 1st-degree neighbors, and retrieves their associated chunks.
6. **Fusion & Reranking:** Local and Global (community summary) results are merged, deduplicated, and scored using **Reciprocal Rank Fusion (RRF)**. The top-K candidates are then re-scored by a Cross-Encoder for maximum precision.
7. **LLM Synthesis:** The final curated context is passed to the LLM with strict prompting to generate a concise, well-structured, and cited answer.

**Why it works:** Standard vector search fails at multi-hop queries like:

> Who ordered the execution of Sansa's father, and how did that person eventually die?

By traversing the graph (*Sansa -> Ned -> Joffrey -> Poisoning*), the system bridges the gap between disconnected text chunks and synthesizes the correct answer.

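To make the graph side of steps 2 and 5 concrete, here is a minimal sketch of a weighted co-occurrence graph and a 1st-degree neighbor lookup. It is not taken from the repo: plain dictionaries stand in for the `NetworkX` graph so the snippet runs with the standard library only, the entity lists are hard-coded where `spaCy` would normally produce them, and the names `chunk_entities`, `build_cooccurrence_graph`, and `neighbor_chunks` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: chunk id -> entities already extracted from it.
# In the pipeline described above, spaCy would produce these entity lists.
chunk_entities = {
    "c1": ["Sansa", "Ned"],
    "c2": ["Ned", "Joffrey"],
    "c3": ["Joffrey", "Poisoning"],
}


def build_cooccurrence_graph(chunk_entities):
    """Return (edge weights, entity -> chunk ids) for a co-occurrence graph."""
    weights = defaultdict(lambda: defaultdict(int))
    entity_chunks = defaultdict(set)
    for chunk_id, entities in chunk_entities.items():
        unique = sorted(set(entities))
        for entity in unique:
            entity_chunks[entity].add(chunk_id)
        # Every pair of entities sharing a chunk gets its edge weight bumped.
        for a, b in combinations(unique, 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return weights, entity_chunks


def neighbor_chunks(query_entities, weights, entity_chunks):
    """Chunks linked to the query entities or to their 1st-degree neighbors."""
    selected = set()
    for entity in query_entities:
        if entity not in entity_chunks:
            continue  # entity never seen in the corpus
        for node in [entity, *weights[entity]]:
            selected |= entity_chunks[node]
    return sorted(selected)


weights, entity_chunks = build_cooccurrence_graph(chunk_entities)
print(neighbor_chunks(["Sansa"], weights, entity_chunks))
```

In this toy data a single hop from `Sansa` surfaces `c1` and `c2`. The chunk about the poisoning (`c3`) would have to come from a further hop or from the dense and BM25 results.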
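The fusion in step 6 can be sketched in a few lines. Reciprocal Rank Fusion scores each candidate as the sum of `1 / (k + rank)` over every ranked list it appears in. The constant `k = 60` is the commonly used default and an assumption here, not a value confirmed from the repo. The function name and the three toy rankings are hypothetical too, and each ranking is assumed to contain no duplicate ids.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the three retrievers (best match first).
dense = ["c2", "c1", "c4"]
bm25 = ["c1", "c3", "c2"]
graph = ["c1", "c2"]

fused = reciprocal_rank_fusion([dense, bm25, graph])
print(fused)  # ['c1', 'c2', 'c3', 'c4']
```

Because only ranks enter the formula, the dense, BM25, and graph scores never need to be normalized against each other. The top of `fused` is what a Cross-Encoder would then re-score.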
**GitHub:** [https://github.com/mohammad-majoony/graphrag-studio](https://github.com/mohammad-majoony/graphrag-studio) Would love feedback! Thanks.


Similar Articles

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

@hxiao: Not a fan of Knowledge Graphs, but recently I started using them more often for a surprising reason: to build non-trivi…

Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment

LogosKG: Hardware-Optimized Scalable and Interpretable Knowledge Graph Retrieval

Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning
