Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation
Summary
This paper presents a synthetic data generation method for fine-tuning small LLMs to translate natural language into Cypher queries over property graphs. The fine-tuned models perform competitively with much larger proprietary models while allowing local deployment and data sovereignty.
Cached at: 06/15/26, 08:58 AM
# Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

Source: [https://arxiv.org/abs/2606.14325](https://arxiv.org/abs/2606.14325) · [View PDF](https://arxiv.org/pdf/2606.14325)

> Abstract: Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.

## Submission history

From: Francesco Cazzaro

**[v1]** Fri, 12 Jun 2026 10:08:20 UTC (719 KB)
Similar Articles
Construction of Knowledge Graph based on Language Model
Review paper from Kunming University surveys how pre-trained language models automate knowledge-graph construction and introduces LLHKG, a lightweight-LLM framework matching GPT-3.5 performance.
I built an open-source Knowledge Graph pipeline with hybrid retrieval to improve LLM multi-hop reasoning [P]
An open-source full-stack pipeline that constructs a Knowledge Graph from raw text, uses hybrid search (dense + sparse + graph traversal) to solve multi-hop reasoning problems in LLMs, and re-ranks results with Reciprocal Rank Fusion and a Cross-Encoder.
Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment
MetaKGEnrich is a fully automated pipeline that uses graph metrics to detect knowledge gaps in LLM applications, retrieves web evidence, and improves answer quality by 80-87% across three benchmark datasets.
Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation
This paper proposes SGR, a framework that enhances LLM stepwise reasoning by integrating external knowledge graphs through query-relevant subgraph generation, combining Cypher-based reasoning with collaborative reasoning integration. Experiments on CWQ, WebQSP, GrailQA, and KQA Pro show improved reasoning accuracy over standard prompting and knowledge-enhanced baselines.
Generating Logically Consistent Synthetic Supply Chain Data with LLM-Driven Knowledge Graph Reasoning
This paper introduces TabKG, a knowledge-graph-guided framework for generating logically consistent synthetic supply chain tabular data. It uses an LLM ensemble to discover operational dependencies and a latent diffusion model to generate independent columns, achieving high logical consistency while preserving statistical fidelity.