Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

arXiv cs.CL Papers

Summary

This paper presents a synthetic data generation method for fine-tuning small LLMs to convert natural language into Cypher queries over property graphs. The fine-tuned models perform competitively with much larger proprietary models while allowing local deployment and preserving data sovereignty.

Abstract (arXiv:2606.14325v1, announce type: new): Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.
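
To make the task concrete, the sketch below shows what a Text2Cypher training pair grounded in graph data might look like. It is only an illustration: the abstract does not describe how the paper generates its data, so the toy `ACTED_IN` graph, the single question template, and the names `TrainingPair`, `cypher_string`, and `generate_pairs` are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    question: str  # natural-language input
    cypher: str    # target Cypher query


# Hypothetical toy graph: (Person)-[:ACTED_IN]->(Movie). Not taken from the paper.
ACTED_IN = [
    ("Keanu Reeves", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Keanu Reeves", "John Wick"),
]


def cypher_string(value: str) -> str:
    """Quote a value as a single-quoted Cypher string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def generate_pairs(edges):
    """Fill one question/query template per distinct movie title in the graph."""
    pairs = []
    for title in sorted({movie for _, movie in edges}):
        pairs.append(
            TrainingPair(
                question=f"Who acted in {title}?",
                cypher=(
                    "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) "
                    f"WHERE m.title = {cypher_string(title)} "
                    "RETURN p.name"
                ),
            )
        )
    return pairs


if __name__ == "__main__":
    for pair in generate_pairs(ACTED_IN):
        print(pair.question, "->", pair.cypher)
```

Because the titles are read from the graph itself, every generated query refers to values that actually exist in the data, which is one plausible reading of "grounded" here. A realistic pipeline would need many more templates and query shapes than this single pattern.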

Source: [https://arxiv.org/abs/2606.14325](https://arxiv.org/abs/2606.14325)
[View PDF](https://arxiv.org/pdf/2606.14325)

## Submission history

From: Francesco Cazzaro [[view email](https://arxiv.org/show-email/fe4d681d/2606.14325)] **[v1]** Fri, 12 Jun 2026 10:08:20 UTC (719 KB)

Similar Articles

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

arXiv cs.CL

This paper proposes SGR, a framework that enhances LLM stepwise reasoning by integrating external knowledge graphs through query-relevant subgraph generation, combining Cypher-based reasoning with collaborative reasoning integration. Experiments on CWQ, WebQSP, GrailQA, and KQA Pro show improved reasoning accuracy over standard prompting and knowledge-enhanced baselines.