Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation

arXiv cs.CL 06/15/26, 04:00 AM Papers

text-to-cypher knowledge-graph synthetic-data fine-tuning llm property-graph data-generation

Summary

This paper presents a synthetic data generation method for fine-tuning small LLMs to convert natural language into Cypher queries over property graphs. The fine-tuned models perform competitively with much larger proprietary models while allowing local deployment and preserving data sovereignty.

arXiv:2606.14325v1. Abstract: Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.
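To make the task concrete, here is a minimal sketch of what one synthetic Text-To-Cypher training pair could look like. The abstract does not describe the generation procedure, so everything here is an assumption for illustration: the `Relationship` schema type, the `make_pair` helper, the question template, and the toy `Person`/`ACTED_IN`/`Movie` schema are all hypothetical, and "grounded" is read loosely as filling a template with a value that actually occurs in the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One edge type in a toy property-graph schema (hypothetical)."""
    source_label: str
    rel_type: str
    target_label: str
    target_key: str  # property used to identify the target node


def make_pair(rel: Relationship, value: str) -> dict:
    """Build one (question, Cypher) pair from a schema edge and a graph value.

    The template and query shape are illustrative assumptions,
    not the procedure used in the paper.
    """
    question = (
        f"Which {rel.source_label} nodes are connected by {rel.rel_type} "
        f"to the {rel.target_label} whose {rel.target_key} is '{value}'?"
    )
    cypher = (
        f"MATCH (s:{rel.source_label})-[:{rel.rel_type}]->"
        f"(t:{rel.target_label} {{{rel.target_key}: $value}}) "
        "RETURN s"
    )
    # The value is passed as a query parameter rather than inlined in the query.
    return {"question": question, "cypher": cypher, "params": {"value": value}}


if __name__ == "__main__":
    acted_in = Relationship("Person", "ACTED_IN", "Movie", "title")
    pair = make_pair(acted_in, "The Matrix")
    print(pair["question"])
    print(pair["cypher"])
```

A fine-tuning set would pair many such questions with their queries; a realistic generator would presumably also vary the phrasing and query structure, which this sketch does not attempt.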

Cached at: 06/15/26, 08:58 AM

# Achieving Precise Text-To-Cypher Via Grounded Knowledge Graph Data Generation
Source: [https://arxiv.org/abs/2606.14325](https://arxiv.org/abs/2606.14325)
[View PDF](https://arxiv.org/pdf/2606.14325)

> Abstract: Property Graphs are rapidly being adopted as database frameworks for representing heterogeneous data sources. To enable precise access to the information contained in them we need conversational interfaces based on Text-To-Cypher (Text2Cypher) parsers. This paper presents an automatic synthetic data generation method that can be leveraged to fine-tune small LLMs for this task. We conduct experiments on all the major Text-To-Cypher benchmarks, demonstrating that with our synthetic data generation approach we can significantly increase the performance of small LLMs, allowing them to compete with much larger proprietary models. This means that in settings in which models must be locally deployed we can ensure data-sovereignty without sacrificing accuracy and without costly annotation campaigns.

## Submission history

From: Francesco Cazzaro ([view email](https://arxiv.org/show-email/fe4d681d/2606.14325)) **[v1]** Fri, 12 Jun 2026 10:08:20 UTC (719 KB)

Similar Articles

Construction of Knowledge Graph based on Language Model

I built an open-source Knowledge Graph pipeline with hybrid retrieval to improve LLM multi-hop reasoning [P]

Enhancing Metacognitive AI: Knowledge-Graph Population with Graph-Theoretic LLM Enrichment

Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation

Generating Logically Consistent Synthetic Supply Chain Data with LLM-Driven Knowledge Graph Reasoning
