Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models
Summary
This paper introduces GRiD, a framework that uses diffusion models and reinforcement learning to generate graph-like rules (e.g., cycles, branches) for knowledge graph reasoning, addressing the limitations of existing chain-rule mining methods. Experiments on six benchmarks show competitive performance in KG completion tasks.
View Cached Full Text
Cached at: 06/01/26, 09:24 AM
# Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models Source: [https://arxiv.org/html/2605.30747](https://arxiv.org/html/2605.30747) Haoxiang ChengLaboratory for Big Data and Decision, National University of Defense TechnologyChangshaChina[hx˙chenggfkd@nudt\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:hx%CB%[email protected])Yunfei WangNational Key Laboratory of Information Systems Engineering, National University of Defense TechnologyChangshaChina[wangyunfei@nudt\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:[email protected]),Chao ChenLaboratory for Big Data and Decision, National University of Defense TechnologyChangshaChina[chenc1997@nudt\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:[email protected]),Kewei ChengMicrosoft CorporationRedmondUSA[keweicheng@microsoft\.com](https://arxiv.org/html/2605.30747v1/mailto:[email protected]),Zhipeng LinCollege of Computer Science and Technology, National University of Defense TechnologyChangshaChina[linzhipeng13@nudt\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:[email protected]),Haoxuan LiCenter for Data Science, Peking UniversityBeijingChina[hxli@stu\.pku\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:[email protected]),Changjun FanLaboratory for Big Data and Decision, National University of Defense TechnologyChangshaChina[fanchangjun@nudt\.edu\.cn](https://arxiv.org/html/2605.30747v1/mailto:[email protected])andShixuan LiuCollege of Computer Science and Technology, National University of Defense TechnologyChangshaChina[szftandy@hotmail\.com](https://arxiv.org/html/2605.30747v1/mailto:[email protected]) \(2026\) ###### Abstract\. Logical rules constitute a cornerstone of knowledge graph \(KG\) reasoning, valued for their interpretability and ability to model relational patterns\. However, existing rule mining methods predominantly focus on simple chain\-like rules and therefore neglect the richer relational information encoded in graph\-like structures, such as cycles and branches\. This limitation is further exacerbated by computational bottlenecks caused by the combinatorial explosion of the search space, which is especially challenging for graph\-like rules\. Meanwhile, generative approaches such as diffusion models, despite their success in other domains, can not be directly applied to rule mining because their training objectives are not aligned with the goal of learning high\-quality rules, and non\-differentiable KG rule quality metrics cannot directly guide model optimization\. To address these limitations, we propose GRiD, a framework that reformulates graph\-like rule discovery as a discrete generative process conditioned on the target relation\. GRiD employs a two\-phase training strategy\. First, supervised pre\-training enables GRiD to capture structural priors from subgraphs sampled from the KG meta\-graph\. Subsequently, reinforcement learning is applied to fine\-tune GRiD through policy gradient optimization guided directly by non\-differentiable rule\-quality metrics\. Experiments on six benchmark datasets show that GRiD achieves competitive performance on KG completion tasks\. Ablation studies confirm the efficiency and robustness of GRiD and further show that graph\-like rules complement chain\-like rules in KG completion\. Our codes and datasets are available in[https://github\.com/Haoxiang\-Cheng/GRiD](https://github.com/Haoxiang-Cheng/GRiD) Knowledge Graph Reasoning, Logical Rules, Diffusion Models, Reinforcement Learning ††journalyear:2026††copyright:cc††conference:Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; August 2026; Korea††booktitle:Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining \(KDD ’26\), August 13–17, 2026, Korea††doi:10\.1145/3770855\.3817814††isbn:978\-1\-4503\-XXXX\-X/2026/08††ccs:Computing methodologies Semantic networks## 1\.Introduction Knowledge graphs \(KGs\) represent knowledge as factual triples\(eh,r,et\)\(e\_\{h\},r,e\_\{t\}\)and serve as critical components in intelligent systems such as semantic search and question answering\(An et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib2); Saxena et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib34)\)\. However, KGs are inherently incomplete, necessitating reasoning methods to infer missing facts\. Among various paradigms, rule\-based reasoning offers interpretability by providing explicit logical rules that capture relational dependencies and enable transparent inference\(Ji et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib16)\)\. A logical rule typically expresses an implication of the formρ:ρh←ρb\\rho:\\rho\_\{h\}\\leftarrow\\rho\_\{b\}, where the rule headρh\\rho\_\{h\}can be inferred from a conjunction of body atomsρb\\rho\_\{b\}\. In practice, rules are often induced from observed path instances, creating a direct link between schema semantics and instance\-level evidence\. For example, as illustrated in Fig\.[1](https://arxiv.org/html/2605.30747#S1.F1), the ruleρ1:BornIn\(x,z\)←WorksAt\(x,y\)∧LocatedIn\(y,z\)\\rho\_\{1\}:\\textit\{BornIn\}\(x,z\)\\leftarrow\\textit\{WorksAt\}\(x,y\)\\wedge\\textit\{LocatedIn\}\(y,z\)is induced from the instance pathBornIn\(Turing, UK\)←\\leftarrowWorksAt\(Turing, Cambridge\)∧\\wedgeLocatedIn\(Cambridge, UK\)\. Such instance\-grounded induction yields transparent and verifiable reasoning paths, highlighting the utility of rule\-based approaches\. Despite their transparency, traditional rule\-based methods are largely confined to chain\-like rules, which are sequences of relations connected by shared variables\(Islam et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib15)\)\. While chain\-like rules are efficient to mine and apply, their linear structure limits their ability to express richer relational patterns, such as conjunctive or overlapping constraints\. This restriction often leads to ambiguous predictions in KGs with complex schemas\. For instance, applying the chain\-like ruleρ1\\rho\_\{1\}to inferBornIn\(Hinton,??\) may yield multiple candidates, because the rule cannot distinguish among multiple candidate workplaces associated with the subject\. This limitation arises from the inability of chain\-like rules to jointly enforce multi\-relational constraints, highlighting the need for more discriminative rule structures\. Graph\-like rules, whose bodies form directed graphs rather than simple chains, can address this limitation by enforcing multiple relational constraints simultaneously\. For instance, a graph\-like rule such asρ2:BornIn\(x,z\)←WorksAt\(x, y\)∧GraduatedFrom\(x, y\)∧LocatedIn\(y, z\)\\rho\_\{2\}:\\textit\{BornIn\}\(x,z\)\\leftarrow\\textit\{WorksAt\(x, y\)\}\\wedge\\textit\{GraduatedFrom\(x, y\)\}\\wedge\\textit\{LocatedIn\(y, z\)\}incorporates joint evidence through shared variables, thereby reducing ambiguity\. However, extending rule learning from chain\-like rules to general graph\-like structures introduces a fundamental computational bottleneck\. Even mining simple chain\-like rules involves a combinatorial exponential search space, typically on the order ofO\(\|R\|L\)O\(\|R\|^\{L\}\)for lengthLL\. Graph\-like rules, which generalize chains by introducing non\-linear dependencies, further expand this space toO\(\(L\|R\|\)L\)O\(\(L\|R\|\)^\{L\}\)\. This issue is inherent in search\-based paradigms, particularly those extracting rules from instance paths, where traversing large KG instances incurs substantial computational overhead\. Consequently, this challenge motivates a shift from iterative search to more efficient exploration of the rule space, in line with recent studies that formulate graph\-structured combinatorial optimization as learnable optimization procedures\(Pu et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib28)\)\. Figure 1\.While existing rule\-based methods predominantly rely on chain\-like structures, such structures are insufficient for capturing complex interconnected patterns in KGs\. Graph\-like rules provide a more expressive alternative but are hindered by the exponential growth of the search space\. Therefore, we propose a diffusion\-based method for efficient rule generation\. However, the standard supervised reconstruction objective for these models is misaligned with the goal of discovering semantically high\-quality rules\. We therefore introduce RL fine\-tuning to directly optimize the generation policy using rewards based on structural quality, enabling the efficient generation of high\-quality graph\-like rules\.While search\-based rule discovery is constrained by combinatorial complexity, generative approaches have recently emerged as an alternative\. Diffusion models, in particular, have shown effectiveness in generating structured data while maintaining robustness and diversity\. Unlike Generative Adversarial Networks and Variational Autoencoders, which may suffer from mode collapse or over\-smoothing\(Barannikov et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib4)\), diffusion models can provide more stable coverage of complex output spaces\. Moreover, while Large Language Models may produce logically invalid symbolic dependencies\(Liu et al\.,[2025](https://arxiv.org/html/2605.30747#bib.bib19)\), diffusion models can incorporate explicit constraints to preserve structural validity\. Nevertheless, existing diffusion\-based methods in the KG domain primarily focus on instance\-level tasks such as fact completion rather than on discovering explicit schema\-level logical patterns\(Long et al\.,[2024b](https://arxiv.org/html/2605.30747#bib.bib21),[a](https://arxiv.org/html/2605.30747#bib.bib20); Zhou et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib47)\)\. Adapting diffusion models from instance\-level reconstruction to schema\-level rule induction is nontrivial\. A central challenge lies in the non\-differentiable nature of rule quality metrics\. Unlike standard diffusion training, which optimizes differentiable reconstruction losses, logical rules are typically assessed using discrete quality metrics such as confidence and coverage\. These metrics cannot be directly back\-propagated, creating a mismatch between the training objective and the semantic quality of generated rules\. As a result, naively applying diffusion models may fail to align generation with semantically meaningful rules\. To address these challenges, we propose GRiD, a framework for generatingGraph\-likeRules withDiffusion models for KG reasoning\. GRiD reformualtes rule discovery as a conditional discrete diffusion process over the rule\-body adjacency matrix\. To overcome the non\-differentiability of discrete rule metrics, GRiD incorporates Reinforcement Learning \(RL\) to fine\-tune the diffusion process using rule\-quality metrics as reward signals\. However, directly applying RL to the combinatorial space of graph\-like rules often leads to severe sample inefficiency\. To mitigate this issue, GRiD integrates a Supervised Learning \(SL\) pre\-training strategy on subgraphs sampled from the KG meta\-graph, which provides the denoising network with graph structure priors\. By combining SL for structural pattern recognition with RL for quality optimization, GRiD enables the efficient generation of high\-quality graph\-like rules\. The contributions of this work are summarized as follows: - •We propose GRiD, a framework that reformulates rule discovery as a conditional generative process using a diffusion model, shifting this task from discrete search to generative modeling\. - •We introduce a two\-stage training strategy that employs SL pre\-training to learn structure priors and RL fine\-tuning to optimize the diffusion model using non\-differentiable rule\-quality metrics\. - •Experiments show the effectiveness of GRiD for the KGC task, while ablation studies reveal the complementary effect of graph\-like rules and the correlation between their effectiveness and KG structural properties\. ## 2\.Related Work ### 2\.1\.Logical Rule Learning Logical rule learning is one of the main paradigms for KG reasoning, alongside embedding\-based and path\-based methods\(Ji et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib16)\)\. Compared with embedding\-based and path\-based methods, rule\-based reasoning provides higher interpretability by deriving conclusions through explicit logical formulas\. Rule\-based reasoning applies predefined logical axioms and inference rules to derive new factual knowledge from existing KGs\. Owing to this interpretability, substantial research has focused on developing methods for efficiently discovering high\-quality logical rules from KGs\. Logical rule learning has evolved through three methodological stages\. Early approaches primarily relied on inductive logic programming and association rule mining\. Foundational systems such as FOIL\(Quinlan,[1990](https://arxiv.org/html/2605.30747#bib.bib31)\)and AMIE\(Galárraga et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib12)\)generate Horn clauses by generalizing from specific instances, while other approaches are tailored to large graphs\(Fan et al\.,[2015](https://arxiv.org/html/2605.30747#bib.bib11)\)\. Subsequent research explored path\-based methods, including AnyBURL\(Meilicke et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib23)\), which sample and evaluate relational paths from KGs to construct logical rules\. While these methods offer interpretability, their dependence on observed instances limits their ability to generalize to semantically valid rules that occur infrequently in the data\. Neural\-symbolic integration paradigms were introduced to address the inefficiencies of discrete search\. Early work in this direction, such as Neural Theorem Provers \(NTPs\)\(Rocktäschel and Riedel,[2017](https://arxiv.org/html/2605.30747#bib.bib32)\), introduced differentiable reasoning mechanisms\. To reduce the computational demands of these approaches, subsequent research introduced optimizations including adaptive path selection\(Minervini et al\.,[2018](https://arxiv.org/html/2605.30747#bib.bib24)\)and dynamic rule subsetting\(Minervini et al\.,[2020](https://arxiv.org/html/2605.30747#bib.bib25)\)\. Despite these improvements, scalability challenges continued to hinder practical deployment\. Consequently, recent research has shifted toward neural\-logic frameworks that reformulate rule learning as a continuous optimization problem\. Neural\-LP\(Yang et al\.,[2017a](https://arxiv.org/html/2605.30747#bib.bib43)\)introduced an attention\-based controller for sequential rule construction, while hybrid architectures such as RNNLogic\(Qu et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib30)\)jointly learn a rule generator and a reasoning predictor for the iterative refinement of rule embeddings\. NCRL\(Cheng et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib8)\)employs a compositional learning strategy, combining rule components through neural representation learning to improve expressiveness\. Despite these advances, existing rule learning methods still focus mainly on chain\-like structures, thereby overlooking graph\-like structures that can model complex relational patterns in KGs\. Moreover, generating graph\-like rules is computationally challenging due to the exponential growth of the rule space\. To bridge this gap, we leverage diffusion models to generate graph\-like rules that capture complex relational patterns for KG reasoning\. ### 2\.2\.Diffusion Models Diffusion models have demonstrated generative capabilities across a range of domains\(Yang et al\.,[2023b](https://arxiv.org/html/2605.30747#bib.bib45)\)\. They have also been applied to graph\-structured data generation, including molecular design, where they model complex constrained output spaces\(Vignac et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib41); Huang et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib14); Tseng et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib39)\)\. This ability to model structured patterns under combinatorial constraints makes diffusion models suitable for KG reasoning tasks, which involve discovering valid relational paths within structured graphs\. Recent applications of diffusion models to KG reasoning can be grouped into two trajectories\. The first strand operates at the instance level, reformulating KG reasoning as a conditional generation task in which denoising processes progressively recover target entities or facts\(Long et al\.,[2024b](https://arxiv.org/html/2605.30747#bib.bib21); Chai et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib7); Qian et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib29); Yang et al\.,[2023a](https://arxiv.org/html/2605.30747#bib.bib46)\)\. While this paradigm learns the distribution of valid facts and achieves competitive performance, it remains confined to instance\-level prediction and does not explicitly capture schema\-level logical patterns in KGs\. The second trajectory applies diffusion processes in continuous latent spaces to learn fact distributions and model relational uncertainty\(Cai et al\.,[2024](https://arxiv.org/html/2605.30747#bib.bib6)\)\. This line of work improves representational flexibility and can model complex relational patterns\. However, because these methods focus on refining entity representations rather than discovering schema structures, they provide limited interpretability\. While both paradigms demonstrate the adaptability of diffusion models to KG reasoning, neither directly generates explicit rules beyond the instance level, and therefore neither explicitly captures schema\-level logical patterns\. However, directly applying diffusion models to rule generation is challenging because standard reconstruction losses are not aligned with discrete rule\-quality metrics\. To address this issue, we employ RL to optimize the diffusion model using rule\-quality metrics as reward signals, enabling the diffusion process to be applied to graph\-like rule generation\. ## 3\.Preliminaries This section formalizes the key concepts used in this work\. We begin by reviewing the basic structure of KGs and logical rules\. We then introduce the concept of graph\-like rules\. Finally, we present four quantitative metrics for evaluating rule quality\. Definition 1 \(Knowledge Graph, KG\)\.A KG𝒢\\mathcal\{G\}is formally defined as a relation\-labeled directed graph𝒢=\(ℰ,ℛ,ℱ\)\\mathcal\{G\}=\(\\mathcal\{E\},\\mathcal\{R\},\\mathcal\{F\}\), whereℰ\\mathcal\{E\}denotes the entity set,ℛ\\mathcal\{R\}denotes the relation set, andℱ\\mathcal\{F\}denotes the fact set\. Specifically,ℱ=\{\(eh,r,et\)\|eh,et∈ℰ,r∈ℛ\}\\mathcal\{F\}=\\\{\(e\_\{h\},r,e\_\{t\}\)\|e\_\{h\},e\_\{t\}\\in\\mathcal\{E\},r\\in\\mathcal\{R\}\\\}is the set of factual triples\. Definition 2 \(Logical Rule\)\.A logical ruleρ\\rhorepresents a logical implication between a head predicate and a body composed of conjunctive predicates\. It has the following general form: \(1\)ρ:ρh←ρb\\rho:\\rho\_\{h\}\\leftarrow\\rho\_\{b\}whereρh\\rho\_\{h\}denotes the rule head, typically formulated as a single atom such asrh\(ei,ej\)r\_\{h\}\(e\_\{i\},e\_\{j\}\), andρb\\rho\_\{b\}denotes the rule body, which is formed by a conjunction of atoms\. Semantically, the rule asserts that if the bodyρb\\rho\_\{b\}is true, then the headρh\\rho\_\{h\}is also true\. Definition 3 \(Graph\-like Rule\)\.A graph\-like rule is a logical rule whose bodyρb\\rho\_\{b\}forms a directed graph\. As illustrated in Fig\.[2](https://arxiv.org/html/2605.30747#S3.F2), we categorize such rules into four basic types based on the connectivity pattern of the rule body: linear rule \(Fig\.[2\(a\)](https://arxiv.org/html/2605.30747#S3.F2.sf1)\), which represent sequential dependencies; branching structure \(Fig\.[2\(b\)](https://arxiv.org/html/2605.30747#S3.F2.sf2)\), which capture conjunctive paths; node\-cycling structures \(Fig\.[2\(c\)](https://arxiv.org/html/2605.30747#S3.F2.sf3)\), which model node\-centric cycles; and edge\-cycling structures \(Fig\.[2\(d\)](https://arxiv.org/html/2605.30747#S3.F2.sf4)\), which represent relation\-centric cycles\. These basic structures can be combined to form more complex hybrid rule patterns\. \(a\)Linear Rule Body \(b\)Branching Graph Rule Body \(c\)Edge\-cycle Graph Rule Body \(d\)Node\-cycle Graph Rule Body Figure 2\.Four Basic Types of Graph\-like Rule BodiesDefinition 5 \(Support\)\.The support of a logical ruleρ\\rho, denoted asSuppρ𝒢Supp\_\{\\rho\}^\{\\mathcal\{G\}\}, is the number of distinct entity pairs\(ei,ej\)\(e\_\{i\},e\_\{j\}\)in𝒢\\mathcal\{G\}for which bothρb\\rho\_\{b\}andρh\\rho\_\{h\}hold\. \(2\)Suppρ𝒢:=\#\{\(ei,ej\)∈𝒢:𝕀ρb\(ei,ej\)∧ρh\(ei,ej\)\}Supp\_\{\\rho\}^\{\\mathcal\{G\}\}:=\\\#\\\{\(e\_\{i\},e\_\{j\}\)\\in\\mathcal\{G\}:\\mathbb\{I\}\_\{\\rho\_\{b\}\}\(e\_\{i\},e\_\{j\}\)\\land\\rho\_\{h\}\(e\_\{i\},e\_\{j\}\)\\\} Definition 6 \(Coverage\)\.The coverage of a ruleρ\\rhoquantifies its prevalence among facts of the head relation\. It is defined as the proportion of entity pairs connected byρh\\rho\_\{h\}that also satisfyρb\\rho\_\{b\}\. \(3\)Covρ𝒢:=Suppρ𝒢\#\{\(ei,ej\)∈𝒢:ρh\(ei,ej\)\}Cov\_\{\\rho\}^\{\\mathcal\{G\}\}:=\\frac\{Supp\_\{\\rho\}^\{\\mathcal\{G\}\}\}\{\\\#\\\{\(e\_\{i\},e\_\{j\}\)\\in\\mathcal\{G\}:\\rho\_\{h\}\(e\_\{i\},e\_\{j\}\)\\\}\} Definition 7 \(Confidence\)\.The confidence of a ruleρ\\rhomeasures the proportion of body groundings that also support the rule head\. It is defined as follows: \(4\)Confρ𝒢:=Suppρ𝒢\#\{\(ei,ej\)∈𝒢:𝕀ρb\(ei,ej\)\}Conf\_\{\\rho\}^\{\\mathcal\{G\}\}:=\\frac\{Supp\_\{\\rho\}^\{\\mathcal\{G\}\}\}\{\\\#\\\{\(e\_\{i\},e\_\{j\}\)\\in\\mathcal\{G\}:\\mathbb\{I\}\_\{\\rho\_\{b\}\}\(e\_\{i\},e\_\{j\}\)\\\}\} Definition 8 \(PCA Confidence\)\.PCA Confidence quantifies the precision of a ruleρ\\rhounder the Partial Completeness Assumption \(PCA\)\(Galárraga et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib12)\)\. It restricts the denominator to body groundings whose subject entity has at least one observed fact under the head relation, and is defined as follows: \(5\)PCA\-Confρ𝒢:=Suppρ𝒢\#\{\(ei,ej′\)∈𝒢:𝕀ρb\(ei,ej′\)∧ρh\(ei,ej′\)\}PCA\\text\{\-\}Conf\_\{\\rho\}^\{\\mathcal\{G\}\}:=\\frac\{Supp\_\{\\rho\}^\{\\mathcal\{G\}\}\}\{\\\#\\\{\{\(e\_\{i\},e\_\{j\}^\{\\prime\}\)\\in\\mathcal\{G\}:\\mathbb\{I\}\_\{\\rho\_\{b\}\}\(e\_\{i\},e\_\{j\}^\{\\prime\}\)\\land\\rho\_\{h\}\(e\_\{i\},e\_\{j\}^\{\\prime\}\)\\\}\}\} Figure 3\.The GRiD framework\. GRiD formulates graph\-like rule discovery as a conditional discrete diffusion process over the rule\-body adjacency matrices\. Phase A \(SL pre\-training\) learns graph\-structure priors by corrupting subgraphs sampled from the KG meta\-graph with categorical noise and training a Graph Transformer to reverse the diffusion process conditioned on the target relation\.Phase B \(RL fine\-tuning\)treats the denoising process as a stochastic policy, in which a rule is generated from a randomly initialized noisy graph through reverse diffusion, and the RL reward is computed from rule\-quality metrics based on grounded rule instances in the KG\. After RL fine\-tuning, GRiD performs inference by reverse diffusion from sampled noise to generate top\-ranked graph\-like rules\. ## 4\.Methods This section presents GRiD, a framework that formulates graph\-like rule discovery as a conditional discrete diffusion process over rule\-body adjacency matrices\. By leveraging a diffusion model, GRiDlearns a distribution over KG rules conditioned on the target relation, thereby reducing dependence on explicit combinatorial search\. GRiDoperates in two stages: it is first pre\-trained via SL to capture structural priors from subgraphs sampled from KG meta\-graph and then fine\-tuned with RL to optimize non\-differentiable rule\-quality metrics\. An overview of the pipeline is provided in Fig\.[3](https://arxiv.org/html/2605.30747#S3.F3)\. The following sections detail the discrete diffusion framework, the two\-stage training process, and the inference pipeline\. ### 4\.1\.Discrete Diffusion Framework GRiD adopts a discrete diffusion paradigm to model the symbolic structure of graph\-like rules\. While continuous latent representations may introduce ambiguity when reconstructing such structures, operating in the discrete space preserves their structural constraints\. Accordingly, GRiD reformulates rule discovery as a conditional discrete diffusion process\. The following subsections detail this framework, beginning with the discrete state formulation and followed by formal descriptions of the forward and reverse diffusion processes\. #### 4\.1\.1\.Discrete State Formulation The process is conditioned on the rule headρh\\rho\_\{h\}, which provides the semantic target, while the rule bodyρb\\rho\_\{b\}is modeled as the adjacency matrix to be generated\. Discrete Matrix Representation\.The rule body of graph\-like rules is a directed graph, as defined in Def\.[3](https://arxiv.org/html/2605.30747#S3), and can therefore be represented as a matrix\. Formally, the rule body adjacency matrix𝐀\\mathbf\{A\}is defined as: \(6\)𝐀∈\{0,1,…,\|ℛ\|,∅\}S×S\\mathbf\{A\}\\in\\\{0,1,\\dots,\|\\mathcal\{R\}\|,\\emptyset\\\}^\{S\\times S\}Here,SSdenotes the maximum number of allowed nodes andℛ\\mathcal\{R\}denotes the relation set of the KG\. Each entry𝐀\[i,j\]∈\{0,1,…,\|ℛ\|,∅\}\\mathbf\{A\}\[i,j\]\\in\\\{0,1,\\dots,\|\\mathcal\{R\}\|,\\emptyset\\\}specifies the relation type from nodeiito nodejj, with a value of0indicating no relation and∅\\emptysetindicating padding\. To accommodate variable\-length rules, unused rows and columns are padded with a null token∅\\emptyset, enabling fixed\-dimensional processing for batch training\. Conditional Formulation\.Generation is conditioned by two factors: the target head relationrhr\_\{h\}and the diffusion timesteptt\. The relationrhr\_\{h\}is embedded into a static vector𝐫h∈ℝd\\mathbf\{r\}\_\{h\}\\in\\mathbb\{R\}^\{d\}, whilettis encoded via sinusoidal embeddings into a time\-varying vector𝐭∈ℝd\\mathbf\{t\}\\in\\mathbb\{R\}^\{d\}\. These embeddings are concatenated to form a unified conditioning vector𝐜t\\mathbf\{c\}\_\{t\}: \(7\)𝐜t=\[𝐭;𝐫h\]∈ℝ2d\\mathbf\{c\}\_\{t\}=\[\\mathbf\{t\}\\mathbin\{;\}\\mathbf\{r\}\_\{h\}\]\\in\\mathbb\{R\}^\{2d\}The learning objective is to approximate the conditional distributionpθ\(𝐀\|𝐜t\)p\_\{\\theta\}\(\\mathbf\{A\}\|\\mathbf\{c\}\_\{t\}\), so that the generated rule body is conditioned onrhr\_\{h\}while respecting structural constraints at each diffusion step\. #### 4\.1\.2\.Forward Diffusion Process\. The forward diffusion process progressively corrupts a clean rule\-body adjacency matrix𝐀𝟎\\mathbf\{A\_\{0\}\}into noise overTTtimesteps\. Following the D3PM framework\(Austin et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib3)\), the forward process is specified by a sequence of categorical transition matrices\{𝐐t\}t=1T\\\{\\mathbf\{Q\}\_\{t\}\\\}\_\{t=1\}^\{T\}\. At each timesteptt, the diffusion process factorizes across all edge positions\. Each entry of𝐀t\\mathbf\{A\}\_\{t\}is represented as aKK\-dimensional one\-hot categorical variable and evolves independently according to a categorical Markov transition: \(8\)q\(𝐀t∣𝐀t−1\)=Cat\(𝐀t∣𝐀t−1𝐐t\)q\(\\mathbf\{A\}\_\{t\}\\mid\\mathbf\{A\}\_\{t\-1\}\)=\\textit\{Cat\}\(\\mathbf\{A\}\_\{t\}\\mid\\mathbf\{A\}\_\{t\-1\}\\mathbf\{Q\}\_\{t\}\)Here,Cat\(𝐀t∣𝐏\)\\textit\{Cat\}\(\\mathbf\{A\}\_\{t\}\\mid\\mathbf\{P\}\)denotes a collection of independent categorical distributions over all edge entries, where each probability vector is given by the corresponding row of𝐏\\mathbf\{P\}, andQt∈ℝK×KQ\_\{t\}\\in\\mathbb\{R\}^\{K\\times K\}is a row\-stochastic transition matrix over theK=\|ℛ\|\+1K=\|\\mathcal\{R\}\|\+1edge types, including the no edge category\. In this work, we adopt a uniform corruption scheme defined as follows: \(9\)𝐐t=\(1−βt\)𝐈\+βt𝟏𝟏⊤K\\mathbf\{Q\}\_\{t\}=\(1\-\\beta\_\{t\}\)\\mathbf\{I\}\+\\beta\_\{t\}\\frac\{\\mathbf\{1\}\\mathbf\{1\}^\{\\top\}\}\{K\}whereβt∈\[0,1\]\\beta\_\{t\}\\in\[0,1\]represents the noise schedule\. Here,𝟏∈ℝK\\mathbf\{1\}\\in\\mathbb\{R\}^\{K\}denotes an all\-ones column vector, and𝟏𝟏⊤∈ℝK×K\\mathbf\{1\}\\mathbf\{1\}^\{\\top\}\\in\\mathbb\{R\}^\{K\\times K\}defines a uniform categorical transition\. Under this construction, each edge is left unchanged with probability1−βt1\-\\beta\_\{t\}and is resampled uniformly from allKKedge types with probabilityβt\\beta\_\{t\}\. Using the Markov property, the marginal distribution at timestepttadmits a closed form as follows: \(10\)q\(𝐀t∣𝐀0\)=Cat\(𝐀t\|𝐀0𝐐\(t\)\),𝐐\(t\)=∏i=1t𝐐i,q\(\\mathbf\{A\}\_\{t\}\\mid\\mathbf\{A\}\_\{0\}\)=\\mathrm\{Cat\}\\\!\\left\(\\mathbf\{A\}\_\{t\}\\;\\middle\|\\;\\mathbf\{A\}\_\{0\}\\mathbf\{Q\}^\{\(t\)\}\\right\),\\quad\\mathbf\{Q\}^\{\(t\)\}=\\prod\_\{i=1\}^\{t\}\\mathbf\{Q\}\_\{i\},which converges to a uniform categorical distribution asttincreases, thereby removing information about the original rule structure\. #### 4\.1\.3\.Reverse Diffusion Process\. This process iteratively denoises a noisy rule structure to reconstruct a clean rule\-body adjacency matrix𝐀0\\mathbf\{A\}\_\{0\}from an initial state𝐀T\\mathbf\{A\}\_\{T\}\. The reverse transitionpθ\(𝐀t−1\|𝐀t,𝐜t\)p\_\{\\theta\}\(\\mathbf\{A\}\_\{t\-1\}\|\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\)is parameterized by a Graph Transformerfθf\_\{\\theta\}, which takes the noisy adjacency matrix𝐀t\\mathbf\{A\}\_\{t\}and the conditioning vector𝐜t\\mathbf\{c\}\_\{t\}\(Eq\.[7](https://arxiv.org/html/2605.30747#S4.E7)\) as input and outputs categorical logits for all edge types\. To condition generation on the current noise level and target relation,𝐜t\\mathbf\{c\}\_\{t\}is injected into each network layer via Feature\-wise Linear Modulation \(FiLM\)\(Perez et al\.,[2018](https://arxiv.org/html/2605.30747#bib.bib27)\)\. For thell\-th layer, the node representation𝐇\(l\)\\mathbf\{H\}^\{\(l\)\}is modulated as follows: \(11\)𝐇\(l\)=𝜸\(𝐜t\)⊙LayerNorm\(𝐇\(l\)\)\+𝜷\(𝐜t\)\\mathbf\{H\}^\{\(l\)\}=\\boldsymbol\{\\gamma\}\(\\mathbf\{c\}\_\{t\}\)\\odot\\text\{LayerNorm\}\(\\mathbf\{H\}^\{\(l\)\}\)\+\\boldsymbol\{\\beta\}\(\\mathbf\{c\}\_\{t\}\)where⊙\\odotdenotes element\-wise multiplication,𝜸\(⋅\)\\boldsymbol\{\\gamma\}\(\\cdot\)and𝜷\(⋅\)\\boldsymbol\{\\beta\}\(\\cdot\)are learned affine projections of𝐜t\\mathbf\{c\}\_\{t\}\. This modulation enables the message\-passing dynamics to adapt to both the timestepttand the target relationrhr\_\{h\}\. Finally, the modulated representations are projected to produce a probability distribution over theKKedge categories for each edge: \(12\)pθ\(𝐀t−1∣𝐀t,𝐜t\)∝Cat\(𝐀t−1\|fθ\(𝐀t,𝐜t\)\)p\_\{\\theta\}\(\\mathbf\{A\}\_\{t\-1\}\\mid\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\)\\propto\\textit\{Cat\}\\\!\\left\(\\mathbf\{A\}\_\{t\-1\}\\;\\middle\|\\;f\_\{\\theta\}\(\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\)\\right\)By iteratively sampling from this distribution, GRiD generates a rule body conditioned on the target relation while preserving structural constraints\. ### 4\.2\.Phase A: SL Pre\-training To obtain a structurally informed initialization, we pre\-train GRiD to reconstruct subgraphs sampled from the KG meta\-graph\. The subgraph training set is constructed through random\-walk sampling and topological augmentation on the KG, as detailed in Appendix[I](https://arxiv.org/html/2605.30747#A9)\. Training Objective\.The objective is to train the denoising networkfθf\_\{\\theta\}to recover the clean adjacency matrix𝐀0\\mathbf\{A\}\_\{0\}from a corrupted state\. Specifically, given a noisy input𝐀t\\mathbf\{A\}\_\{t\}and the conditioning vector𝐜t\\mathbf\{c\}\_\{t\}, we minimize a weighted cross\-entropy loss between the predicted edge\-type distribution and the clean adjacency matrix: \(13\)ℒrecon=−∑i,jMij∑k=0K−1𝕀\(𝐀0\[i,j\]=k\)logpθ\(𝐀0\[i,j\]=k∣𝐀t,𝐜t\)∑i,jMij\\mathcal\{L\}\_\{recon\}=\-\\frac\{\\sum\_\{i,j\}M\_\{ij\}\\sum\_\{k=0\}^\{K\-1\}\\mathbb\{I\}\(\\mathbf\{A\}\_\{0\}\[i,j\]=k\)\\log p\_\{\\theta\}\\left\(\\mathbf\{A\}\_\{0\}\[i,j\]=k\\mid\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\\right\)\}\{\\sum\_\{i,j\}M\_\{ij\}\}Here,𝐌∈\{0,1\}S×S\\mathbf\{M\}\\in\\\{0,1\\\}^\{S\\times S\}is a binary mask, whereMij=0M\_\{ij\}=0for padding positions andMij=1M\_\{ij\}=1otherwise\. This ensures that the optimization focuses only on valid structural entries\. In addition to the reconstruction loss, we use the diffusion model to predict a scalar scorev^ρi\\hat\{v\}\_\{\\rho\_\{i\}\}for each rule\. The overall supervised pre\-training objective is therefore defined as \(14\)ℒSL=ℒrecon\+λQ1B∑i=1B\(v^ρi−vρi∗\)2\\mathcal\{L\}\_\{SL\}=\\mathcal\{L\}\_\{recon\}\+\\lambda\_\{Q\}\\frac\{1\}\{B\}\\sum\_\{i=1\}^\{B\}\\left\(\\hat\{v\}\_\{\\rho\_\{i\}\}\-v\_\{\\rho\_\{i\}\}^\{\*\}\\right\)^\{2\}whereλQ=0\.1\\lambda\_\{Q\}=0\.1is a fixed hyperparameter that balances structural reconstruction and semantic regularization\. The target quality scorevρi∗v\_\{\\rho\_\{i\}\}^\{\*\}is defined as the min\-max normalized sum of coverage and confidence \(Covρi𝒢\+Confρi𝒢Cov\_\{\\rho\_\{i\}\}^\{\\mathcal\{G\}\}\+Conf\_\{\\rho\_\{i\}\}^\{\\mathcal\{G\}\}\) over the training set, thereby accounting for both rule recall and precision\. This auxiliary objective aligns the learned representations with rule\-quality signals and provides a warm start for subsequent RL fine\-tuning\. Training Pipeline\.Each supervised pre\-training iteration corresponds to a single diffusion step conditioned on the target relation\. Concretely, we first sample a batch of rule structures\{\(𝐀0i,rh\)\}i=1B\\\{\(\\mathbf\{A\}^\{i\}\_\{0\},r\_\{h\}\)\\\}\_\{i=1\}^\{B\}from the training set, whereBBdenotes the batch size\. A diffusion timestepttis then drawn uniformly from\{1,…,T\}\\\{1,\\dots,T\\\}, and the corresponding noisy structure𝐀t\\mathbf\{A\}\_\{t\}is obtained through the forward diffusion process\. Conditioned on the vector𝐜t\\mathbf\{c\}\_\{t\}, GRiD predicts both the edge\-type distributions and an auxiliary quality score for the corrupted structure\. The supervised lossℒSL\\mathcal\{L\}\_\{SL\}is then computed and back\-propagated to update the model parametersθ\\theta\. ### 4\.3\.Phase B: RL Fine\-tuning While SL pre\-training establishes the structural prior from KG subgraphs, it does not directly optimize rule\-quality metrics such as coverage and confidence\. Inspired by a recent study\(Kou et al\.,[2026](https://arxiv.org/html/2605.30747#bib.bib18)\), the pre\-trained model is fine\-tuned using RL to address this mismatch, which frames the sequential denoising process as a policy optimization problem\. Policy and Reward Formulation\.We formulate the denoising process as a Markov Decision Process, where the denoising Graph Transformer acts as a stochastic policyπθ\(𝐀t−1∣𝐀t,𝐜t\)\\pi\_\{\\theta\}\(\\mathbf\{A\}\_\{t\-1\}\\mid\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\)\. The state corresponds to the noisy rule\-body matrix𝐀t\\mathbf\{A\}\_\{t\}, and the action is the sampled denoised matrix𝐀t−1\\mathbf\{A\}\_\{t\-1\}\. Since intermediate noisy states lack direct rule\-quality labels, the rewardR\(ρ\)R\(\\rho\)is assigned only upon generating the complete ruleρ\\rho\. To quantify rule quality, we defineR\(ρ\)R\(\\rho\)as the geometric mean of coverage \(Eq\.[3](https://arxiv.org/html/2605.30747#S3.E3)\), confidence \(Eq\.[4](https://arxiv.org/html/2605.30747#S3.E4)\), and PCA confidence \(Eq\.[5](https://arxiv.org/html/2605.30747#S3.E5)\): \(15\)R\(ρ\)=\(Covρ𝒢⋅Confρ𝒢⋅PCA\-Confρ𝒢\)1/3R\(\\rho\)=\\left\(Cov\_\{\\rho\}^\{\\mathcal\{G\}\}\\cdot Conf\_\{\\rho\}^\{\\mathcal\{G\}\}\\cdot PCA\\text\{\-\}Conf\_\{\\rho\}^\{\\mathcal\{G\}\}\\right\)^\{1/3\}This composite metric balances rule generality and predictive precision, providing a reward signal for optimization\. Policy Optimization\.Since the intermediate states do not have direct rule\-quality labels, we formulate the objective as maximizing the expected episode\-level rewardJ\(θ\)=𝔼ρ∼πθ\[R\(ρ\)\]J\(\\theta\)=\\mathbb\{E\}\_\{\\rho\\sim\\pi\_\{\\theta\}\}\[R\(\\rho\)\]using the REINFORCE algorithm\. To reduce variance, we incorporate a baselinebb, updated by an exponential moving average, to compute the advantageA^=R\(ρ\)−b\\hat\{A\}=R\(\\rho\)\-b\. The optimization objectiveℒRL\\mathcal\{L\}\_\{RL\}combines the policy gradient with entropy regularization to discourage premature convergence to deterministic transitions and encourage exploration: \(16\)ℒRL\(θ\)=−1B∑i=1B∑t=1T\(A^ilogπθ\(𝐀t−1\(i\)∣𝐀t\(i\),𝐜t\(i\)\)\+λHℋ\(πθ\(⋅∣𝐀t\(i\),𝐜t\(i\)\)\)\)\\displaystyle\\mathcal\{L\}\_\{RL\}\(\\theta\)=\-\\frac\{1\}\{B\}\\sum\_\{i=1\}^\{B\}\\sum\_\{t=1\}^\{T\}\\left\(\\hat\{A\}\_\{i\}\\log\\pi\_\{\\theta\}\(\\mathbf\{A\}\_\{t\-1\}^\{\(i\)\}\\mid\\mathbf\{A\}\_\{t\}^\{\(i\)\},\\mathbf\{c\}\_\{t\}^\{\(i\)\}\)\+\\lambda\_\{H\}\\mathcal\{H\}\(\\pi\_\{\\theta\}\(\\cdot\\mid\\mathbf\{A\}\_\{t\}^\{\(i\)\},\\mathbf\{c\}\_\{t\}^\{\(i\)\}\)\)\\right\) Here,A^i\\hat\{A\}\_\{i\}acts as a trajectory\-level weight for episodeii, reinforcing steps that lead to higher\-quality rules, while the entropy termℋ\\mathcal\{H\}encourages exploration at each timestep\. ### 4\.4\.Inference Pipeline After training, the diffusion model is used as a conditional generator of graph\-like rule bodies for a given target relationrhr\_\{h\}\. Inference proceeds via iterative denoising followed by structural parsing and filtering\. Specifically, we initialize the process from a noisy adjacency matrix𝐀T\\mathbf\{A\}\_\{T\}, where each entry is sampled from theKKedge categories\. Fort=T,…,1t=T,\\dots,1, the model iteratively samples \(17\)𝐀t−1∼pθ\(𝐀t−1∣𝐀t,𝐜t\),\\mathbf\{A\}\_\{t\-1\}\\sim p\_\{\\theta\}\(\\mathbf\{A\}\_\{t\-1\}\\mid\\mathbf\{A\}\_\{t\},\\mathbf\{c\}\_\{t\}\),where the conditioning vector𝐜t=\[𝐭;𝐫h\]\\mathbf\{c\}\_\{t\}=\[\\mathbf\{t\}\\mathbin\{;\}\\mathbf\{r\}\_\{h\}\]encodes the timestep and target relation\. This denoising process yields a discrete adjacency matrix𝐀0\\mathbf\{A\}\_\{0\}, which is then parsed into a graph\-like rule body\. To form a rule, the node with zero in\-degree is identified as the head variableehe\_\{h\}\. The node with zero out\-degree that is topologically farthest fromehe\_\{h\}is designated as the tail variableete\_\{t\}\. The remaining edges compose the rule body that predictsrh\(eh,et\)r\_\{h\}\(e\_\{h\},e\_\{t\}\)\. Finally, KG structural constraints are applied to filter invalid structures, retaining valid graph\-like rules for downstream reasoning\. ## 5\.Experiment ### 5\.1\.Experimental settings Table 1\.KGC results of GRiD on six datasets\. The best/second\-best metrics are bolded/underlined\. All reported results are averaged over five independent runs with different random seeds, detailed in Table[7](https://arxiv.org/html/2605.30747#A7.T7)in Appendix[G](https://arxiv.org/html/2605.30747#A7)\.ModelsFamilyKinshipUMLSMRR \(%\)HIT@1 \(%\)HIT@10 \(%\)MRR \(%\)HIT@1 \(%\)HIT@10 \(%\)MRR \(%\)HIT@1 \(%\)HIT@10 \(%\)EmbeddingTransE45\.2122\.3487\.4531\.230\.9383\.2269\.8652\.1489\.19DistMult54\.1736\.0088\.4535\.2319\.2274\.2339\.0425\.6166\.15ComplEx81\.1172\.1293\.6342\.1223\.5682\.1141\.2927\.9370\.92RotatE86\.7077\.4393\.2166\.2150\.2494\.1374\.2163\.9294\.04SpeedE90\.2882\.7298\.4765\.3253\.3589\.5570\.3558\.6289\.65GNN\-basedCompGCN75\.3558\.8388\.8340\.3026\.9475\.8362\.3350\.8572\.76NBFNet84\.1676\.0397\.7149\.5434\.4381\.0770\.7464\.1982\.48A\*Net82\.3572\.3796\.6744\.3630\.4277\.3666\.3753\.6378\.37Rule\-basedAnyBURL93\.9988\.4694\.9065\.9352\.3891\.4568\.8367\.4072\.54AMIE87\.7480\.4798\.8362\.9948\.8390\.0664\.4260\.6774\.21Neural\-LP88\.1279\.2898\.5330\.1716\.6059\.3448\.3833\.2877\.47DRUM89\.0382\.7699\.0633\.0218\.1167\.7655\.1835\.3785\.15RNNLogic86\.7479\.4095\.1864\.7849\.7092\.2675\.2062\.4092\.04RLogic88\.0880\.8197\.2757\.7043\.6087\.0871\.7455\.3293\.55NCRL91\.3385\.2499\.3164\.3249\.3992\.5675\.5364\.4493\.15ChatRule90\.6285\.4596\.8432\.5217\.5168\.7477\.5167\.4594\.84GRiD95\.2591\.5498\.3267\.1854\.3692\.4683\.4874\.8996\.48ModelWN\-18RRFB15K\-237YAGO3\-10MRR \(%\)HIT@1 \(%\)HIT@10 \(%\)MRR \(%\)HIT@1 \(%\)HIT@10 \(%\)MRR \(%\)HIT@1 \(%\)HIT@10 \(%\)EmbeddingTransE23\.122\.2352\.5429\.4018\.6546\.8436\.4225\.7858\.01DistMult42\.6738\.0650\.3822\.5313\.9338\.4634\.5624\.5452\.93ComplEx44\.1141\.3551\.1524\.4915\.2641\.6733\.3424\.2353\.86RotatE47\.1742\.9255\.7432\.4022\.2452\.4849\.4140\.7667\.70SpeedE49\.3144\.5257\.6332\.7122\.3249\.5741\.3533\.2551\.35GNN\-basedCompGCN45\.3239\.2554\.6333\.7422\.9852\.5528\.4321\.3647\.52NBFNet53\.8749\.5062\.9339\.2228\.7957\.3936\.8327\.9155\.42A\*Net51\.1842\.8264\.3739\.8830\.8958\.1336\.1627\.3655\.93Rule\-basedAnyBURL41\.2037\.9147\.9041\.3230\.9761\.4046\.1439\.6058\.52AMIE36\.0233\.4241\.00\-\-\-\-\-\-Neural\-LP37\.6736\.4640\.1524\.1517\.7136\.56\-\-\-DRUM36\.6838\.4141\.3022\.8917\.2335\.75\-\-\-RNNLogic46\.2841\.4053\.3228\.5920\.2044\.62\-\-RLogic44\.2340\.1450\.0131\.1019\.7850\.0236\.2325\.2350\.25NCRL46\.6844\.6453\.2229\.7620\.2447\.4638\.3327\.8753\.14ChatRule33\.5130\.1240\.4330\.9122\.3249\.8844\.9235\.4162\.70GRiD45\.1040\.2254\.3942\.2932\.0562\.1354\.9149\.7862\.97 Tasks\.We evaluate the performance and efficiency of GRiD on KGC tasks\. The objective is to predict a missing target entityete\_\{t\}for a given query\(eh,rq,?\)\(e\_\{h\},r\_\{q\},?\)\. KGC tasks assess the capability of GRiDto extract high\-quality logical rules for KG reasoning\. The full KGC pipeline is detailed in Appendix[D](https://arxiv.org/html/2605.30747#A4)\. Datasets\.Our evaluation spans six benchmark datasets representing diverse scales and domains: two large\-scale KGs, YAGO3\-10\(Suchanek et al\.,[2007](https://arxiv.org/html/2605.30747#bib.bib35)\)and FB15K\-237\(Toutanova and Chen,[2015](https://arxiv.org/html/2605.30747#bib.bib37)\), alongside four domain\-specific datasets: Family\(Hinton,[1986](https://arxiv.org/html/2605.30747#bib.bib13)\), Kinship\(Kok and Domingos,[2007](https://arxiv.org/html/2605.30747#bib.bib17)\), UMLS\(Kok and Domingos,[2007](https://arxiv.org/html/2605.30747#bib.bib17)\), and WN\-18RR\(Dettmers et al\.,[2018](https://arxiv.org/html/2605.30747#bib.bib10)\)\. Detailed statistics are provided in Table[4](https://arxiv.org/html/2605.30747#A1.T4)in Appendix[A](https://arxiv.org/html/2605.30747#A1)\. Dataset Preparation\.Each dataset is randomly split into training and test sets with a ratio of 9:1\. To avoid information leakage, all test triples are strictly excluded from the training data\. Baselines\.We compare GRiD against 16 representative baselines spanning three paradigms under identical experimental settings: five embedding approaches, including TransE\(Bordes et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib5)\), DistMult\(Yang et al\.,[2015](https://arxiv.org/html/2605.30747#bib.bib42)\), ComplEx\(Trouillon et al\.,[2016](https://arxiv.org/html/2605.30747#bib.bib38)\), RotatE\(Sun et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib36)\), and SpeedE\(Pavlović and Sallinger,[2024](https://arxiv.org/html/2605.30747#bib.bib26)\); three GNN\-based methods, including CompGCN\(Vashishth et al\.,[2020](https://arxiv.org/html/2605.30747#bib.bib40)\), NBFNet\(Zhu et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib49)\), and A∗Net\(Zhu et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib48)\); and eight rule\-based systems, including AnyBURL\(Meilicke et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib23)\), AMIE\(Galárraga et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib12)\), Neural\-LP\(Yang et al\.,[2017a](https://arxiv.org/html/2605.30747#bib.bib43)\), DRUM\(Sadeghian et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib33)\), RNNLogic\(Qu et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib30)\), RLogic\(Cheng et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib9)\), NCRL\(Cheng et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib8)\), and ChatRule\(Luo et al\.,[2025](https://arxiv.org/html/2605.30747#bib.bib22)\)\. Detailed descriptions of these baselines are provided in Appendix[B](https://arxiv.org/html/2605.30747#A2)\. Metrics\.Following standard KGC evaluation protocols, we adopt filtered ranking, where triples\(eh,rq,e⋆\)\(e\_\{h\},r\_\{q\},e^\{\\star\}\)withe⋆≠ete^\{\\star\}\\neq e\_\{t\}are removed from the candidate set if they exist in the KG\. We report Hit@1, Hit@10, and Mean Reciprocal Rank \(MRR\)\. Rule Application\.For each relationrr, both chain\-like and graph\-like rules are used to compute the final scoring matrix,𝐒r=\(1−α\),𝐒rchain\+α,𝐒rgraph\\mathbf\{S\}\_\{r\}=\(1\-\\alpha\),\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}\+\\alpha,\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\}, where the fusion weight isα=0\.1\\alpha=0\.1\. The detailed scoring and aggregation procedure is described in Appendix[D](https://arxiv.org/html/2605.30747#A4)\. ### 5\.2\.Knowledge Graph Completion Results Overall Results\.As summarized in Table[1](https://arxiv.org/html/2605.30747#S5.T1), GRiD achieves consistently strong KGC performance across six benchmarks\. It outperforms most rule\-based baselines in the majority of settings and remains competitive with representative embedding and GNN\-based baselines, while producing explicit logical rules\. On large\-scale KGs such as FB15K\-237 and YAGO3\-10, several rule\-based approaches encounter scalability limitations, whereas GRiD still achieves the best or second\-best results\. Impact of KG Structure\.We observe that graph\-like rules are particularly beneficial on KGs where rich relational structure leads to ambiguous predictions under simple chain\-based reasoning, such as FB15K\-237 and YAGO3\-10\. In such settings, chain\-like rules often yield ambiguous candidate rankings, while graph\-like rules introduce additional conjunctive constraints that help disambiguate candidate entities\. Conversely, on KGs with tightly constrained relational semantics \(e\.g\., kinship or lexical hierarchies\), chain‑like rules already capture most reliable patterns, leaving less room for improvement from graph‑like rules\. A detailed quantitative analysis of KG structural properties and their relationship to performance is provided in Appendix[H](https://arxiv.org/html/2605.30747#A8)\. Case Study\.To further illustrate the differences between the learned logical rules, representative examples of chain\-like and graph\-like rules are presented in Table[5](https://arxiv.org/html/2605.30747#A1.T5)\. Graph\-like rules impose stricter conjunctive constraints by requiring the joint satisfaction of multiple relational conditions\. For example, the ruleisCitizenOf\(x,y\)←LivesIn\(x,y\)∧WorksAt\(x,y\)\\textit\{isCitizenOf\}\(x,y\)\\leftarrow\\textit\{LivesIn\}\(x,y\)\\wedge\\textit\{WorksAt\}\(x,y\)integrates two complementary relational signals through the shared variables\(x,y\)\(x,y\)\. In contrast, chain\-like rules are characterized by a single linear relational dependency\. The complementary effects of these rule types are quantitatively examined through ablation studies on rule type and further elaborated in Appendix[F](https://arxiv.org/html/2605.30747#A6)\. ### 5\.3\.Ablation Studies We conduct ablation studies to assess several design choices of GRiD, focusing on four key questions:\(Q1\) Effectiveness of Graph\-like Rules:Do graph\-like rules provide complementary inferential evidence beyond chain\-like rules?\(Q2\) Sensitivity to Fusion Weight:Under which fusion weights do graph\-like rules provide the largest performance gains?\(Q3\) Necessity of RL:Is RL fine\-tuning essential for rule discovery?\(Q4\) Size of Graph\-like Rules:Is a small graph size sufficient for complex KGs? \(Q1\) Effectiveness of Graph\-like Rules\.To address Q1, we investigate whether graph\-like rules provide complementary inferential evidence beyond chain\-like rules\. We compare three configurations:*Chain\-only*,*Graph\-only*, and their*Combination*, while maintaining a constant total number of rules for a controlled comparison\. As shown in Table[2](https://arxiv.org/html/2605.30747#S5.T2), the combined configuration consistently outperforms either individual type, indicating that graph‑like rules capture relational patterns distinct from those encoded by chain‑like rules\. Graph\-like rules in isolation achieve lower KGC performance because their stricter constraints reduce coverage in sparse KGs, which is consistent with the sparsity of KG observations\. Table 2\.KGC Results of Different Rule Types\(Q2\) Sensitivity to Fusion Weight\.To address Q2, we investigate the fusion weights under which graph\-like rules complement chain\-like rules\. While chain‑like rules capture the majority of effective patterns, adding a small proportion of graph‑like rules consistently improves the chain‑only baseline\. As illustrated in Fig\.[4](https://arxiv.org/html/2605.30747#S5.F4), performance gains are observed when graph\-like rules are assigned a constrained fusion weight, approximately0\.10\.1\-0\.20\.2\. Conversely, assigning excessive weight to graph\-like rules degrades performance because of their lower rule coverage\. These findings indicate that graph\-like rules are most effective when incorporated as a controlled complementary signal rather than as the dominant reasoning paradigm\. \(a\)Family MRR \(b\)WN\-18RR MRR \(c\)FB15K\-237 MRR \(d\)YAGO3\-10 MRR Figure 4\.Ablation study on the graph–chain fusion weightα\\alpha\(Q3\) Necessity of RL\.To address Q3, we compare GRiD trained with only supervised pre\-training against two RL\-fine\-tuned variants on YAGO3\-10: one using a geometric\-mean reward and the other using the sum of rule\-quality metrics\. As shown in Fig\.[5](https://arxiv.org/html/2605.30747#S5.F5), RL fine\-tuning yields consistent improvements of approximately 10% across all metrics, indicating that direct optimization toward rule quality helps align generation with reasoning objectives\. Moreover, the geometric\-mean reward outperforms metric summation, suggesting that balanced multi\-criteria optimization provides a more effective training signal in this setting\. Figure 5\.Ablation on RL\(Q4\) Size of Graph\-like Rules\.To address Q4, we conduct KGC experiments with different graph\-size limitsSS\. As shown in Table[3](https://arxiv.org/html/2605.30747#S5.T3),S=6S=6consistently achieves the best MRR across all datasets\. This result is consistent with the compositional nature of logical rules: longer rules often provide limited additional utility because they can be expressed by composing shorter rules\. Moreover, rules operate at the schema level and are therefore not related to instance\-level neighbor density\. Consequently, a smallSScan improve efficiency while preserving sufficient expressiveness for the rule patterns considered in this work\. Table 3\.KGC Results \(MRR\) under differentSS\. ### 5\.4\.Efficiency Analysis We assess the efficiency of GRiD from both theoretical and empirical perspectives\. Theoretically, the computational complexity is dominated by the Graph Transformer in the denoising network and scales quadratically with the rule sizeSS\. Since graph\-like rule bodies are compact \(typicallyS≤6S\\leq 6\), this overhead remains limited in practice \(see Appendix[E](https://arxiv.org/html/2605.30747#A5)for the detailed derivation\)\. Empirically, as reported in Table[6](https://arxiv.org/html/2605.30747#A4.T6), GRiD requires limited training time and GPU memory\. On YAGO3\-10, SL pre\-training and RL fine\-tuning require 1\.34 s and 18\.76 s per epoch, respectively\. Peak GPU memory consumption is 11\.41 GB on FB15K\-237 and 5\.52 GB on YAGO3\-10\. Once rules are generated offline, inference proceeds via sparse\-matrix operations, which support scalable rule application on large\-scale KGs\. ## 6\.Conclusion This paper introduces GRiD, a framework that reformulates rule discovery as a discrete diffusion process conditioned on the target relation\. By shifting from iterative search to generative modeling, GRiD reduces the dependence on exhaustive search in the combinatorial rule space\. By combining SL pre\-training for structural priors with RL fine\-tuning that directly optimizes rule quality metrics, GRiD generates graph\-like rules for KG reasoning\. Experiments show that the rules generated by GRiD support KGC across six benchmark datasets\. Furthermore, ablation studies show the complementary effects of graph\-like rules and chain\-like rules\. Future research will explore generating rules directly from underlying KG distributions to mitigate sampling bias and developing structural evaluation metrics that provide more precise guidance for the diffusion process\. ## 7\.Acknowledgements This study was supported by the National Natural Science Foundation of China \(NSFC, 72571284, 72421002\), Hunan Provincial Fund for Distinguished Young Scholars \(2025JJ20073\), Science and Technology Innovation Program of Hunan Province \(2023RC3009\), Innovation Research Foundation of National University of Defense Technology \(JS24\-05\), and the Major Science and Technology Projects in Changsha, China \(kq2301008\)\. ## References - \(1\) - An et al\.\(2024\)Yuan An, Jane Greenberg, Fernando J\. Uribe\-Romo, Diego A\. Gómez\-Gualdrón, Kyle Langlois, Jacob Furst, Alex Kalinowski, Xintong Zhao, and Xiaohua Hu\. 2024\.Knowledge Graph Question Answering for Materials Science \(KGQA4MAT\)\. In*Proceedings of the 17th Research Conference on Metadata and Semantic Research \(MTSR\)**\(Communications in Computer and Information Science, Vol\. 2048\)*\. Springer, 18–29\.[https://doi\.org/10\.1007/978\-3\-031\-65990\-4\_2](https://doi.org/10.1007/978-3-031-65990-4_2) - Austin et al\.\(2021\)Jacob Austin, Daniel D\. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg\. 2021\.Structured Denoising Diffusion Models in Discrete State\-Spaces\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 34\. 17981–17993\.[https://proceedings\.neurips\.cc/paper/2021/hash/958c530554f78bcd8e97125b70e6973d\-Abstract\.html](https://proceedings.neurips.cc/paper/2021/hash/958c530554f78bcd8e97125b70e6973d-Abstract.html) - Barannikov et al\.\(2021\)Serguei Barannikov, Ilya Trofimov, Grigorii Sotnikov, Ekaterina Trimbach, Alexander Korotin, Alexander Filippov, and Evgeny Burnaev\. 2021\.Manifold Topology Divergence: A Framework for Comparing Data Manifolds\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 34\. 7294–7305\.[https://proceedings\.neurips\.cc/paper/2021/hash/3bc31a430954d8326605fc690ed22f4d\-Abstract\.html](https://proceedings.neurips.cc/paper/2021/hash/3bc31a430954d8326605fc690ed22f4d-Abstract.html) - Bordes et al\.\(2013\)Antoine Bordes, Nicolas Usunier, Alberto Garcia\-Duran, Jason Weston, and Oksana Yakhnenko\. 2013\.Translating Embeddings for Modeling Multi\-Relational Data\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 26\. 2787–2795\.[https://proceedings\.neurips\.cc/paper/5071\-translating\-embeddings\-for\-modeling\-multi\-relational\-data](https://proceedings.neurips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data) - Cai et al\.\(2024\)Yuxiang Cai, Qiao Liu, Yanglei Gan, Changlin Li, Xueyi Liu, Run Lin, Da Luo, and JiayeYang JiayeYang\. 2024\.Predicting the Unpredictable: Uncertainty\-Aware Reasoning over Temporal Knowledge Graphs via Diffusion Process\. In*Findings of the Association for Computational Linguistics: ACL 2024 \(ACL Findings\)*\. Association for Computational Linguistics, 5766–5778\.[https://aclanthology\.org/2024\.findings\-acl\.343/](https://aclanthology.org/2024.findings-acl.343/) - Chai et al\.\(2024\)Haoye Chai, Tong Li, Fenyu Jiang, Shiyuan Zhang, and Yong Li\. 2024\.Knowledge Guided Conditional Diffusion Model for Controllable Mobile Traffic Generation\. In*Companion Proceedings of the ACM Web Conference \(WWW\)*\. ACM, 851–854\.[https://doi\.org/10\.1145/3589335\.3651530](https://doi.org/10.1145/3589335.3651530) - Cheng et al\.\(2023\)Kewei Cheng, Nesreen K\. Ahmed, and Yizhou Sun\. 2023\.Neural Compositional Rule Learning for Knowledge Graph Reasoning\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://openreview\.net/forum?id=F8VKQyDgRVj](https://openreview.net/forum?id=F8VKQyDgRVj) - Cheng et al\.\(2022\)Kewei Cheng, Jiahao Liu, Wei Wang, and Yizhou Sun\. 2022\.RLogic: Recursive Logical Rule Learning from Knowledge Graphs\. In*Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining \(KDD\)*\. ACM, 179–189\.[https://doi\.org/10\.1145/3534678\.3539421](https://doi.org/10.1145/3534678.3539421) - Dettmers et al\.\(2018\)Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel\. 2018\.Convolutional 2D Knowledge Graph Embeddings\. In*Proceedings of the AAAI Conference on Artificial Intelligence \(AAAI\)*, Vol\. 32\.[https://doi\.org/10\.1609/aaai\.v32i1\.11573](https://doi.org/10.1609/aaai.v32i1.11573) - Fan et al\.\(2015\)Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu\. 2015\.Association Rules with Graph Patterns\.*Proceedings of the VLDB Endowment \(PVLDB\)*8, 12 \(2015\), 1502–1513\.[https://doi\.org/10\.14778/2824032\.2824048](https://doi.org/10.14778/2824032.2824048) - Galárraga et al\.\(2013\)Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek\. 2013\.AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases\. In*Proceedings of the 22nd International Conference on World Wide Web \(WWW\)*\. ACM, 413–422\.[https://doi\.org/10\.1145/2488388\.2488425](https://doi.org/10.1145/2488388.2488425) - Hinton \(1986\)Geoffrey E\. Hinton\. 1986\.Learning Distributed Representations of Concepts\. In*Proceedings of the Eighth Annual Conference of the Cognitive Science Society*\. Lawrence Erlbaum Associates, 1–12\.[https://escholarship\.org/uc/item/8xz8n1t9](https://escholarship.org/uc/item/8xz8n1t9) - Huang et al\.\(2024\)Han Huang, Leilei Sun, Bowen Du, and Weifeng Lv\. 2024\.Learning Joint 2\-D and 3\-D Graph Diffusion Models for Complete Molecule Generation\.*IEEE Transactions on Neural Networks and Learning Systems*35, 9 \(2024\), 11857–11871\.[https://doi\.org/10\.1109/TNNLS\.2024\.3416328](https://doi.org/10.1109/TNNLS.2024.3416328) - Islam et al\.\(2022\)Md Kamrul Islam, Sabeur Aridhi, and Malika Smail\-Tabbone\. 2022\.Negative Sampling and Rule Mining for Explainable Link Prediction in Knowledge Graphs\.*Knowledge\-Based Systems*250 \(2022\), 109083\.[https://doi\.org/10\.1016/j\.knosys\.2022\.109083](https://doi.org/10.1016/j.knosys.2022.109083) - Ji et al\.\(2022\)Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and Philip S\. Yu\. 2022\.A Survey on Knowledge Graphs: Representation, Acquisition, and Applications\.*IEEE Transactions on Neural Networks and Learning Systems*33, 2 \(2022\), 494–514\.[https://doi\.org/10\.1109/TNNLS\.2021\.3070843](https://doi.org/10.1109/TNNLS.2021.3070843) - Kok and Domingos \(2007\)Stanley Kok and Pedro Domingos\. 2007\.Statistical Predicate Invention\. In*Proceedings of the 24th International Conference on Machine Learning \(ICML\)*\. ACM, 433–440\.[https://doi\.org/10\.1145/1273496\.1273551](https://doi.org/10.1145/1273496.1273551) - Kou et al\.\(2026\)Zhiqiang Kou, Junyang Chen, Xin\-Qiang Cai, Xiaobo Xia, Ming\-Kun Xie, Dong\-Dong Wu, Biao Liu, Yuheng Jia, Xin Geng, Masashi Sugiyama, and Tat\-Seng Chua\. 2026\.Positive\-Unlabeled Reinforcement Learning Distillation for On\-Premise Small Models\.arXiv:2601\.20687 \[cs\.LG\][https://doi\.org/10\.48550/arXiv\.2601\.20687](https://doi.org/10.48550/arXiv.2601.20687) - Liu et al\.\(2025\)Shixuan Liu, Haoxiang Cheng, Yunfei Wang, Yue He, Changjun Fan, and Zhong Liu\. 2025\.EvoPath: Evolutionary Meta\-Path Discovery with Large Language Models for Complex Heterogeneous Information Networks\.*Information Processing & Management*62, 1 \(2025\), 103920\.[https://doi\.org/10\.1016/j\.ipm\.2024\.103920](https://doi.org/10.1016/j.ipm.2024.103920) - Long et al\.\(2024a\)Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min\-Yen Kan, and Nancy F\. Chen\. 2024a\.Multi\-Expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models\. In*Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing \(EMNLP\)*\. Association for Computational Linguistics, 20370–20401\.[https://doi\.org/10\.18653/v1/2024\.emnlp\-main\.1135](https://doi.org/10.18653/v1/2024.emnlp-main.1135) - Long et al\.\(2024b\)Xiao Long, Liansheng Zhuang, Aodi Li, Houqiang Li, and Shafei Wang\. 2024b\.Fact Embedding through Diffusion Model for Knowledge Graph Completion\. In*Proceedings of the ACM Web Conference \(WWW\)*\. ACM, 2020–2029\.[https://doi\.org/10\.1145/3589334\.3645451](https://doi.org/10.1145/3589334.3645451) - Luo et al\.\(2025\)Linhao Luo, Jiaxin Ju, Bo Xiong, Yuan\-Fang Li, Gholamreza Haffari, and Shirui Pan\. 2025\.ChatRule: Mining Logical Rules with Large Language Models for Knowledge Graph Reasoning\. In*Proceedings of the Pacific\-Asia Conference on Knowledge Discovery and Data Mining \(PAKDD\)*\. Springer, 342–355\.[https://doi\.org/10\.1007/978\-981\-96\-8173\-0\_25](https://doi.org/10.1007/978-981-96-8173-0_25) - Meilicke et al\.\(2019\)Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli, and Heiner Stuckenschmidt\. 2019\.Anytime Bottom\-Up Rule Learning for Knowledge Graph Completion\. In*Proceedings of the 28th International Joint Conference on Artificial Intelligence \(IJCAI\)*\. International Joint Conferences on Artificial Intelligence Organization, 3137–3143\.[https://doi\.org/10\.24963/IJCAI\.2019/435](https://doi.org/10.24963/IJCAI.2019/435) - Minervini et al\.\(2018\)Pasquale Minervini, Matko Bošnjak, Tim Rocktäschel, and Sebastian Riedel\. 2018\.Towards Neural Theorem Proving at Scale\. In*Proceedings of the ICML Workshop on Neural Abstract Machines and Program Induction \(NAMPI\)*\.[https://discovery\.ucl\.ac\.uk/id/eprint/10075182/](https://discovery.ucl.ac.uk/id/eprint/10075182/) - Minervini et al\.\(2020\)Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, and Tim Rocktäschel\. 2020\.Learning Reasoning Strategies in End\-to\-End Differentiable Proving\. In*Proceedings of the 37th International Conference on Machine Learning \(ICML\)*\. PMLR, 6938–6949\.[https://proceedings\.mlr\.press/v119/minervini20a\.html](https://proceedings.mlr.press/v119/minervini20a.html) - Pavlović and Sallinger \(2024\)Aleksandar Pavlović and Emanuel Sallinger\. 2024\.SpeedE: Euclidean Geometric Knowledge Graph Embedding Strikes Back\. In*Findings of the Association for Computational Linguistics: NAACL 2024 \(NAACL Findings\)*\. Association for Computational Linguistics, 89–98\.[https://doi\.org/10\.18653/v1/2024\.findings\-naacl\.6](https://doi.org/10.18653/v1/2024.findings-naacl.6) - Perez et al\.\(2018\)Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville\. 2018\.FiLM: Visual Reasoning with a General Conditioning Layer\. In*Proceedings of the AAAI Conference on Artificial Intelligence \(AAAI\)*, Vol\. 32\.[https://doi\.org/10\.1609/aaai\.v32i1\.11671](https://doi.org/10.1609/aaai.v32i1.11671) - Pu et al\.\(2024\)Tianle Pu, Chao Chen, Li Zeng, Shixuan Liu, Rui Sun, and Changjun Fan\. 2024\.Solving Combinatorial Optimization Problem over Graph through QUBO Transformation and Deep Reinforcement Learning\. In*Proceedings of the IEEE International Conference on Data Mining \(ICDM\)*\. IEEE, 390–399\.[https://doi\.org/10\.1109/ICDM59182\.2024\.00046](https://doi.org/10.1109/ICDM59182.2024.00046) - Qian et al\.\(2024\)Hao Qian, Wenjing Huang, Shikui Tu, and Lei Xu\. 2024\.KGDiff: Towards Explainable Target\-Aware Molecule Generation with Knowledge Guidance\.*Briefings in Bioinformatics*25, 1 \(2024\), bbad435\.[https://doi\.org/10\.1093/bib/bbad435](https://doi.org/10.1093/bib/bbad435) - Qu et al\.\(2021\)Meng Qu, Junkun Chen, Louis\-Pascal Xhonneux, Yoshua Bengio, and Jian Tang\. 2021\.RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://openreview\.net/forum?id=tGZu6DlbreV](https://openreview.net/forum?id=tGZu6DlbreV) - Quinlan \(1990\)J\. Ross Quinlan\. 1990\.Learning Logical Definitions from Relations\.*Machine Learning*5 \(1990\), 239–266\.[https://doi\.org/10\.1007/BF00117105](https://doi.org/10.1007/BF00117105) - Rocktäschel and Riedel \(2017\)Tim Rocktäschel and Sebastian Riedel\. 2017\.End\-to\-End Differentiable Proving\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 30\.[https://proceedings\.neurips\.cc/paper/2017/hash/b2ab001909a8a6f04b51920306046ce5\-Abstract\.html](https://proceedings.neurips.cc/paper/2017/hash/b2ab001909a8a6f04b51920306046ce5-Abstract.html) - Sadeghian et al\.\(2019\)Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, and Daisy Zhe Wang\. 2019\.DRUM: End\-to\-End Differentiable Rule Mining on Knowledge Graphs\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 32\.[https://proceedings\.neurips\.cc/paper/2019/hash/0c72cb7ee1512f800abe27823a792d03\-Abstract\.html](https://proceedings.neurips.cc/paper/2019/hash/0c72cb7ee1512f800abe27823a792d03-Abstract.html) - Saxena et al\.\(2022\)Apoorv Saxena, Adrian Kochsiek, and Rainer Gemulla\. 2022\.Sequence\-to\-Sequence Knowledge Graph Completion and Question Answering\. In*Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(ACL\)*\. Association for Computational Linguistics, 2814–2828\.[https://doi\.org/10\.18653/v1/2022\.acl\-long\.201](https://doi.org/10.18653/v1/2022.acl-long.201) - Suchanek et al\.\(2007\)Fabian M\. Suchanek, Gjergji Kasneci, and Gerhard Weikum\. 2007\.YAGO: A Core of Semantic Knowledge\. In*Proceedings of the 16th International Conference on World Wide Web \(WWW\)*\. ACM, 697–706\.[https://doi\.org/10\.1145/1242572\.1242667](https://doi.org/10.1145/1242572.1242667) - Sun et al\.\(2019\)Zhiqing Sun, Zhi\-Hong Deng, Jian\-Yun Nie, and Jian Tang\. 2019\.RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://openreview\.net/forum?id=HkgEQnRqYQ](https://openreview.net/forum?id=HkgEQnRqYQ) - Toutanova and Chen \(2015\)Kristina Toutanova and Danqi Chen\. 2015\.Observed versus Latent Features for Knowledge Base and Text Inference\. In*Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality \(CVSC\)*\. Association for Computational Linguistics, 57–66\.[https://doi\.org/10\.18653/v1/W15\-4007](https://doi.org/10.18653/v1/W15-4007) - Trouillon et al\.\(2016\)Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard\. 2016\.Complex Embeddings for Simple Link Prediction\. In*Proceedings of the 33rd International Conference on Machine Learning \(ICML\)*\. PMLR, 2071–2080\.[https://proceedings\.mlr\.press/v48/trouillon16\.html](https://proceedings.mlr.press/v48/trouillon16.html) - Tseng et al\.\(2023\)Alex M\. Tseng, Nathaniel Diamant, Tommaso Biancalani, and Gabriele Scalia\. 2023\.GraphGUIDE: Interpretable and Controllable Conditional Graph Generation with Discrete Bernoulli Diffusion\. In*Proceedings of the International Conference on Learning Representations Workshop on Machine Learning for Drug Discovery \(ICLR MLDD Workshop\)*\.[https://openreview\.net/forum?id=w4Zb1ov9qNi](https://openreview.net/forum?id=w4Zb1ov9qNi) - Vashishth et al\.\(2020\)Shikhar Vashishth, Soumya Sanyal, Vikram Nitin, and Partha Talukdar\. 2020\.Composition\-Based Multi\-Relational Graph Convolutional Networks\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://openreview\.net/forum?id=BylA\_C4tPr](https://openreview.net/forum?id=BylA_C4tPr) - Vignac et al\.\(2023\)Clément Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, and Pascal Frossard\. 2023\.DiGress: Discrete Denoising Diffusion for Graph Generation\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://openreview\.net/forum?id=UaAD\-Nu86WX](https://openreview.net/forum?id=UaAD-Nu86WX) - Yang et al\.\(2015\)Bishan Yang, Wen\-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng\. 2015\.Embedding Entities and Relations for Learning and Inference in Knowledge Bases\. In*Proceedings of the International Conference on Learning Representations \(ICLR\)*\.[https://arxiv\.org/abs/1412\.6575](https://arxiv.org/abs/1412.6575) - Yang et al\.\(2017a\)Fan Yang, Zhilin Yang, and William W\. Cohen\. 2017a\.Differentiable Learning of Logical Rules for Knowledge Base Reasoning\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 30\. 2316–2325\.[https://proceedings\.neurips\.cc/paper/2017/hash/0e55666a4ad822e0e34299df3591d979\-Abstract\.html](https://proceedings.neurips.cc/paper/2017/hash/0e55666a4ad822e0e34299df3591d979-Abstract.html) - Yang et al\.\(2017b\)Fan Yang, Zhilin Yang, and William W\. Cohen\. 2017b\.Differentiable Learning of Logical Rules for Knowledge Base Reasoning\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 30\. 2316–2325\.[https://proceedings\.neurips\.cc/paper/2017/hash/0e55666a4ad822e0e34299df3591d979\-Abstract\.html](https://proceedings.neurips.cc/paper/2017/hash/0e55666a4ad822e0e34299df3591d979-Abstract.html) - Yang et al\.\(2023b\)Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming\-Hsuan Yang\. 2023b\.Diffusion Models: A Comprehensive Survey of Methods and Applications\.*Comput\. Surveys*56, 4 \(2023\), 1–39\.[https://doi\.org/10\.1145/3626235](https://doi.org/10.1145/3626235) - Yang et al\.\(2023a\)Run Yang, Yuling Yang, Fan Zhou, and Qiang Sun\. 2023a\.Directional Diffusion Models for Graph Representation Learning\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 36\. 32720–32731\.[https://proceedings\.neurips\.cc/paper\_files/paper/2023/hash/688a30d8d1a4cd1a5af304e235651f1c\-Abstract\-Conference\.html](https://proceedings.neurips.cc/paper_files/paper/2023/hash/688a30d8d1a4cd1a5af304e235651f1c-Abstract-Conference.html) - Zhou et al\.\(2024\)Cai Zhou, Xiyuan Wang, and Muhan Zhang\. 2024\.Unifying Generation and Prediction on Graphs with Latent Graph Diffusion\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 37\. 61963–61999\.[https://proceedings\.neurips\.cc/paper\_files/paper/2024/hash/724c2ca4988d5ab3229cdaf39f93393f\-Abstract\-Conference\.html](https://proceedings.neurips.cc/paper_files/paper/2024/hash/724c2ca4988d5ab3229cdaf39f93393f-Abstract-Conference.html) - Zhu et al\.\(2023\)Zhaocheng Zhu, Xinyu Yuan, Michael Galkin, Louis\-Pascal Xhonneux, Ming Zhang, Maxime Gazeau, and Jian Tang\. 2023\.A\*Net: A Scalable Path\-Based Reasoning Approach for Knowledge Graphs\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 36\. 59323–59336\.[https://proceedings\.neurips\.cc/paper\_files/paper/2023/hash/bc72aff709ceb78e51335140f80d7f1a\-Abstract\-Conference\.html](https://proceedings.neurips.cc/paper_files/paper/2023/hash/bc72aff709ceb78e51335140f80d7f1a-Abstract-Conference.html) - Zhu et al\.\(2021\)Zhaocheng Zhu, Zuobai Zhang, Louis\-Pascal Xhonneux, and Jian Tang\. 2021\.Neural Bellman\-Ford Networks: A General Graph Neural Network Framework for Link Prediction\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*, Vol\. 34\. 29476–29490\.[https://proceedings\.neurips\.cc/paper/2021/hash/f6a673f09493afcd8b129a0bcf1cd5bc\-Abstract\.html](https://proceedings.neurips.cc/paper/2021/hash/f6a673f09493afcd8b129a0bcf1cd5bc-Abstract.html) ## Appendix ADataset Description - •Family\(Hinton,[1986](https://arxiv.org/html/2605.30747#bib.bib13)\)captures relations among family members\. - •Kinship\(Kok and Domingos,[2007](https://arxiv.org/html/2605.30747#bib.bib17)\)contains kinship relations among members of the Alyawarra tribe from Central Australia\. - •UMLS\(Kok and Domingos,[2007](https://arxiv.org/html/2605.30747#bib.bib17)\)contains biomedical concepts as entities and treatment and diagnosis relations\. - •WN\-18RR\(Dettmers et al\.,[2018](https://arxiv.org/html/2605.30747#bib.bib10)\)is a lexical dataset whose entities refer to word senses and whose relations define lexical connections\. - •FB15K\-237\(Toutanova and Chen,[2015](https://arxiv.org/html/2605.30747#bib.bib37)\)is derived from Freebase, an online collection of structured data extracted from multiple sources\. - •YAGO3\-10\(Suchanek et al\.,[2007](https://arxiv.org/html/2605.30747#bib.bib35)\)is a subset of YAGO, a large semantic knowledge base constructed from multiple sources\. Table 4\.Statistics of KGC datasetsTable 5\.Representative chain\-like and graph\-like rules learned by GRiD on YAGO3\-10\. ## Appendix BBaseline Description Embedding Baselines\. - •TransE\(Bordes et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib5)\)\. TransE models relations as translations between entity embeddings\. - •DistMult\(Yang et al\.,[2015](https://arxiv.org/html/2605.30747#bib.bib42)\)\. DistMult uses diagonal bilinear scoring for entity\-relation interactions\. - •ComplEx\(Trouillon et al\.,[2016](https://arxiv.org/html/2605.30747#bib.bib38)\)\. ComplEx uses complex embeddings to capture symmetric and antisymmetric relations\. - •RotatE\(Sun et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib36)\)\. RotatE models relations as rotations in complex space\. - •SpeedE\(Pavlović and Sallinger,[2024](https://arxiv.org/html/2605.30747#bib.bib26)\)\. SpeedE is a lightweight Euclidean geometric KGE model for efficient KGC\. GNN\-based Baselines\. - •CompGCN\(Vashishth et al\.,[2020](https://arxiv.org/html/2605.30747#bib.bib40)\)\. CompGCN jointly embeds nodes and relations via compositional GCN aggregation\. - •NBFNet\(Zhu et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib49)\)\. NBFNet learns Bellman\-Ford\-style path representations for link prediction\. - •A∗Net\(Zhu et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib48)\)\. A∗Net extends NBFNet with A∗\-like heuristic search to guide the reasoning process\. Rule\-based Baselines\. - •AnyBURL\(Meilicke et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib23)\)\. AnyBURL mines Horn rules via anytime bottom\-up path sampling\. - •AMIE\(Galárraga et al\.,[2013](https://arxiv.org/html/2605.30747#bib.bib12)\)\. AMIE mines Horn rules in large KBs under PCA for counterexamples\. - •Neural\-LP\(Yang et al\.,[2017b](https://arxiv.org/html/2605.30747#bib.bib44)\)\. Neural\-LP learns logical rules through differentiable tensor operations\. - •DRUM\(Sadeghian et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib33)\)\. DRUM extends Neural\-LP with RNNs to model longer rule chains\. - •RNNLogic\(Qu et al\.,[2021](https://arxiv.org/html/2605.30747#bib.bib30)\)\. RNNLogic uses an RNN\-based generator to propose logical rules for KG reasoning\. - •RLogic\(Cheng et al\.,[2022](https://arxiv.org/html/2605.30747#bib.bib9)\)\. RLogic introduces recursive logical rule learning through predicate representation learning\. - •NCRL\(Cheng et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib8)\)\. NCRL is a neural compositional rule learning method that uses attention mechanisms for recursive reasoning\. - •ChatRule\(Luo et al\.,[2025](https://arxiv.org/html/2605.30747#bib.bib22)\)\. ChatRule uses prompts to guide LLMs in generating logical rules with KG structural validation\. ## Appendix CRule Comparsion Representative chain\-like and graph\-like rules generated by GRiD on YAGO3\-10 are presented in Table[5](https://arxiv.org/html/2605.30747#A1.T5)\. Here, we analyze graph\-like rules according to the four rule\-body types in Fig\.[2](https://arxiv.org/html/2605.30747#S3.F2)\. Linear Rule Body \(Fig[2\(a\)](https://arxiv.org/html/2605.30747#S3.F2.sf1)\)\.This type represents a sequential variable chain\. The ruleisCitizenOf\(x,y\)←\(x,y\)\\leftarrowhasAcademicAdvisor\(x,z1\)∧\(x,z\_\{1\}\)\\wedgeworksAt\(z1,z2\)∧\(z\_\{1\},z\_\{2\}\)\\wedgeisLocatedIn\(z2,y\)\(z\_\{2\},y\)demonstrates this structure, where inference follows a single path fromxxtoyythrough the workplace of the advisor and its location\. This rule captures citizenship through the institutional affiliation of the advisor\. Branching Rule Body \(Fig[2\(b\)](https://arxiv.org/html/2605.30747#S3.F2.sf2)\)\.This type contains a branching node connected to multiple nodes\. The ruleisCitizenOf\(x,y\)←\(x,y\)\\leftarrowisMarriedTo\(x,z1\)∧\(x,z\_\{1\}\)\\wedgeisCitizenOf\(z1,z2\)∧\(z\_\{1\},z\_\{2\}\)\\wedgeDealswith\(z1,y\)\(z\_\{1\},y\)branches at nodez1z\_\{1\}, which connects to bothz2z\_\{2\}andyy\. This rule captures citizenship through familial and relational dependencies\. Node\-Cyclic Rule Body \(Fig[2\(c\)](https://arxiv.org/html/2605.30747#S3.F2.sf3)\)\.This type contains shared intermediate nodes that form cycles\. The ruleisLocatedIn\(x,y\)←\(x,y\)\\leftarrowhasCapital\(x,z1\)∧\(x,z\_\{1\}\)\\wedgehasCapital\(y,z1\)∧\(y,z\_\{1\}\)\\wedgehasOfficialLanguage\(x,z2\)∧\(x,z\_\{2\}\)\\wedgehasOfficialLanguage\(y,z2\)\(y,z\_\{2\}\)connectsxxandyythrough the shared capitalz1z\_\{1\}and languagez2z\_\{2\}, reflecting how shared attributes can indicate hierarchical relationships\. Edge\-Cyclic Rule Body \(Fig[2\(d\)](https://arxiv.org/html/2605.30747#S3.F2.sf4)\)\.This type contains multiple edges between the same entity pair\. The ruleisCitizenOf\(x,y\)←\(x,y\)\\leftarrowLivesIn\(x,y\)∧\(x,y\)\\wedgeWorksAt\(x,y\)\(x,y\)forms an edge cycle in which two relations connectxxandyy, showing how multiple co\-location signals support citizenship inference\. ## Appendix DKGC Pipeline This section summarizes how the generated rules are applied to the KGC task, with the full pipeline illustrated in Algorithm[1](https://arxiv.org/html/2605.30747#alg1)\. For each relationr∈ℛr\\in\\mathcal\{R\}, we construct a CSR sparse adjacency matrix𝐌r∈\{0,1\}\|ℰ\|×\|ℰ\|\\mathbf\{M\}\_\{r\}\\in\\\{0,1\\\}^\{\|\\mathcal\{E\}\|\\times\|\\mathcal\{E\}\|\}, where𝐌r\[i,j\]=1\\mathbf\{M\}\_\{r\}\[i,j\]=1if\(ei,r,ej\)∈ℱ\(e\_\{i\},r,e\_\{j\}\)\\in\\mathcal\{F\}\. Let𝐌rgt\\mathbf\{M\}^\{\\mathrm\{gt\}\}\_\{r\}denote the ground\-truth matrix used for rule\-quality computation and filtered ranking\. For each head relationrr, the generated rule setΩr\\Omega\_\{r\}is partitioned into chain\-like rulesΩrchain\\Omega\_\{r\}^\{\\mathrm\{chain\}\}and graph\-like rulesΩrgraph\\Omega\_\{r\}^\{\\mathrm\{graph\}\}\. Algorithm 1KGC with Scored Logical Rules0:KG matrices \{𝐌r\}r∈ℛ\\\{\\mathbf\{M\}\_\{r\}\\\}\_\{r\\in\\mathcal\{R\}\}, ground\-truth matrices \{𝐌rgt\}r∈ℛ\\\{\\mathbf\{M\}^\{\\mathrm\{gt\}\}\_\{r\}\\\}\_\{r\\in\\mathcal\{R\}\}, rule sets \{Ωrchain,Ωrgraph\}r∈ℛ\\\{\\Omega\_\{r\}^\{\\mathrm\{chain\}\},\\Omega\_\{r\}^\{\\mathrm\{graph\}\}\\\}\_\{r\\in\\mathcal\{R\}\}, fusion weight α\\alpha, test queries QQ 0:Filtered ranks \{rankq\}q∈Q\\\{\\mathrm\{rank\}\_\{q\}\\\}\_\{q\\in Q\} 1:for r∈ℛr\\in\\mathcal\{R\}do 2: 𝐒rchain←𝟎\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}\\leftarrow\\mathbf\{0\}, 𝐒rgraph←𝟎\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\}\\leftarrow\\mathbf\{0\} 3:for ρ∈Ωrchain∪Ωrgraph\\rho\\in\\Omega\_\{r\}^\{\\mathrm\{chain\}\}\\cup\\Omega\_\{r\}^\{\\mathrm\{graph\}\}do 4:if ρ∈Ωrchain\\rho\\in\\Omega\_\{r\}^\{\\mathrm\{chain\}\}then 5: 𝐌ρ,body←𝐌r1𝐌r2⋯𝐌rL\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\}\\leftarrow\\mathbf\{M\}\_\{r\_\{1\}\}\\mathbf\{M\}\_\{r\_\{2\}\}\\cdots\\mathbf\{M\}\_\{r\_\{L\}\} 6:else 7:Decompose ρb\\rho\_\{b\}into chain components \{C1,…,CN\}\\\{C\_\{1\},\\ldots,C\_\{N\}\\\}, main chain CMC\_\{M\} 8: 𝐌ρ,body←𝐌CM⊙⨀Ci≠CM𝐌Ci\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\}\\leftarrow\\mathbf\{M\}\_\{C\_\{M\}\}\\odot\\bigodot\_\{C\_\{i\}\\neq C\_\{M\}\}\\mathbf\{M\}\_\{C\_\{i\}\} 9:endif 10: Suppρ𝒢←nnz\(𝐌ρ,body⊙𝐌rgt\)Supp\_\{\\rho\}^\{\\mathcal\{G\}\}\\leftarrow\\mathrm\{nnz\}\\\!\\left\(\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\}\\odot\\mathbf\{M\}^\{\\mathrm\{gt\}\}\_\{r\}\\right\) 11: s\(ρ\)←Score\(Suppρ𝒢,𝐌ρ,body,𝐌rgt\)s\(\\rho\)\\leftarrow\\mathrm\{Score\}\\\!\\left\(Supp\_\{\\rho\}^\{\\mathcal\{G\}\},\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\},\\mathbf\{M\}^\{\\mathrm\{gt\}\}\_\{r\}\\right\) 12:if ρ∈Ωrchain\\rho\\in\\Omega\_\{r\}^\{\\mathrm\{chain\}\}then 13: 𝐒rchain←𝐒rchain\+s\(ρ\)𝐌ρ,body\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}\\leftarrow\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}\+s\(\\rho\)\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\} 14:else 15: 𝐒rgraph←𝐒rgraph\+s\(ρ\)𝐌ρ,body\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\}\\leftarrow\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\}\+s\(\\rho\)\\mathbf\{M\}\_\{\\rho,\\mathrm\{body\}\} 16:endif 17:endfor 18: 𝐒r←\(1−α\)𝐒rchain\+α𝐒rgraph\\mathbf\{S\}\_\{r\}\\leftarrow\(1\-\\alpha\)\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}\+\\alpha\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\} 19:endfor 20:for q=\(eh,r,et⋆\)∈Qq=\(e\_\{h\},r,e\_\{t\}^\{\\star\}\)\\in Qdo 21: 𝐬q←𝐒r\[eh,:\]\\mathbf\{s\}\_\{q\}\\leftarrow\\mathbf\{S\}\_\{r\}\[e\_\{h\},:\] 22: 𝐬q\[e\]←−∞\\mathbf\{s\}\_\{q\}\[e\]\\leftarrow\-\\inftyfor all e≠et⋆e\\neq e\_\{t\}^\{\\star\}with 𝐌rgt\[eh,e\]=1\\mathbf\{M\}^\{\\mathrm\{gt\}\}\_\{r\}\[e\_\{h\},e\]=1 23: rankq←1\+∑e∈ℰ𝕀\(𝐬q\[e\]\>𝐬q\[et⋆\]\)\\mathrm\{rank\}\_\{q\}\\leftarrow 1\+\\sum\_\{e\\in\\mathcal\{E\}\}\\mathbb\{I\}\\\!\\left\(\\mathbf\{s\}\_\{q\}\[e\]\>\\mathbf\{s\}\_\{q\}\[e\_\{t\}^\{\\star\}\]\\right\) 24:endfor 25:return \{rankq\}q∈Q\\\{\\mathrm\{rank\}\_\{q\}\\\}\_\{q\\in Q\} For a chain\-like rule, the body reachability matrix is obtained by successive sparse matrix multiplication\. For a graph\-like rule, we decompose the rule body into chain components, ground each component independently, and apply element\-wise conjunction to retain entity pairs that satisfy all structural constraints\. The functionScore\(⋅\)\\mathrm\{Score\}\(\\cdot\)computes the scalar rule qualitys\(ρ\)s\(\\rho\)from coverage, confidence, and PCA\-confidence, as defined in Eqs\.[3](https://arxiv.org/html/2605.30747#S3.E3)\-[5](https://arxiv.org/html/2605.30747#S3.E5)\. After all rules for relationrrare grounded and scored, their evidence is aggregated into𝐒rchain\\mathbf\{S\}^\{\\mathrm\{chain\}\}\_\{r\}and𝐒rgraph\\mathbf\{S\}^\{\\mathrm\{graph\}\}\_\{r\}, which are then combined using the fusion weightα\\alpha\. Given a query\(eh,r,?\)\(e\_\{h\},r,?\), the row vector𝐒r\[eh,:\]\\mathbf\{S\}\_\{r\}\[e\_\{h\},:\]provides scores for candidate tail entities\. We follow the filtered ranking protocol by masking all known valid tail entities except the target entityet⋆e\_\{t\}^\{\\star\}\. The final performance is reported using Hit@K and MRR: \(18\)Hit@K=1\|Q\|∑q∈Q𝕀\(rankq≤K\),MRR=1\|Q\|∑q∈Q1rankq\.\\mathrm\{Hit@K\}=\\frac\{1\}\{\|Q\|\}\\sum\_\{q\\in Q\}\\mathbb\{I\}\(\\mathrm\{rank\}\_\{q\}\\leq K\),\\quad\\mathrm\{MRR\}=\\frac\{1\}\{\|Q\|\}\\sum\_\{q\\in Q\}\\frac\{1\}\{\\mathrm\{rank\}\_\{q\}\}\. Table 6\.Mean computational cost over 5 independent runs\. Time is reported in seconds \(s\) and peak memory in GB\. ## Appendix EComplexity and Cost Analysis We analyze the computational complexity of the three phases of GRiD: SL pre\-training, RL fine\-tuning, and inference\. Wall\-clock time and peak memory usage across six benchmark datasets are reported in Table[6](https://arxiv.org/html/2605.30747#A4.T6)\. SL Pre\-training\.The cost of is dominated by the Graph Transformer, with a per\-epoch complexity ofO\(\|𝒟pre\|⋅S2\)O\(\|\\mathcal\{D\}\_\{pre\}\|\\cdot S^\{2\}\)\. The single\-step denoising objective avoids iterative sampling, and the quadratic term introduces limited overhead because rule bodies are compact \(S≤6S\\leq 6\)\. This bound follows the common practice of restricting chain\-like rules to 4 nodes\(Meilicke et al\.,[2019](https://arxiv.org/html/2605.30747#bib.bib23); Cheng et al\.,[2023](https://arxiv.org/html/2605.30747#bib.bib8)\)and adding 2 auxiliary nodes, since complex rules can generally be decomposed into smaller substructures\. As reported in Table[6](https://arxiv.org/html/2605.30747#A4.T6), the per\-epoch time is 8\.61 s on FB15K\-237 and 1\.34 s on the larger but sparser YAGO3\-10\. RL Fine\-tuning\.This phase requires full generation to compute rewards, with a complexity ofO\(NRL⋅T⋅S2\)O\(N\_\{RL\}\\cdot T\\cdot S^\{2\}\)forNRLN\_\{RL\}sampled rules andTTdiffusion steps\. Generation is linear inTT, and the small rule sizeSSkeeps the computational graph compact, thereby limiting GPU memory consumption\. As reported in Table[6](https://arxiv.org/html/2605.30747#A4.T6), RL fine\-tuning is the most time\-consuming phase, requiring 18\.76 s per epoch on YAGO3\-10, while peak memory reaches 11\.41 GB on FB15K\-237 and 5\.52 GB on YAGO3\-10\. These figures indicate that GRiD can be trained on consumer\-grade GPUs\. Inference\.Inference is performed offline and is decoupled from diffusion\-model training\. KGC inference uses symbolic rule application with sparse matrix operations\. Applying a rule of lengthLLrequires sparse matrix multiplications, with practical complexity proportional to the number of nonzero entries in the involved relation matrices\. Thus, rule application scales with sparse matrix operations and is independent of generation latency\. Table[6](https://arxiv.org/html/2605.30747#A4.T6)shows that the full training and inference pipeline is feasible across all datasets, ranging from under two minutes on WN\-18RR to approximately two hours on YAGO3\-10\. ## Appendix FSensitivity Analysis of Fusion Weight This section evaluates the fusion parameterα\\alpha, which controls the relative contribution of chain\-like and graph\-like rules in the integrated scoring function\. Given fixed chain\-like and graph\-like rules,α\\alphais varied at the scoring stage as follows: 𝐒=\(1−α\)⋅𝐒rchain\+α⋅𝐒rgraph\\mathbf\{S\}=\(1\-\\alpha\)\\cdot\\mathbf\{S\}\_\{r\}^\{\\text\{chain\}\}\+\\alpha\\cdot\\mathbf\{S\}\_\{r\}^\{\\text\{graph\}\}As shown in Fig\.[6](https://arxiv.org/html/2605.30747#A8.F6),α=0\\alpha=0denotes chain\-only reasoning, whereasα=1\\alpha=1denotes graph\-only reasoning\. Results across all datasets show that combining both rule types yields competitive or better KGC performance than using either rule type alone, indicating that they capture complementary relational patterns\. Chain\-only reasoning remains stable, suggesting that chain\-like rules capture the dominant logical regularities in existing KGs\. Adding a small proportion of graph\-like rules consistently improves the chain\-only baseline\. The best performance across metrics is typically obtained withα∈\[0\.1,0\.2\]\\alpha\\in\[0\.1,0\.2\], indicating that graph\-like rules provide complementary information beyond chain\-like rules\. ## Appendix GMulti\-seed Robustness Analysis To assess the sensitivity of GRiD to random seeds, we evaluate KGC performance across five distinct seeds\. Table[7](https://arxiv.org/html/2605.30747#A7.T7)reports per\-seed results for all datasets, and Table[1](https://arxiv.org/html/2605.30747#S5.T1)presents the mean scores used in the main experiments\. Overall, GRiD shows stable performance across seeds\. On small\- and medium\-scale datasets, such as Family and Kinship, the variance in MRR and Hits@kkis close to zero\. On more complex datasets, such as FB15K\-237 and YAGO3\-10, the variance remains low, with values below 1\.5 across all metrics, indicating limited performance variation due to random initialization\. These results indicate that GRiD is stable under different random seed initializations\. Table 7\.Performance evaluation results under different random seeds across various datasets\. ## Appendix HImpact of KG Structural Properties To examine when graph\-like rules provide advantages over chain\-like rules, we analyze the dataset\-level structural indicators in Table[8](https://arxiv.org/html/2605.30747#A8.T8), including schema\-level connectivity \(BF\_Schema\), relation frequency imbalance \(Gini\), relation co\-occurrence tendency \(NPIM\), and the many\-to\-many \(N\-to\-N\) ratio\. For each relation, we compare KGC performance using only chain\-like rules and only graph\-like rules, and report Graph Better as the proportion of relations for which graph\-like rules outperform chain\-like rules\. Table 8\.Statistics and properties of the experimental datasets\.In Family and Kinship, graph\-like rules do not outperform chain\-like rules for any relation, despite the high N\-to\-N ratios of these datasets\. This suggests that a many\-to\-many structure alone is insufficient to explain the usefulness of graph\-like rules\. The regular kinship semantics of these datasets allow short chain\-like rules to capture effective reasoning patterns\. WN\-18RR shows a similar pattern: although graph\-like rules improve some relations, its negative NPIM value and lexical semantics indicate limited relation co\-occurrence, which reduces the applicability of graph constraints\. In contrast, UMLS, FB15K\-237, and YAGO3\-10 have Graph Better values of 17\.4%, 20\.7%, and 27\.0%, respectively\. Their high Gini coefficients indicate skewed relation distributions\. In such KGs, chain\-like rules may match multiple candidate patterns, whereas graph\-like rules add conjunctive constraints to reduce ambiguity\. The higher Graph Better value of YAGO3\-10, compared with FB15K\-237, is consistent with the higher Gini coefficient of YAGO3\-10\. Overall, graph\-like rules are most beneficial when additional structural constraints can reduce the ambiguity of chain\-based reasoning, whereas their advantage is limited when relation semantics are already precise and regular\. \(a\)Family Performance \(b\)WN\-18RR Performance \(c\)FB15K\-237 Performance \(d\)YAGO3\-10 Performance Figure 6\.Ablation study on the graph–chain fusion weightα\\alpha ## Appendix IRule Dataset Construction This section describes the construction of supervised pre\-training data, which provides the diffusion model with valid, semantically grounded graph\-like rules derived from KGs rather than random structures\. We construct training rules in two stages\. First, we sample chain\-like rules via random walks on KGs as structural primitives\. We then augment these chains into graph\-like rules by injecting auxiliary nodes and relations according to the four graph\-like rule types in Fig\.[2](https://arxiv.org/html/2605.30747#S3.F2)\. Chain\-like Rule Sampling\.Given a KG𝒢\\mathcal\{G\}, we extract triples\(eh,rh,et\)\(e\_\{h\},r\_\{h\},e\_\{t\}\), whererhr\_\{h\}is the target relation\. For each triple, we perform a random walk of fixed lengthllfrom the head entityehe\_\{h\}\. At each step, the next entity is selected uniformly from the neighbors of the current entity: P\(ei∣ei−1\)=1\|N\(ei−1\)\|P\(e\_\{i\}\\mid e\_\{i\-1\}\)=\\frac\{1\}\{\|N\(e\_\{i\-1\}\)\|\}The resulting relation sequence along the path fromehe\_\{h\}toete\_\{t\}forms a relation chain, which is converted into a chain\-like rule body by replacing entities with logical variables\. Graph\-like Rule Augmentation\.After obtaining chain\-like rules, we transform them into graph\-like rules\. For each rule, we uniformly sample one predefined graph topology from Fig\.[2](https://arxiv.org/html/2605.30747#S3.F2)and inject auxiliary nodes and relations accordingly\. Depending on the topology, one or two auxiliary variables are introduced to form non\-chain structures, such as branching or cyclic dependencies\. Auxiliary relations are sampled fromℛ\\mathcal\{R\}to preserve relation\-level validity\. Each chain is augmented once to produce one graph\-like rule, yielding a controlled training set\. Training Instance Definition\.Each constructed rule yields one supervised training instance consisting of a rule body and a target relation\. The rule body is represented as an adjacency matrix𝐀0\\mathbf\{A\}\_\{0\}, as defined in Section[4\.1\.1](https://arxiv.org/html/2605.30747#S4.SS1.SSS1), and the target relationrhr\_\{h\}serves as the generation condition\. These instances form the dataset for supervised pre\-training\.
Similar Articles
Stepwise Reasoning Enhancement for LLMs via External Subgraph Generation
This paper proposes SGR, a framework that enhances LLM stepwise reasoning by integrating external knowledge graphs through query-relevant subgraph generation, combining Cypher-based reasoning with collaborative reasoning integration. Experiments on CWQ, WebQSP, GrailQA, and KQA Pro show improved reasoning accuracy over standard prompting and knowledge-enhanced baselines.
GraphReAct: Reasoning and Acting for Multi-step Graph Inference
This paper introduces GraphReAct, a framework that extends reasoning-acting paradigms to graph-structured data for multi-step inference. It combines topological and semantic retrieval with context refinement to improve performance on graph learning benchmarks.
Generative Representation Learning on Hyper-relational Knowledge Graphs via Masked Discrete Diffusion
The paper introduces a novel task of fact generation for hyper-relational knowledge graphs (HKGs) and proposes KREPE, a generative representation learning method using masked discrete diffusion that unifies link prediction and fact generation, achieving state-of-the-art performance.
SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
This paper introduces SPARK, a self-play reinforcement learning framework that leverages knowledge graphs derived from scientific literature to improve relational reasoning in vision-language models.
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation
Introduces SGR, a stepwise reasoning framework that enhances LLM reasoning by generating query-specific subgraphs from external knowledge bases, improving accuracy and factual reliability.