Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion

arXiv cs.LG 06/18/26, 04:00 AM Papers
Summary
This paper proposes a K-Hop Gaussian (KHG) diffusion kernel as a preprocessing module for graph neural networks, balancing local and global information propagation to mitigate over-smoothing and information bottlenecks. Experiments show significant improvements over traditional message-passing GNNs and existing diffusion kernels, especially on noisy or structurally complex graphs.
arXiv:2606.18317v1 Announce Type: new
Cached at: 06/18/26, 05:41 AM
# Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion
Source: [https://arxiv.org/html/2606.18317](https://arxiv.org/html/2606.18317)
###### Abstract

Most graph neural network \(GNN\) cores rely on graph convolutions, typically implemented as message passing between direct \(single\-hop\) neighbors\. In many real\-world graphs, edges can be noisy or poorly defined, limiting information propagation to local neighborhoods\. Existing diffusion kernels, such as Personalized PageRank \(PPR\) and Heat Kernel, alleviate this issue through global propagation, but still struggle with complex local structures and distant node noise\. To address these limitations, we propose a K\-Hop Gaussian \(KHG\) diffusion kernel as a preprocessing module for graph data\. KHG introduces multi\-hop diffusion with Gaussian weighting for remote nodes, balancing local and global information propagation before applying standard GNNs\. Experiments on multiple benchmark datasets demonstrate that KHG significantly outperforms traditional message\-passing GNNs, as well as PPR and Heat Kernel diffusion, particularly in noisy or structurally complex graphs\.

Index Terms—Graph Neural Network, Node Classification, Graph Classification

## 1 Introduction

Graph neural networks (GNNs) have become an important tool for analyzing graph-structured data, with applications in social networks, transportation networks, molecular graphs \[[9](https://arxiv.org/html/2606.18317#bib.bib4)\], biological networks, financial transaction networks \[[19](https://arxiv.org/html/2606.18317#bib.bib6)\], and citation graphs \[[21](https://arxiv.org/html/2606.18317#bib.bib7)\]. By integrating deep learning with graph data, GNNs achieve strong performance in node classification, graph classification, and link prediction. Most GNNs are built upon message passing neural networks (MPNNs), which update node embeddings by aggregating neighborhood information. Representative models include GCN \[[12](https://arxiv.org/html/2606.18317#bib.bib10)\], SGC \[[20](https://arxiv.org/html/2606.18317#bib.bib3)\], ChebNet \[[6](https://arxiv.org/html/2606.18317#bib.bib9)\], TAGCN \[[8](https://arxiv.org/html/2606.18317#bib.bib12)\], and JKNet \[[22](https://arxiv.org/html/2606.18317#bib.bib8)\], covering techniques from spectral convolution to skip-connection aggregation. While deep neural networks in other domains often benefit from additional layers, GNNs suffer from over-smoothing when the number of layers is large \[[14](https://arxiv.org/html/2606.18317#bib.bib20)\], and the best performance is often observed within 2–4 hops \[[13](https://arxiv.org/html/2606.18317#bib.bib18)\]. Techniques such as DropEdge \[[16](https://arxiv.org/html/2606.18317#bib.bib19)\] and DropNode, inspired by dropout \[[18](https://arxiv.org/html/2606.18317#bib.bib24)\], attempt to mitigate this issue, but randomly removing edges or nodes may damage the graph structure. Another challenge is the information bottleneck \[[1](https://arxiv.org/html/2606.18317#bib.bib22)\], where long-range dependencies are poorly captured. Moreover, many GNNs assume simple unweighted, undirected graphs, overlooking richer structural patterns such as self-loops and multi-edges.
To address these limitations, we propose the K-Hop Gaussian (KHG) diffusion kernel, which introduces multi-hop diffusion with Gaussian weighting. This design balances local and global propagation, suppresses noise from distant nodes, and improves robustness compared with existing diffusion kernels such as Personalized PageRank (PPR) \[[13](https://arxiv.org/html/2606.18317#bib.bib18)\] and the Heat Kernel.

Our main contributions are threefold: (1) We propose the KHG diffusion kernel to mitigate over-smoothing and information bottlenecks in deep GNNs. (2) KHG integrates multi-hop diffusion via Gaussian weighting and is a modular, plug-and-play preprocessing component for existing GNNs. (3) Extensive experiments on node and graph classification benchmarks demonstrate its superiority over PPR and Heat kernels.

![Fig. 1](https://arxiv.org/html/2606.18317v1/x1.png)

Fig. 1: Illustration of the K-Hop Gaussian Diffusion process with $K=2$. The dark blue node denotes the diffusion source, blue nodes correspond to the 1-hop neighborhood, and light blue nodes represent the 2-hop neighborhood. After diffusion, edges are ranked by their diffusion weights, and only the top-weighted edges are retained. This procedure is repeated for all nodes, and the resulting diffusion graphs are finally merged to construct the K-Hop Gaussian Diffusion graph.
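The edge-selection step described in Fig. 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the diffusion matrix is a small dense list of rows, "top-weighted" is read as the m largest off-diagonal entries in each source node's row, and the per-node graphs are merged by taking the union of undirected edges. The function name `top_m_edges`, the parameter `m`, and the toy weights are hypothetical.

```python
# Hypothetical sketch: per-node top-m edge selection on a diffusion matrix,
# followed by merging the per-node selections into one undirected edge set.

def top_m_edges(D, m, include_self=False):
    """Return undirected edges (u, v) with u <= v kept by per-row top-m selection.

    D is a dense diffusion matrix given as a list of rows; D[u][v] is the
    diffusion weight from source node u to node v.
    """
    merged = set()
    for u, row in enumerate(D):
        # Candidate targets of source u with a positive diffusion weight.
        candidates = [(w, v) for v, w in enumerate(row)
                      if w > 0 and (include_self or v != u)]
        # Rank by descending weight; break ties by the smaller node index.
        candidates.sort(key=lambda wv: (-wv[0], wv[1]))
        for _, v in candidates[:m]:
            # Merge step (assumption): union of undirected edges.
            merged.add((min(u, v), max(u, v)))
    return merged


# Toy diffusion weights (made up for illustration, not from the paper).
D = [
    [0.10, 0.60, 0.25, 0.05],
    [0.40, 0.10, 0.40, 0.10],
    [0.20, 0.50, 0.10, 0.20],
    [0.05, 0.15, 0.70, 0.10],
]

print(sorted(top_m_edges(D, m=1)))
print(sorted(top_m_edges(D, m=2)))
```

With m = 1 the toy matrix keeps only the chain (0, 1), (1, 2), (2, 3); raising m to 2 adds the edges (0, 2) and (1, 3). Taking a union means an edge survives if either endpoint ranks it highly, which is one plausible reading of "merged".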
## 2 RELATED WORK

Different GNN and Message Passing: In GNNs, classical models such as GCN \[[12](https://arxiv.org/html/2606.18317#bib.bib10)\], SGC \[[20](https://arxiv.org/html/2606.18317#bib.bib3)\], and ChebNet \[[6](https://arxiv.org/html/2606.18317#bib.bib9)\] aggregate information over small, fixed neighborhoods (typically a single hop per layer), which limits long-range modeling, while excessive multi-hop diffusion leads to over-smoothing. GraphSAGE \[[10](https://arxiv.org/html/2606.18317#bib.bib29)\] is a full GNN model that aggregates information from 1-hop and 2-hop neighborhoods to balance computational cost and over-smoothing. Its depth is typically restricted to 2 hops because larger neighborhoods can introduce noise and increase model complexity. In contrast, our K-Hop Gaussian (KHG) diffusion is a preprocessing module rather than a full GNN, which allows flexible selection of larger K-hop ranges to propagate information while controlling the contribution of distant nodes via Gaussian weighting.

Diffusion Mechanism: In GNNs, information propagation is key to learning node relationships. Diffusion kernels extend propagation beyond the immediate neighborhood: PPR \[[13](https://arxiv.org/html/2606.18317#bib.bib18)\] uses personalized random walks and the Heat Kernel models heat flow; both capture broader context but lack explicit hop-wise control. GRAND \[[4](https://arxiv.org/html/2606.18317#bib.bib34)\] enhances flexibility via learnable or continuous diffusion, yet incurs iterative or sampling overhead. Our KHG diffusion provides a deterministic, closed-form alternative: a Gaussian-decayed multi-hop kernel that smoothly balances local and global propagation while remaining efficient and noise-robust.

Gaussian Filtering: Gaussian filtering is a classic denoising technique in image processing and is also widely used in deep vision models \[[11](https://arxiv.org/html/2606.18317#bib.bib17)\]. Inspired by spectral filtering on graphs \[[3](https://arxiv.org/html/2606.18317#bib.bib23)\], we extend it to GNNs to control multi-hop propagation. In images, the Gaussian kernel evaluated at a pixel offset $(x,y)$ from the kernel center is

$$G(x,y)=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right),$$

where $\sigma$ controls the smoothing scale: a large $\sigma$ yields broad filtering, while a small $\sigma$ preserves fine details. Analogously, in graphs we replace spatial distance with the hop distance $i$, yielding the weight

$$w(i,\sigma)=\exp\!\left(-\frac{i^{2}}{2\sigma^{2}}\right),$$

which decays smoothly with $i$. This bridges local and global propagation: a small $\sigma$ enforces local smoothing, while a larger $\sigma$ incorporates broader context, mitigating over-smoothing in deep GNNs.
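To make the contrast with PPR and the Heat Kernel concrete, the sketch below prints the per-hop coefficient profiles of the three kernels. It assumes the KHG weights are normalized over hops 1..K as in equation (3) of Section 3.2, that PPR uses the geometric weights α(1-α)^i, and that the heat coefficients carry an e^{-t} normalization (a common convention that the text does not state). The function names are illustrative only.

```python
import math


def khg_coeffs(K, sigma):
    """Normalized Gaussian hop weights w(i, sigma) / Z(sigma) for hops i = 1..K."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(1, K + 1)]
    Z = sum(w)
    return [wi / Z for wi in w]


def ppr_coeffs(K, alpha):
    """PPR geometric weights alpha * (1 - alpha)**i for hops i = 0..K (series truncated at K)."""
    return [alpha * (1.0 - alpha) ** i for i in range(K + 1)]


def heat_coeffs(K, t):
    """Heat-kernel weights exp(-t) * t**i / i! for hops i = 0..K.

    The exp(-t) factor is an assumed normalization so that the full series sums to 1.
    """
    return [math.exp(-t) * t ** i / math.factorial(i) for i in range(K + 1)]


K = 6
for sigma in (0.5, 1.5, 3.0):
    print(f"KHG  sigma={sigma}:", [round(c, 3) for c in khg_coeffs(K, sigma)])
print("PPR  alpha=0.1:", [round(c, 3) for c in ppr_coeffs(K, 0.1)])
print("Heat t=2.0:    ", [round(c, 3) for c in heat_coeffs(K, 2.0)])
```

Lowering σ pushes almost all of the KHG mass onto the first hop, while a larger σ spreads it across hops; unlike PPR and heat, the KHG profile is also hard-truncated at K. The PPR and heat series include a hop-0 (identity) term, whereas KHG as defined starts at hop 1.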

![Fig. 2](https://arxiv.org/html/2606.18317v1/x2.png)

Fig. 2: K-Hop Gaussian Diffusion Model Performance Comparison.
## 3 METHOD

### 3.1 Graph Normalization and Transition Matrix

K-Hop Gaussian (KHG) Diffusion is a Gaussian kernel-based method that enables multi-hop propagation in GNNs, ensuring that information decays smoothly with distance to mitigate over-smoothing and noise. The input is the adjacency matrix $A\in\mathbb{R}^{N\times N}$, where $A_{ij}=1$ if an edge exists between nodes $i$ and $j$, and $0$ otherwise. The goal is to construct a diffusion matrix that balances local smoothness and long-range dependencies. A key step is normalizing $A$ to obtain a stable transition matrix $T$; without normalization, degree imbalance can cause uneven propagation. Two widely used forms are:

- Symmetric normalization: $T_{\mathrm{sym}}=D^{-1/2}AD^{-1/2}$, where $D_{ii}=\sum_{j}A_{ij}$. This form balances both incoming and outgoing degrees, as in GCN \[[12](https://arxiv.org/html/2606.18317#bib.bib10)\].
- Column normalization: $T_{\mathrm{col}}=D^{-1}A$, which distributes a node's information evenly among its neighbors and prevents excessive dilution for low-degree nodes.

Both $T_{\mathrm{sym}}$ and $T_{\mathrm{col}}$ serve as the base operator for multi-hop propagation ($T^{k}$), which is later combined with Gaussian weights in KHG diffusion. Compared with the unnormalized $A$, these normalized forms guarantee stability, since their eigenvalues lie in $[-1,1]$ and the powers $T^{k}$ therefore remain bounded, and they preserve structural balance in heterogeneous graphs.
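The two normalizations can be illustrated on a toy graph. This is a plain-Python sketch, assuming a small dense 0/1 adjacency matrix given as nested lists; the helper names are hypothetical, and the zero-degree handling is an assumption because the text does not say how isolated nodes are treated. As the formula $T_{\mathrm{col}}=D^{-1}A$ is written, each row is divided by its node's degree, so it is the rows of $T_{\mathrm{col}}$ that sum to one.

```python
import math


def degrees(A):
    """Degree vector d_i = sum_j A_ij of a dense adjacency matrix."""
    return [sum(row) for row in A]


def sym_normalize(A):
    """T_sym = D^{-1/2} A D^{-1/2}.

    Assumption: isolated nodes (d_i = 0) keep an all-zero row and column
    instead of dividing by zero.
    """
    n = len(A)
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees(A)]
    return [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def col_normalize(A):
    """T_col = D^{-1} A as written in the text: row i is divided by d_i,
    so the row of every non-isolated node sums to 1."""
    n = len(A)
    d = degrees(A)
    return [[A[i][j] / d[i] if d[i] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


# Toy path graph 0 - 1 - 2 (illustrative only).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
T_sym = sym_normalize(A)
T_col = col_normalize(A)
print(T_sym)  # nonzero entries are 1/sqrt(2), about 0.707
print(T_col)  # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```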

### 3.2 K-Hop Gaussian Diffusion Mechanism

Let $A\in\mathbb{R}^{N\times N}$ be the adjacency matrix and $D=\mathrm{diag}(d_{i})$ the degree matrix. We first form a normalized transition operator $T$, using either symmetric or column normalization: $T_{\mathrm{sym}}=D^{-1/2}AD^{-1/2}$, $T_{\mathrm{col}}=D^{-1}A$. The $i$-hop transition operator is the $i$-th power of $T$ and satisfies the recursion

$$T^{(1)}=T,\qquad T^{(i)}=T^{(i-1)}T\quad(i\geq 2),\tag{1}$$

or, element-wise, $T^{(i)}_{uv}=\sum_{k}T^{(i-1)}_{uk}\,T_{kv}$, which sums the weights of all walks of length exactly $i$ from $u$ to $v$ and thus encodes multi-step reachability and aggregated propagation. To control hop-wise influence, we introduce a Gaussian hop-weight

$$w(i,\sigma)=\exp\!\left(-\frac{i^{2}}{2\sigma^{2}}\right),\qquad i=1,\dots,K,\tag{2}$$

where $\sigma>0$ is the scale and $K$ the maximal hop cutoff. The unnormalized multi-hop kernel aggregates weighted powers:

$$\widetilde{D}_{K}=\sum_{i=1}^{K}w(i,\sigma)\,T^{(i)}.$$

Normalizing by the scalar weight sum $Z(\sigma)=\sum_{i=1}^{K}w(i,\sigma)$, we obtain the final diffusion kernel

$$D_{K}=\frac{\widetilde{D}_{K}}{Z(\sigma)}=\frac{\sum_{i=1}^{K}w(i,\sigma)\,T^{(i)}}{\sum_{i=1}^{K}w(i,\sigma)}.\tag{3}$$

If $T=U\Lambda U^{-1}$ (with orthogonal $U$ when $T$ is symmetric), then

$$T^{(i)}=U\Lambda^{i}U^{-1},\qquad\widetilde{D}_{K}=U\Big(\sum_{i=1}^{K}w(i,\sigma)\Lambda^{i}\Big)U^{-1},$$

so $D_{K}$ acts as a polynomial spectral filter

$$D_{K}=U\,\mathrm{diag}\big(g(\lambda_{j})\big)\,U^{-1},\qquad g(\lambda)=\frac{\sum_{i=1}^{K}w(i,\sigma)\lambda^{i}}{\sum_{i=1}^{K}w(i,\sigma)}.\tag{4}$$

Thus KHG is a degree-$K$ polynomial filter with Gaussian coefficients. For comparison, PPR corresponds to geometric weights, $D_{\mathrm{PPR}}=\alpha\sum_{i\geq 0}(1-\alpha)^{i}T^{i}=\alpha\big(I-(1-\alpha)T\big)^{-1}$, and heat diffusion to the factorial (exponential) series $\exp(tT)=\sum_{i\geq 0}\frac{t^{i}}{i!}T^{i}$. Building $D_{K}$ explicitly may densify it; instead, we compute $D_{K}X$ for a feature matrix $X\in\mathbb{R}^{N\times d}$ via recursion. Let $Y^{(0)}=X$ and $Y^{(i)}=T\,Y^{(i-1)}$ for $i=1,\dots,K$; then

$$D_{K}X=\frac{\sum_{i=1}^{K}w(i,\sigma)\,Y^{(i)}}{\sum_{i=1}^{K}w(i,\sigma)}.\tag{5}$$

This avoids storing $T^{(i)}$ and leads to a simple iterative procedure: perform $K$ sparse multiplications $Y\leftarrow TY$, accumulate $w(i,\sigma)Y$ after each one, and divide by $Z(\sigma)$. When implemented at the feature level with a sparse $T$, this costs $O(K\,|E|\,d)$ time, where $|E|$ is the number of edges, whereas explicit matrix construction can densify to $O(N^{2})$ storage in the worst case. In practice, use sparse accumulation and optional top-$m$ row-wise sparsification to keep $D_{K}$ sparse. The parameters have intuitive semantics: a small $\sigma$ concentrates mass on low-order powers (local smoothing), a large $\sigma$ spreads mass across hops (more global aggregation), and $K$ limits the maximal propagation depth.
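Equations (1), (3), and (5) can be checked against each other on a small graph. The sketch below, in plain Python and assuming an undirected edge list with $T=T_{\mathrm{col}}=D^{-1}A$, computes $D_KX$ with the feature-level recursion (one sparse product per hop, never forming $T^{(i)}$) and, for comparison only, builds $D_K$ densely. All function names are hypothetical, and the empty-row handling of isolated nodes is an assumption.

```python
import math


def gaussian_weights(K, sigma):
    """Equation (2): w(i, sigma) = exp(-i^2 / (2 sigma^2)) for i = 1..K."""
    return [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(1, K + 1)]


def col_transition(n, edges):
    """Sparse T_col = D^{-1} A for an undirected edge list, as {u: [(v, T_uv), ...]}."""
    nbrs = {u: [] for u in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # Isolated nodes get an empty row (assumption; the text does not cover d_i = 0).
    return {u: [(v, 1.0 / len(vs)) for v in vs] for u, vs in nbrs.items()}


def sparse_matmul(T, Y):
    """T @ Y, visiting only the stored nonzeros of T: O(nnz(T) * d) work."""
    out = [[0.0] * len(Y[0]) for _ in Y]
    for u, row in T.items():
        for v, t_uv in row:
            for c, y in enumerate(Y[v]):
                out[u][c] += t_uv * y
    return out


def khg_features(T, X, K, sigma):
    """Equation (5): D_K X via Y^(i) = T Y^(i-1), never forming T^(i)."""
    w = gaussian_weights(K, sigma)
    Z = sum(w)
    Y = X
    acc = [[0.0] * len(X[0]) for _ in X]
    for wi in w:
        Y = sparse_matmul(T, Y)  # one sparse multiplication per hop
        for r, y_row in enumerate(Y):
            for c, y in enumerate(y_row):
                acc[r][c] += wi * y
    return [[a / Z for a in row] for row in acc]


def dense_kernel(T, n, K, sigma):
    """Equation (3) built explicitly via equation (1); O(N^2) memory, tiny graphs only."""
    Td = [[0.0] * n for _ in range(n)]
    for u, row in T.items():
        for v, t_uv in row:
            Td[u][v] = t_uv
    w = gaussian_weights(K, sigma)
    Z = sum(w)
    P = [r[:] for r in Td]  # P = T^(1)
    DK = [[0.0] * n for _ in range(n)]
    for i, wi in enumerate(w):
        if i > 0:  # T^(i) = T^(i-1) T
            P = [[sum(P[r][k] * Td[k][c] for k in range(n)) for c in range(n)]
                 for r in range(n)]
        for r in range(n):
            for c in range(n):
                DK[r][c] += wi * P[r][c] / Z
    return DK


# Toy example: path graph 0 - 1 - 2 with a one-hot feature on node 0.
T = col_transition(3, [(0, 1), (1, 2)])
X = [[1.0], [0.0], [0.0]]
print(khg_features(T, X, K=2, sigma=1.0))
```

On larger graphs one would normally swap the dictionary-of-lists operator for a compressed sparse matrix format (for example CSR), but the loop structure of equation (5) stays the same: K products, K weighted accumulations, and one division by Z(σ).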

## 4 Experiment

Table 1: Performance Comparison of GNN Models with Different Diffusion Mechanisms across Datasets

### 4.1 Dataset and Baselines

In our experiments, we evaluated the proposed K-Hop Gaussian (KHG) diffusion against a range of representative GNN baselines, including GCN \[[12](https://arxiv.org/html/2606.18317#bib.bib10)\], SGC \[[20](https://arxiv.org/html/2606.18317#bib.bib3)\], ChebNet \[[6](https://arxiv.org/html/2606.18317#bib.bib9)\], TAGCN \[[8](https://arxiv.org/html/2606.18317#bib.bib12)\], and JKNet \[[22](https://arxiv.org/html/2606.18317#bib.bib8)\], as well as the advanced diffusion method GRAND \[[4](https://arxiv.org/html/2606.18317#bib.bib34)\]. For node classification, we used three benchmark citation networks, Cora, CiteSeer, and PubMed \[[15](https://arxiv.org/html/2606.18317#bib.bib31)\], together with the Wikipedia network Chameleon \[[17](https://arxiv.org/html/2606.18317#bib.bib27)\]. For graph classification, we considered three molecular and protein datasets: ENZYMES \[[2](https://arxiv.org/html/2606.18317#bib.bib28)\], MUTAG \[[5](https://arxiv.org/html/2606.18317#bib.bib30)\], and PROTEINS \[[7](https://arxiv.org/html/2606.18317#bib.bib32)\]. Datasets without official splits (e.g., Chameleon, ENZYMES, MUTAG, PROTEINS) were randomly divided into training/validation/test sets in a 7:2:1 ratio using a fixed random seed of 42 to ensure reproducibility. Each experiment was repeated multiple times, and results are reported with 95% confidence intervals to quantify run-to-run variability.
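The 7:2:1 split with seed 42 could be reproduced along the following lines. This is a sketch assuming a shuffle-then-cut procedure with Python's `random.Random`; the paper does not state its RNG, library, or rounding rule, so the exact indices (and possibly the split sizes) will differ from the authors'. The function `split_indices` is hypothetical.

```python
import random


def split_indices(n, ratios=(7, 2, 1), seed=42):
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/val/test.

    Rounding rule (assumption): train and validation sizes are floored, and the
    remainder goes to the test split.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: global random state is untouched
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Example with 188 items, the number of graphs in MUTAG.
train, val, test = split_indices(188)
print(len(train), len(val), len(test))  # 131 37 20
```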

### 4.2 Performance Comparison

Table I reports the results of four diffusion mechanisms (None, PPR, Heat, KHG) combined with five GNN models (GCN, SGC, ChebNet, TAGCN, JKNet) on three node classification datasets (Cora, CiteSeer, Chameleon) and three graph classification datasets (MUTAG, ENZYMES, PROTEINS). Overall, KHG consistently outperforms the other kernels. On Cora, for instance, GCN+KHG achieved 82.0%, surpassing both None (80.8%) and PPR/Heat. On Chameleon, ChebNet+KHG reached 48.9%, 3.7% higher than None. In graph classification, the improvement is even larger: ChebNet+KHG achieved 93.0% on MUTAG, 7.9% above None, and GCN+KHG gained 5.0% on ENZYMES. These results indicate that Gaussian weighting enhances robustness in both sparse and structurally complex graphs. The choice of $K$ strongly influences performance: for smaller, sparse graphs (Cora, MUTAG), a small $K$ ($K<10$) is optimal, while for denser or heterogeneous graphs (Chameleon, ENZYMES) a moderate $K$ ($K>10$) yields better accuracy by capturing more global information without excessive noise. This trend highlights KHG's adaptability to different graph scales and structures.

![Fig. 3](https://arxiv.org/html/2606.18317v1/x3.png)

Fig. 3: Effect of $\sigma$ on accuracy (Cora dataset, fixed $K$).

Efficiency. Unlike PPR and Heat, which rely on iterative power series or matrix exponentiation, KHG is a closed-form preprocessing step. On PubMed (19k nodes, 88k edges, 500-dimensional features), constructing $D_{K}X$ with $K=14$ takes about 1.8 s and 180 MB of memory. In contrast, PPR with $\alpha=0.1$ requires about 6.5 s due to iterative convergence, while Heat needs to store multiple dense expansions, consuming over 400 MB. These measurements indicate that KHG achieves comparable or better accuracy with substantially lower computational overhead, which suggests that it can scale to larger graphs.
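A back-of-the-envelope estimate shows why computing $D_KX$ rather than $D_K$ matters at PubMed scale. The sketch assumes float32 storage and uses PubMed's 19,717 nodes (rounded to 19k in the text); it counts raw array storage only, ignoring sparse index overhead and framework memory, so it is not meant to reproduce the reported timings or memory figures.

```python
# Back-of-the-envelope storage estimate for PubMed (raw array storage only).
N = 19_717        # PubMed nodes (the text rounds this to 19k)
d = 500           # feature dimension
BYTES = 4         # assumption: float32 entries

dense_kernel_bytes = N * N * BYTES   # explicit D_K if it densifies completely
feature_bytes = N * d * BYTES        # one N x d buffer (X, Y, or the accumulator)

MiB = 2 ** 20
print(f"dense D_K:        {dense_kernel_bytes / MiB:9.1f} MiB")
print(f"one N x d buffer: {feature_bytes / MiB:9.1f} MiB")
print(f"ratio:            {dense_kernel_bytes / feature_bytes:9.1f}x")
```

A fully dense $D_K$ would take about 1.5 GB, roughly 39 times the size of one feature buffer (about 39 MB), while the recursion of equation (5) only needs a few such buffers plus the sparse $T$.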

Influence of $\sigma$ on KHG. As shown in Fig. [3](https://arxiv.org/html/2606.18317#S4.F3), small $\sigma$ values cause rapid weight decay, suppressing distant information and leading to poor accuracy. As $\sigma$ increases, performance improves since remote neighbors contribute more. Most models achieve their best results when $\sigma\in[1.0,3.0]$, with JKNet and TAGCN peaking around $\sigma=1.5$, showing that Gaussian weighting balances local and global propagation.

## 5 CONCLUSION

This paper proposes a new K-Hop Gaussian (KHG) diffusion mechanism, a data preprocessing method for graphs that aims to address the shortcomings of existing graph neural networks (GNNs) in processing complex graph structures and noisy data. By introducing Gaussian-weighted multi-hop diffusion, the method balances the propagation of local and global information and mitigates over-smoothing and information bottleneck problems in deep networks. The KHG diffusion kernel not only suppresses the noise impact of distant nodes but also enhances feature retention for low-degree nodes, thereby improving overall model performance. Our experiments show that this method significantly improves the performance of baseline GNNs. Future work will focus on developing more efficient diffusion algorithms to apply the method to larger-scale graphs and on exploring its potential in other graph learning tasks.

## 6 Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (Grant No. U25A2054) and in part by the Shenzhen Science and Technology Projects (Grant No. JSGGZD20220822095602005).

## References

- \[1\] U. Alon and E. Yahav (2021). On the bottleneck of graph neural networks and its practical implications. In Proceedings of the 9th International Conference on Learning Representations (ICLR).
- \[2\] K. M. Borgwardt and H.-P. Kriegel (2005). Shortest-path kernels on graphs. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), pp. 74–81.
- \[3\] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun (2014). Spectral networks and locally connected networks on graphs. In Proceedings of the International Conference on Learning Representations (ICLR).
- \[4\] B. P. Chamberlain, J. Rowbottom, M. Gorinova, S. Webb, E. Rossi, and M. Bronstein (2021). GRAND: Graph neural diffusion. In International Conference on Machine Learning (ICML), pp. 1407–1418.
- \[5\] A. K. Debnath, R. L. L. de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch (1991). Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds: correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry 34(2), pp. 786–797.
- \[6\] M. Defferrard, X. Bresson, and P. Vandergheynst (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th Conference on Neural Information Processing Systems (NeurIPS), pp. 3844–3852.
- \[7\] P. D. Dobson and A. J. Doig (2003). Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology 330(4), pp. 771–783.
- \[8\] J. Du, S. Zhang, G. Wu, J. M. F. Moura, and S. Kar (2018). Topology adaptive graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR). Note: Conference Blind Submission, initially submitted on 16 Feb 2018, last modified on 24 Jan 2023.
- \[9\] J. Gilmer, S. Schoenholz, P. Riley, O. Vinyals, and G. Dahl (2017). Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML).
- \[10\] W. L. Hamilton, Z. Ying, and J. Leskovec (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems (NeurIPS), pp. 1024–1034.
- \[11\] K. He, X. Zhang, S. Ren, and J. Sun (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- \[12\] T. N. Kipf and M. Welling (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR).
- \[13\] J. Klicpera, A. Bojchevski, and S. Günnemann (2019). Predict then propagate: graph neural networks meet personalized PageRank. In Proceedings of the 7th International Conference on Learning Representations (ICLR).
- \[14\] Q. Li, Z. Han, and X. Wu (2018). Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI).
- \[15\] G. Namata, S. Kok, and L. Getoor (2012). Query-driven active surveying for collective classification.
- \[16\] Y. Rong, W. Huang, T. Xu, and J. Huang (2020). DropEdge: towards deep graph convolutional networks on node classification. In Proceedings of the 8th International Conference on Learning Representations (ICLR).
- \[17\] B. Rozemberczki, C. Allen, and R. Sarkar (2021). Multi-scale attributed node embeddings. Journal of Complex Networks 9(2), cnab014.
- \[18\] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, pp. 1929–1958.
- \[19\] F. Wang and C. Ma (2021). Financial fraud detection using graph neural networks: a case study of transaction networks. IEEE Transactions on Knowledge and Data Engineering 33(7), pp. 1237–1249.
- \[20\] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Q. Weinberger (2019). Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning (ICML).
- \[21\] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2019). How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations (ICLR).
- \[22\] K. Xu, C. Li, Y. Tian, T. Sonobe, K. Kawarabayashi, and S. Jegelka (2018). Representation learning on graphs with jumping knowledge networks. In Proceedings of the 35th International Conference on Machine Learning (ICML).

Similar Articles

Hierarchical Multi-Scale Graph Neural Networks: Scalable Heterophilous Learning with Oversmoothing and Oversquashing Mitigation

DDGAD: Trajectory Dynamics for Diffusion-Based Graph Anomaly Detection

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

HyperPatch: Sequential Knowledge Editing Under n-ary Structural Drift

Equilibrium Propagation and Hamiltonian Inference in the Diffusive Fitzhugh-Nagumo Model
