Evolutionary Refinement of Generative Graph Topologies: A Hybrid WGAN-GA Approach

arXiv cs.LG Papers

Summary

This paper proposes a hybrid WGAN-GA approach for refining generative graph topologies, using a genetic algorithm to correct residual structural deviations in GAN-based generated graphs, improving realism for synthetic graph synthesis and data augmentation.

arXiv:2605.29161v1 Announce Type: new Abstract: Generating realistic graph-structured data is challenging due to discrete connectivity, varying graph sizes, and class-specific structural patterns. Recent Generative Adversarial Networks (GAN)-based graph generation methods improve edge modelling by learning connectivity and matching class-specific density distributions. However these models still exhibit noticeable deviations such as in degree and spectral distribution when compared to real graphs, indicating that important structural properties are not fully preserved. This work aims to reduce these deviations by refining the graphs produced by an existing GAN-based graph generator framework with a Genetic Algorithm (GA). In the GAN framework, the generator produces both node features and connectivity patterns, while a GNN-based critic evaluates graph realism and class consistency to ensure global structural and class alignment. Building on this foundation, we apply a GA to refine the edges of generated graphs. The refinement process guides synthetic graphs toward closer agreement with real data, while preserving diversity and novelty. Experimental results show that the GA refinement consistently lowers combined Maximum Mean Discrepancy (MMD) compared to the base model, leading to graphs that more closely match real structural patterns. This demonstrates that evolutionary refinement is an effective and flexible way to correct residual structural deviations in GAN-based graph generators, improving their suitability for realistic graph synthesis and data augmentation.
Original Article
View Cached Full Text

Cached at: 05/29/26, 09:18 AM

# Evolutionary Refinement of Generative Graph Topologies: A Hybrid WGAN-GA Approach
Source: [https://arxiv.org/html/2605.29161](https://arxiv.org/html/2605.29161)
###### Abstract

Generating realistic graph\-structured data is challenging due to discrete connectivity, varying graph sizes, and class\-specific structural patterns\. Recent Generative Adversarial Networks \(GAN\)\-based graph generation methods improve edge modelling by learning connectivity and matching class\-specific density distributions\. However these models still exhibit noticeable deviations such as in degree and spectral distribution when compared to real graphs, indicating that important structural properties are not fully preserved\. This work aims to reduce these deviations by refining the graphs produced by an existing GAN\-based graph generator framework with a Genetic Algorithm \(GA\)\. In the GAN framework, the generator produces both node features and connectivity patterns, while a GNN\-based critic evaluates graph realism and class consistency to ensure global structural and class alignment\. Building on this foundation, we apply a GA to refine the edges of generated graphs\. The refinement process guides synthetic graphs toward closer agreement with real data, while preserving diversity and novelty\. Experimental results show that the GA refinement consistently lowers combined Maximum Mean Discrepancy \(MMD\) compared to the base model, leading to graphs that more closely match real structural patterns\. This demonstrates that evolutionary refinement is an effective and flexible way to correct residual structural deviations in GAN\-based graph generators, improving their suitability for realistic graph synthesis and data augmentation\.

## IIntroduction

Graph\-structured data arises in many application areas, including social networks, molecular graphs, biological systems, and communication networks, where nodes and edges capture important information about underlying processes and system properties\. The generation of realistic synthetic graphs has become increasingly relevant for applications such as privacy\-preserving data sharing and data augmentation for graph\-based machine learning models\. However, generating graphs introduces fundamental challenges that differ from those of traditional data synthesis in Euclidean spaces\. Unlike images or text, which have regular grid structures and fixed dimensionality, graphs exhibit irregular topologies, variable sizes, and no canonical node ordering\. As a result, effective graph generation methods must learn complex dependencies between graph structure and node attributes while preserving the structural properties that define different graph types\.

Early deep learning approaches to graph generation have explored various strategies to address these challenges\. Methods such as DeepGMG\[[7](https://arxiv.org/html/2605.29161#bib.bib7)\]and GraphRNN\[[14](https://arxiv.org/html/2605.29161#bib.bib14)\]model graphs through sequential generation processes, while GraphVAE\[[11](https://arxiv.org/html/2605.29161#bib.bib11)\]leverages variational autoencoders to learn latent graph representations\. Generative Adversarial Networks \(GANs\) have shown particular promise, with GraphGAN\[[12](https://arxiv.org/html/2605.29161#bib.bib12)\], EGraphGAN\[[13](https://arxiv.org/html/2605.29161#bib.bib13)\], and MolGAN\[[3](https://arxiv.org/html/2605.29161#bib.bib3)\]demonstrating the potential of adversarial training for graph synthesis\. LGGAN\[[5](https://arxiv.org/html/2605.29161#bib.bib5)\]further advanced the field by introducing labeled graph generation with comprehensive evaluation metrics including Maximum Mean Discrepancy \(MMD\) over degree, clustering, and orbit statistics\. However, these GAN\-based approaches often suffer from training instability, mode collapse, and difficulties in generating graphs with variable sizes\.

The introduction of Wasserstein GANs \(WGANs\)\[[1](https://arxiv.org/html/2605.29161#bib.bib1)\]marked a significant advancement in addressing the instability issues inherent in standard GANs\. WGAN\-GP\[[6](https://arxiv.org/html/2605.29161#bib.bib6)\]further improved upon this by enforcing the Lipschitz constraint through gradient penalty rather than weight clipping, enabling more effective training and better sample diversity\. Recent work has successfully adapted the WGAN framework to graph generation by combining it with Graph Neural Networks \(GNNs\) as critics\. In\[[9](https://arxiv.org/html/2605.29161#bib.bib9),[10](https://arxiv.org/html/2605.29161#bib.bib10)\], a generator produces both node features and connectivity patterns, while a GNN\-based critic evaluates graph realism and class consistency\. These methods demonstrated superior performance in generating class\-conditional graphs with improved training stability\. However, despite these advances, these works show that these models still exhibit noticeable deviations such as in the degree spectral distribution when compared to real graphs\. This limitation results in graphs with unrealistic connectivity patterns that fail to capture the complex structural dependencies between nodes and do not reflect the learned node feature distributions\.

We address these core limitations by extending existing graph generation approaches with a Genetic Algorithm \(GA\)–based refinement stage\. After the graph generator is fully trained, we apply a GA to refine the generated graphs through crossover and mutation operations that directly alter graph structure, such as edge presence and local connectivity patterns\. The refinement process is guided by fitness measures derived from structural statistics of real graphs, with a particular focus on correcting deviations in degree, clustering coefficient and spectral distributions\.

Central to our refinement strategy is the concept of evolutionary edge editing, where graph topologies are iteratively optimized through a sequence of discrete structural modifications\. This approach builds upon established generative representations that utilize elementary operations, such as adding, removing, or toggling edges, to evolve networks toward specific structural or functional targets\[[2](https://arxiv.org/html/2605.29161#bib.bib2),[4](https://arxiv.org/html/2605.29161#bib.bib4)\]\. By leveraging these direct edge manipulations, the evolutionary stage can enforce precise local constraints and correct statistical deviations that are often smoothed over by the continuous approximations inherent in deep generative models\.

Experiments were conducted on three bioinformatics graph benchmark datasets from\[[8](https://arxiv.org/html/2605.29161#bib.bib8)\]\. The results demonstrate that the proposed refinement strategy improves the alignment between generated graphs and the statistical properties of the training data\. The refined graphs show more coherent and realistic structural patterns, with connectivity that better reflects the relationships encoded in node features\. Overall, the GA refinement reduces structural mismatches while maintaining variability, leading to higher\-quality synthetic graphs suitable for downstream graph learning tasks\.

Data and Code:These are both available at:github\.com/shorinbonsai/WGAN\-GA\-Refine\.

![Refer to caption](https://arxiv.org/html/2605.29161v1/james.jpg)Figure 1:Model overview\. Phase 1: GAN training \(based on\[[9](https://arxiv.org/html/2605.29161#bib.bib9),[10](https://arxiv.org/html/2605.29161#bib.bib10)\]\)\. The generator learns to place nodes in a latent space where closer nodes are more likely to be connected\. A GNN\-based critic processes graphs using several convolution layers, pools node features, and combines them with class embeddings to compute Wasserstein scores\. Phase 2: Genetic Algorithm \(GA\) refinement\. After training, the generator produces synthetic graphs that are then refined using a GA that produces graphs closely resembling real samples\. The fitness function encourages the generated graphs to match the real data distribution and target class while remaining distinct and novel\.
## IIMethodology: Evolutionary Refinement

Our framework uses the output of a deep generative model as input to evolutionary refinement to produce precise graph structures\. These stages are shown in Figure[1](https://arxiv.org/html/2605.29161#S1.F1): \(1\) a coarse generation stage using a Wasserstein Generative Adversarial Network \(WGAN\) \(based on\[[9](https://arxiv.org/html/2605.29161#bib.bib9),[10](https://arxiv.org/html/2605.29161#bib.bib10)\]\) and \(2\) a refinement stage using a Genetic Algorithm \(GA\) library for edge editing developed in Rust\. We note that the focus of this paper is the evolutionary refinement stage, and that the coarse generation stage could use any method that produces graph structures\. Although this method can be applied to extend existing GAN\-based graph generation models, we use WGAN as a representative example due to its strong reported performance\.

### II\-ACoarse Generation: WGAN

WGAN generates coarse graph candidates that capture the global structural characteristics of the target graph distribution\. Graphs are represented as adjacency matrices, with the generator mapping latent samples to continuous adjacency estimates\. The critic is trained using the Wasserstein distance with a Lipschitz constraint, which provides a smoother and more stable training signal than standard GANs\.

The primary role of the WGAN in our system is not to produce perfectly valid graphs, but to capture high\-level statistical regularities observed in the training data, such as edge density, degree patterns, and broad spectral or clustering properties\. These coarse outputs serve as informed initializations for the evolutionary stage\. By starting the GA\-based refinement from WGAN\-generated candidates instead of random graphs, the search space explored during refinement is biased toward structurally plausible regions, improving convergence speed and solution quality\. As a result, the WGAN provides a global initialization of graph structure, while detailed refinement and structural optimization are handled by the evolutionary stage\. The WGAN implementation is based on the code from\[[10](https://arxiv.org/html/2605.29161#bib.bib10)\]\.

### II\-BFine\-Tuning: Evolutionary Refinement

While GAN\-based models capture global structural trends, they often struggle to satisfy precise local constraints, such as exact degree distributions or triangle counts\. To address this limitation, we refine the generated graphs using a GA\.

In this work, we provide a Rust\-based GA library that integrates WGAN\-based graph generation with a GA refinement module\. The Python environment interfaces with our library to pass graphs generated by the WGAN directly to the GA initialization stage\. The GA then iteratively refines graph structures by optimizing a fitness function defined over selected structural metrics \(e\.g\. clustering coefficient, spectral gap\), that are also reflected in the WGAN training objective\.

### II\-CRepresentation and Initialization

To effectively bridge the continuous latent space of the WGAN and the discrete search space of the graph refiner, we utilize a dual\-representation strategy\. This approach distinguishes between thephenotype\(the actual graph structure\) and thegenotype\(the evolutionary instruction set\)\.

#### II\-C1Graph Phenotype

The phenotype represents the physical graph structure utilized for fitness evaluation\. It is implemented in the GA library as an adjacency list\.

#### II\-C2Command String Genotype

Our system employs a linear command\-based genotype\. Each individual in the population is defined by a genomegg, a sequence of genes that each encode a specific operation\. The final graph is produced by an expression functionΦ\\Phi:

Gf​i​n​a​l=Φ​\(Gb​a​s​e,g\)=o​pn​\(…​o​p2​\(o​p1​\(Gb​a​s​e\)\)​…\)G\_\{final\}=\\Phi\(G\_\{base\},g\)=op\_\{n\}\(\\dots op\_\{2\}\(op\_\{1\}\(G\_\{base\}\)\)\\dots\)\(1\)whereGb​a​s​eG\_\{base\}is the coarse graph generated by the WGAN\. This ensures that the GA explores the local neighbourhood of the coarse graph, rather than the entire search space\. The evolutionary search progresses through the manipulation of the linear command genotype\.

### II\-DGenetic Operators and Dynamics

Crucially, our operators do not directly manipulate edges on the adjacency matrix; instead, they manipulate theinstruction sequencethat constructs the graph\.

In particular, the chromosome is a sequence of edge\-editing operations that is applied sequentially; it is thus deterministic and specifies a particular network\. The length of the chromosome is twice the number of vertices, to stay relatively consistent with\[[4](https://arxiv.org/html/2605.29161#bib.bib4)\]while scaling for graphs with different cardinalities\.

#### II\-D1Edge Editing Operations

The operations are applied to a graphG​\(V,E\)G\(V,E\)with nodesuu,vv,ww, andxxfromVV, as follows\.

- •Toggle\(uu,vv\):If edge\{u,v\}\\\{u,v\\\}is inEEthen remove it fromEE, else add it toEE\.LocalToggle\(uu,ww,vv\) requires that edges\{u,w\}\\\{u,w\\\}and\{w,v\}\\\{w,v\\\}exist then callsToggle\(uu,vv\)\.
- •Hop\(uu,vv,ww\):If edges\{u,v\}\\\{u,v\\\}and\{v,w\}\\\{v,w\\\}are inEEand edge\{u,w\}\\\{u,w\\\}is not, then remove\{u,v\}\\\{u,v\\\}fromEEand add\{u,w\}\\\{u,w\\\}toEE\.
- •Add\(uu,vv\):If\{u,v\}\\\{u,v\\\}is not inEEthen add it toEE, else do nothing\.LocalAdd\(uu,ww,vv\)requires that edges\{u,w\}\\\{u,w\\\}and\{w,v\}\\\{w,v\\\}exist then callsAdd\(uu,vv\)\.
- •Delete\(uu,vv\):If\{u,v\}\\\{u,v\\\}is inEEthen remove it fromEE, else do nothing\.LocalDelete\(uu,ww,vv\)requires that edges\{u,w\}\\\{u,w\\\}and\{w,v\}\\\{w,v\\\}exist then callsDelete\(uu,vv\)\.
- •Swap\(uu,vv,ww,xx\):If\{u,v\}\\\{u,v\\\}and\{w,x\}\\\{w,x\\\}are the only edges inEEbetween nodesuu,vv,wwandxxthen remove them fromEEand add\{u,x\}\\\{u,x\\\}and\{v,w\}\\\{v,w\\\}toEE\.
- •Null\(\):Do nothing\.

These are visualized in a sequential manner in Figure[2](https://arxiv.org/html/2605.29161#S2.F2)\. The probabilities of the operations are given in Table[II](https://arxiv.org/html/2605.29161#S3.T2)\. In initial tuning, the probabilities of Toggle, Add, Delete, and Local toggle were all set to half those of the other operations because otherwise the degree distribution was negatively affected\.

0\. Start\(Ring Graph\)0123451\. Add\(0, 3\)Add non\-existent edge0123452\. Toggle\(1, 4\)Edge missing→\\toAdd0123453\. LocalAdd\(2, 3, 4\)Path2→3→42\\to 3\\to 4, Add \(2,4\)0123454\. LocalToggle\(0, 5, 4\)Path0→5→40\\to 5\\to 4, Toggle \(0,4\)0123455\. Hop\(5, 4, 3\)Remove \(5,4\), Add \(5,3\)0123456\. LocalDelete\(0, 1, 4\)Path0→1→40\\to 1\\to 4, Del \(0,4\)0123457\. Delete\(5, 0\)Simple deletion0123458\. Swap\(1, 0, 4, 3\)1−0,4−3→1−3,4−01\-0,4\-3\\to 1\-3,4\-0012345Figure 2:Sequential visualization of the edge editing operations\. Green edges are added; red dashed edges are removed\. The chromosome producing the final graph \(bottom right\) from the start graph \(top left\) thus consists of these 8 operations in sequence\.
#### II\-D2Crossover

We employ Two\-Point Crossover to recombine successful edit strategies from the population\.

This method is preferred over Uniform Crossover for this application as it preserves contiguous subsequences of graph operations\. Since the phenotype is constructed by applying commands sequentially, preserving these functional blocks allows offspring to inherit effective macro\-actions \(e\.g\., a specific sequence of edits that creates a valid substructure\) without disrupting the logic of the generative process\.

#### II\-D3Mutation

Mutation randomly selects 1–4 genes and replaces those with new random genes\. This slightly disruptive operation promotes a deeper exploration of the search space\.

### II\-EFitness Function

The fitness function quantifies the structural deviation between a candidate graphGGand the distribution of the target graphs\.

Since the goal is to generate graphs that statistically resemble a class of real\-world networks rather than reproducing a single instance, we employ MMD\.

To compute the final score, we treat the features of the generated graph as a sample and compare them to the distribution of features in the training set\. The total fitnessFF\(where lower is better\) is calculated as a weighted sum of the MMD scores combined with a penalty for deviation in graph density:

F=\(wd⋅MMDd\)\+\(wc⋅MMDc\)\+\(ws⋅MMDs\)\+PedgeF=\(w\_\{d\}\\cdot\\text\{MMD\}\_\{d\}\)\+\(w\_\{c\}\\cdot\\text\{MMD\}\_\{c\}\)\+\(w\_\{s\}\\cdot\\text\{MMD\}\_\{s\}\)\+P\_\{\\text\{edge\}\}\(2\)wherewd,wc,w\_\{d\},w\_\{c\},andwsw\_\{s\}are configurable weights for the degree, clustering, and spectral components respectively, chosen to be consistent with\[[10](https://arxiv.org/html/2605.29161#bib.bib10)\]\. The edge penalty termPedgeP\_\{\\text\{edge\}\}ensures the graph maintains a realistic density and is defined as

Pedge=we⋅\|\|Eg​e​n\|−Et​a​r​g​e​t\|P\_\{\\text\{edge\}\}=w\_\{e\}\\cdot\|\|E\_\{gen\}\|\-E\_\{target\}\|, where\|Eg​e​n\|\|E\_\{gen\}\|is the current edge count,Et​a​r​g​e​tE\_\{target\}is the desired edge count, andwew\_\{e\}is the penalty weight\.

The GA minimizes this aggregate distance, driving the population toward graphs that are statistically authentic in both local detail and global organization\. The components of the fitness landscape are as follows\.

#### II\-E1Degree Distribution \(MMDd\\text\{MMD\}\_\{d\}\)

To ensure the generated graphs reproduce realistic connectivity patterns, we compute the degree distribution histogram for the candidate graph\. LetHd​\(G\)H\_\{d\}\(G\)be the normalized histogram of node degrees for graphGG, computed using adaptive binning with a static count of ten bins to accommodate varying graph sizes\.MMDd\\text\{MMD\}\_\{d\}measures the distance betweenHd​\(G\)H\_\{d\}\(G\)and the set of degree histograms\. This penalizes graphs that fail to capture the density and distribution of the real data\.

#### II\-E2Clustering Coefficient \(MMDc\\text\{MMD\}\_\{c\}\)

Local substructures are evaluated using the distribution of local clustering coefficients\. For each nodevv, the clustering coefficientCvC\_\{v\}measures the density of triangles in its neighbourhood\.MMDc\\text\{MMD\}\_\{c\}measures the distance between the graph and the data distribution\. This metric is crucial for enforcing transitivity and community structure, properties often missed by simpler models\.

#### II\-E3Spectral Features \(MMDs\\text\{MMD\}\_\{s\}\)

Global structural properties are captured using the graph spectrum\. We compute the eigenvalues of the combinatorial Laplacian matrix𝐋=𝐃−𝐀\\mathbf\{L\}=\\mathbf\{D\}\-\\mathbf\{A\}, where𝐀\\mathbf\{A\}is the adjacency matrix and𝐃\\mathbf\{D\}is the degree matrix\. The eigenvalues are computed and sorted\. This spectrum encodes vital information regarding graph cuts, connectivity, and diffusion properties\. By minimizing the MMD between the spectral densities, the GA ensures that the generated graphs share the same global topology as the target class\.

## IIIExperimental Design

### III\-ADatasets

We assess our graph refinement methods using three benchmark graph classification datasets\[[8](https://arxiv.org/html/2605.29161#bib.bib8)\]that cover diverse domains and structural complexities, detailed in Table[I](https://arxiv.org/html/2605.29161#S3.T1)below\. The MUTAG dataset consists of chemical compound graphs for mutagenicity prediction and is characterized by relatively small graphs with an average of 18 nodes\. ENZYMES represents protein structures distributed across six classes, consisting of moderate\-sized graphs averaging 33 nodes\. Finally, PROTEINS serves as the largest and most structurally varied dataset for distinguishing enzymes from non\-enzymes, with graph sizes spanning from 4 to 620 nodes\. The feature representations differ significantly among these benchmarks: while MUTAG incorporates both 7\-dimensional node and 4\-dimensional edge features, ENZYMES and PROTEINS rely exclusively on 3\-dimensional node features\. Collectively, these datasets provide a rigorous environment for testing graph generation performance across varying scales and complexities\.

TABLE I:Summary of Dataset CharacteristicsTABLE II:Settings for the Evolutionary AlgorithmTABLE III:Comparison of WGAN vs\. GA\-Refined Graphs\. Lower MMD scores indicate better distributional matching\.
### III\-BGA Setup

The hyperparameters used for the evolutionary refinement stage are detailed in Table[II](https://arxiv.org/html/2605.29161#S3.T2)\. The evolutionary process is managed by a Rust backend, which handles population management, fitness evaluation, and genetic operators in parallel\.

#### III\-B1Population Initialization and the Identity Genome

A critical component of our hybrid architecture is the initialization strategy\. Standard evolutionary approaches often initialize populations with random noise; however, our goal is torefinethe structural priors learned by the WGAN, not to generate graphs from scratch\. Consequently, the initialization process in the GA is biased toward the WGAN output\. The population of sizePPis initialized as follows\. First, theIdentity Genome, the first individual in the population, is explicitly constructed as a sequence consisting entirely ofNull\(\) operations; hence no changes are made to the base graph provided by the WGAN\. This guarantees that the evolutionary search begins with a solution that is at least as fit as the original WGAN output, providing a stable baseline for improvement\. The remainingP−1P\-1individuals are each initialized as a random sequence of operations, chosen according to the specified probabilities\. This injects immediate diversity into the population, exploring the local structural neighbourhood of the base graph\.

#### III\-B2Selection and Variation

We employ tournament selection with a tournament size of 5\. This balances exploration with exploitation of high fitness structural edits\. To ensure that high quality refinements are never lost due to stochastic effects, we utilize elitism, preserving the top 2 individuals unchanged into the next generation\.

We utilize high crossover \(0\.5\) and mutation \(0\.8\) probabilities\. This aggressive variation strategy is viable because the operations are applied sequentially to a robust base structure, allowing the algorithm to rapidly test various combinations of edge additions, deletions, and swaps to correct specific spectral deviations and better exploit the search space\.

## IVResults and Discussion

### IV\-APerformance on Benchmark Datasets

Table[III](https://arxiv.org/html/2605.29161#S3.T3)presents a comprehensive study comparing the WGAN generator output against the final GA refined graphs\. The results demonstrate that the evolutionary refinement of the base graphs significantly enhances the structural character, particularly capturing global topology and clustering patterns\.

For the MUTAG dataset, the refinement yielded a dramatic improvement in the Clustering MMD, dropping from values greater than 1\.0 in the base WGAN to near zero for both classes\. This indicates that the GA successfully recovered the specific structures of the generally small graphs in the dataset that the WGAN struggled to capture\. While the Spectral MMD increased slightly, the gains in clustering result and improved Degree MMD suggest more plausible synthetic graphs\.

With the ENZYMES dataset, we see consistent and sometimes substantial improvements in the Spectral MMD, notably in Class 0 \(0\.198 to 0\.034\) and Class 3 \(0\.098 to 0\.021\)\. This suggests that the edge editing operations were successful in adjusting the graph eigenvalues to match the real distribution\. In some of these cases we see a slight trade off, particularly with regards to Degree MMD, as the algorithm prioritized global connectivity over the degree counts\.

The PROTEINS dataset further highlights the strength of the refinement in fixing topological properties\. Spectral MMD improved significantly for both classes \(e\.g\., Class 0: 0\.176 to 0\.026\), and Clustering MMD remained competitive\. The average edge counts in the refined graphs \(95\.0 for Class 0\) dropped compared to the WGAN output \(111\.2\), with the real data distribution for Class 0 being 103\.

### IV\-BComparison to State of the Art

TABLE IV:Comparison with state\-of\-the\-art methods using MMD metrics \(lower is better\) as presented in\[[5](https://arxiv.org/html/2605.29161#bib.bib5),[9](https://arxiv.org/html/2605.29161#bib.bib9),[10](https://arxiv.org/html/2605.29161#bib.bib10)\]Table[IV](https://arxiv.org/html/2605.29161#S4.T4)compares the results of our refinement process against DeepGMG, GraphRNN, LGGA, and the base WGAN\. The refined graphs outperform the baseline WGAN in Spectral MMD for both PROTEINS \(0\.01\) and ENZYMES \(0\.02\)\. In terms of Clustering MMD, the refined graphs achieve the best performance on PROTEINS \(0\.03\) and remains highly competitive on ENZYMES \(0\.04\), outperforming GraphRNN and LGGAN\. While the degree distribution MMD is slightly higher than the best baseline in some cases, the overall balance of metrics indicates that the hybrid WGAN\-GA approach generated graphs that are structurally more robust and representative of the target domain\.

### IV\-CVisual Analysis

![Refer to caption](https://arxiv.org/html/2605.29161v1/raw_graph_43.png)

![Refer to caption](https://arxiv.org/html/2605.29161v1/refined_graph_43.png)

![Refer to caption](https://arxiv.org/html/2605.29161v1/raw_graph_66.png)

![Refer to caption](https://arxiv.org/html/2605.29161v1/refined_graph_66.png)

Figure 3:Visual comparison on PROTEINS dataset \(left: generated and refined graphs from Class 0; right: Class 1\)\.![Refer to caption](https://arxiv.org/html/2605.29161v1/Class0_Triple_degree3.png)![Refer to caption](https://arxiv.org/html/2605.29161v1/Class0_Triple_clustering3.png)![Refer to caption](https://arxiv.org/html/2605.29161v1/Class0_Triple_spectral3.png)![Refer to caption](https://arxiv.org/html/2605.29161v1/Class1_Triple_degree3.png)![Refer to caption](https://arxiv.org/html/2605.29161v1/Class1_Triple_clustering3.png)![Refer to caption](https://arxiv.org/html/2605.29161v1/Class1_Triple_spectral3.png)Figure 4:Distribution comparison between real and generated graphs on the PROTEINS dataset\.Figure[3](https://arxiv.org/html/2605.29161#S4.F3)provides a qualitative comparison of the generated graphs\. These two graphs were chosen as representative as they are both close to the mean cardinality of other graphs of their class\. The refined graphs exhibit more coherent structures with fewer disconnected components compared to the raw WGAN outputs\. This visual inspection correlates with the quantitative improvements in Spectral MMD, as the GA successfully connects isolated nodes and breaks unrealistic hubs not found in the source data\. This connectivity also occurs while simultaneously decreasing the edge count in some cases to better match the data\.

Figure[4](https://arxiv.org/html/2605.29161#S4.F4)displays the histograms for degree, clustering coefficient, and spectral distributions respectively\. The refined distributions \(orange\) generally track the real data \(black lines\) more closely than the WGAN outputs \(blue\)\. Notably, the spectral distribution in Figure[4](https://arxiv.org/html/2605.29161#S4.F4)shows that the refinement process effectively shifts the eigenvalue spectrum to align much closer to the ground truth, correcting the skew in distribution seen in the coarser WGAN generation\.

## VConclusions and Future Work

In this work, we presented a framework for graph generation and refinement that combines global modelling of Wasserstein GANs with local optimization of GAs\. By using the state of the art WGAN outputs as high quality initial seed populations, our GA is able to effectively refine the graph structure, correcting deviations in spectral and clustering that the WGAN on its own could not capture\.

Our results on multiple benchmark datasets demonstrate that this two\-stage approach consistently improves structural fidelity of the graphs that were already optimized using a GAN in\[[10](https://arxiv.org/html/2605.29161#bib.bib10)\]\. Specifically, the evolutionary refinement significantly lowers MMD scores for spectral and clustering distributions, giving us graphs that are topologically more realistic\. Offloading this computationally heavy GA process to an optimized library ensures that the refinement is as efficient as possible\.

Future work could focus on expanding the ways in which the GA can be used in the graph generation, perhaps as an integral part of the generator process in graph based GANs\. Researchers could also use the graph generation capability of the GA to replace the edge prediction component of the GAN, without it being stacked at the end of the training cycle\.

## Acknowledgements

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada \(NSERC\), and used the facilities of the Shared Hierarchical Academic Research Computing Network \(SHARCNET\) and Compute Canada\.

## References

- \[1\]M\. Arjovsky, S\. Chintala, and L\. Bottou\.Wasserstein generative adversarial networks\.InProc\. Intl\. Conf\. on Machine Learning, pages 214–223, 2017\.
- \[2\]A\. Barry, J\. Griffith, and C\. O’Riordan\.An evolutionary and graph\-rewriting based approach to graph generation\.InIntl\. Joint Conf\. on Computational Intelligence \(IJCCI\), volume 1, pages 237–243, 2015\.
- \[3\]N\. De Cao and T\. Kipf\.MolGAN: An implicit generative model for small molecular graphs\.ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018\.
- \[4\]M\. Dubé, S\. Houghten, and D\. Ashlock\.Representation for evolution of epidemic models\.In2019 IEEE Congress on Evolutionary Computation \(CEC\), pages 2370–2377, Piscataway NJ, 06 2019\. IEEE Press\.
- \[5\]S\. Fan and B\. Huang\.Labeled graph generative adversarial networks\.arXiv preprint arXiv:1906\.03220, 2019\.
- \[6\]I\. Gulrajani et al\.Improved training of Wasserstein GANs\.InAdvances in Neural Information Processing Systems, 2017\.
- \[7\]Y\. Li et al\.Learning deep generative models of graphs\.arXiv preprint arXiv:1803\.03324, 2018\.
- \[8\]C\. Morris et al\.Tudataset: A collection of benchmark datasets for learning with graphs\.InICML 2020 Workshop on Graph Representation Learning and Beyond \(GRL\+ 2020\), 2020\.
- \[9\]S\.A\. Razi Razavi et al\.Generating labeled graphs using conditional Wasserstein GANs\.In35th International Conference on Collaborative Advances in Software and Computing \(CASCON\)\. IEEE Computer Society, 2025\.
- \[10\]S\.A\. Razi Razavi et al\.Adaptive edge learning for density\-aware graph generation\.arXiv preprint arXiv:2601\.23052, 2026\.
- \[11\]M\. Simonovsky and N\. Komodakis\.GraphVAE: Towards generation of small graphs using variational autoencoders\.InIntl\. Conf\. on Artificial Neural Networks, 2018\.
- \[12\]H\. Wang et al\.GraphGAN: Graph representation learning with generative adversarial nets\.InProceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018\.
- \[13\]P\. Wang et al\.Graph generative adversarial networks with evolutionary algorithm\.Applied Soft Computing, 164:111981, 2024\.
- \[14\]J\. You et al\.Graph convolutional policy network for goal\-directed molecular graph generation\.InAdvances in Neural Information Processing Systems, 2018\.

Similar Articles

Improving GANs using optimal transport

OpenAI Blog

OT-GAN introduces a novel GAN variant using optimal transport combined with energy distance in an adversarially learned feature space to improve training stability and image generation quality. The method demonstrates state-of-the-art results on benchmark problems with stable training using large mini-batches.

TERGAD: Structure-Aware Text-Enhanced Representations for Graph Anomaly Detection

arXiv cs.CL

TERGAD is a novel data augmentation framework that uses large language models to translate node-level topological properties into semantic narratives, then fuses these with original node attributes via a gated dual-branch autoencoder for graph anomaly detection, achieving state-of-the-art results on six datasets.

Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models

arXiv cs.LG

Researchers propose a lightweight autoregressive framework for graph generation that uses structure-guided topological ordering to achieve near log-linear complexity, addressing scalability and novelty limitations of existing diffusion and autoregressive methods. The approach supports both LSTM and Mamba-style backbones and shows improved novelty while maintaining validity and uniqueness on molecular and non-molecular benchmarks.

Graph Alignment Topology as an Inductive Bias for Grounding Detection

arXiv cs.CL

This paper introduces Graph Alignment Topology as an inductive bias for grounding detection, using a graph neural network to model alignment structure between reference information and LLM outputs. The method achieves state-of-the-art results on multiple hallucination and question-answering datasets, outperforming GPT-4o.