Auto-Configured Explainable Graph Neural Networks for Multi-Site Pollution Prediction
Summary
This paper proposes a confusion matrix-based graph construction method and a hybrid loss function for Graph Neural Networks to improve multi-site pollution prediction accuracy and interpretability, evaluated on real-world air pollution data.
# Auto-Configured Explainable Graph Neural Networks for Multi-Site Pollution Prediction
Source: [https://arxiv.org/html/2606.24978](https://arxiv.org/html/2606.24978)
Abdelkader Dairi, Department of Computer Science, University of Science and Technology of Oran\-Mohamed Boudiaf \(USTO\-MB\), El Mnaouar, BP 1505, Bir El Djir 31000, Oran, Algeria, abdelkader\.dairi@univ\-usto\.dz
Fouzi Harrou, Computer, Electrical and Mathematical Sciences and Engineering \(CEMSE\) Division, King Abdullah University of Science and Technology \(KAUST\), Thuwal 23955\-6900, Saudi Arabia, fouzi\.harrou@kaust\.edu\.sa
Ying Sun, Computer, Electrical and Mathematical Sciences and Engineering \(CEMSE\) Division, King Abdullah University of Science and Technology \(KAUST\), Thuwal 23955\-6900, Saudi Arabia, ying\.sun@kaust\.edu\.sa
###### Abstract
Accurate particulate matter \(PM\) prediction is crucial for mitigating air pollution\. Graph Neural Networks \(GNNs\) effectively model spatiotemporal dependencies, but predefined graphs limit adaptability, and some datasets complicate learning\. This study introduces a graph construction method based on a confusion matrix from a supervised learning process to dynamically capture inter\-class relationships\. Additionally, a hybrid loss function that combines energy distance and Huber loss is applied to address the vanishing gradient problem and improve learning stability\. The approach is evaluated using air pollution data from the University of Utah AirU Pollution Monitoring Network in Salt Lake City, UT, with five GNN models: Graph Convolutional Networks \(GCNs\), Simple Graph Convolutional Networks \(SGConv\), Graph Isomorphism Networks \(GINs\), Graph Attention Networks \(GATs\), and GraphSage\. The experimental results of single\- and multistep predictions confirm that GraphSage achieves the highest accuracy in predicting the concentrations of PM1, PM10, and PM2\.5 over different time horizons\. Furthermore, GNNExplainer \(Graph Neural Network Explainer\) and PGExplainer \(Probabilistic Graph Explainer\) are applied to interpret feature importance and graph structure, ensuring model transparency\. Results show improved prediction accuracy, with GNN models outperforming traditional machine learning and deep learning models \(i\.e\., Prophet, Long Short\-Term Memory, and Gated Recurrent Units\) in air pollution forecasting\.
## 1 Introduction
Air pollution, particularly particulate matter \(PM\), poses significant threats to public health and the environment\[[11](https://arxiv.org/html/2606.24978#bib.bib35)\]\. PM, a complex mixture of tiny particles and liquid droplets, can penetrate deep into the respiratory system, causing various health problems such as asthma, bronchitis, heart attacks, and even premature death\[[25](https://arxiv.org/html/2606.24978#bib.bib37)\]\. Children, the elderly, and individuals with pre\-existing health conditions are particularly vulnerable populations at risk\[[34](https://arxiv.org/html/2606.24978#bib.bib31)\]\. Accurate prediction of PM concentrations is crucial for mitigating these health risks\. Predictive models enable timely interventions and inform policy decisions to improve air quality, protect public health, and enhance environmental sustainability\[[8](https://arxiv.org/html/2606.24978#bib.bib34)\]\. By forecasting PM levels, communities can better prepare to reduce exposure and take proactive measures, contributing to a healthier and safer living environment\[[6](https://arxiv.org/html/2606.24978#bib.bib36)\]\.
Traditional methods for predicting PM concentrations have relied heavily on time series models and shallow machine learning methods\. Time series models like seasonal decomposition of time series and autoregressive integrated moving average \(ARIMA\) are widely employed for their capability to capture temporal patterns in data\[[17](https://arxiv.org/html/2606.24978#bib.bib32),[10](https://arxiv.org/html/2606.24978#bib.bib33)\]\. However, these models often struggle with high\-dimensional, non\-linear relationships inherent in environmental data\. They are typically limited by their reliance on historical data alone, failing to incorporate complex interactions between multiple factors that influence PM levels\. Shallow machine learning methods, including linear regression, decision trees, and support vector machines \(SVMs\), have also been applied to PM prediction\[[31](https://arxiv.org/html/2606.24978#bib.bib29)\]\. These methods can capture linear and simple non\-linear relationships, offering some improvements over traditional time series models\. However, they generally fall short in handling the spatiotemporal dependencies and the intricate dynamics of air pollution\. Shallow machine learning models lack the depth required to extract meaningful features from large, heterogeneous datasets, often leading to suboptimal performance in predicting PM concentrations\.
Graph Neural Networks \(GNNs\) have recently demonstrated significant potential in capturing complex relationships within environmental data\[[35](https://arxiv.org/html/2606.24978#bib.bib14),[4](https://arxiv.org/html/2606.24978#bib.bib15)\]\. GNNs can effectively capture the spatial and temporal dependencies inherent in air pollution data, offering a more nuanced and accurate approach to PM prediction\. Various studies have demonstrated the superiority of GNN\-based models over traditional methods\. Recent advances in spatio\-temporal graph neural networks have reinforced their relevance in urban computing applications, including air pollution forecasting\[[13](https://arxiv.org/html/2606.24978#bib.bib4)\]\. These models leverage both spatial correlations among monitoring stations and temporal dependencies in air quality data to enhance predictive performance\. Furthermore, hybrid architectures integrating CNNs and adaptive GCNs have been proposed to address the limitations of purely distance\-based approaches by capturing both geographic and latent region\-wise dependencies\[[14](https://arxiv.org/html/2606.24978#bib.bib5)\]\. Additionally, automated spatio\-temporal synchronous modeling frameworks have emerged to improve dynamic predictions, as demonstrated by Li et al\.\[[18](https://arxiv.org/html/2606.24978#bib.bib6)\], where multiple graph structures are leveraged to refine message\-passing mechanisms for more robust traffic forecasting\.
For instance, Qi et al\. introduced a hybrid model, GC\-LSTM, which integrates Graph Convolutional Networks \(GCN\) with Long Short\-Term Memory \(LSTM\) networks to forecast PM2\.5 concentrations\[[26](https://arxiv.org/html/2606.24978#bib.bib23)\]\. The model outperformed state\-of\-the\-art methods by using spatiotemporal graph series from historical data and various air quality and meteorological factors as graph signals\. It achieved a high correlation coefficient \($R^{2}$ = 0\.72\) for 72\-hour predictions, demonstrating its potential for future pollutant concentration forecasting\. In\[[15](https://arxiv.org/html/2606.24978#bib.bib24)\], Kim et al\. developed a novel framework for PM2\.5 prediction using a multi\-gated graph neural network\. The model captured complex interactions between monitoring stations by employing multiple edges based on an atmospheric diffusion coefficient and PM2\.5 similarity metric\. The model demonstrated significant improvements in root\-mean\-square error \(RMSE\) \(2\.613%\) and $R^{2}$ \(5\.263%\) for 96\-hour predictions compared to conventional time\-series models, highlighting its effectiveness in capturing global features and mitigating local dependency issues\. In\[[19](https://arxiv.org/html/2606.24978#bib.bib25)\], Lin et al\. introduced the ST\-CCN\-PM2\.5, a framework combining spatial attention mechanisms and causal convolution networks to enhance PM2\.5 prediction accuracy\. The model outperformed several baseline models, showing a substantial decrease in RMSE \(27\.05%\), MAE \(10\.38%\), and an increase in $R^{2}$ \(3\.56%\) for single stations\. The study in\[[23](https://arxiv.org/html/2606.24978#bib.bib44)\] proposed the SA\-GNN model for predicting short\-term PM2\.5 concentrations by treating monitoring stations as graph nodes and leveraging their spatial relationships\. The model incorporated meteorological variables and clustering\-based spatiotemporal feature extraction within a graph neural network framework\. Applied in Delhi, the model significantly improved $R^{2}$ \(0\.75\), RMSE \(25\.13 $\mu g/m^{3}$\), and MAE \(21\.28 $\mu g/m^{3}$\), especially during high pollution episodes, demonstrating its potential for similarly polluted cities\.
In\[[7](https://arxiv.org/html/2606.24978#bib.bib26)\], Ejurothu et al\. developed a Local Hybrid\-Graph Neural Network \(HGNN\) approach for monitoring station\-wise multi\-step PM2\.5 forecasting across India\. The model integrated spatiotemporal units and station\-wise feature extraction units to handle local meteorological variations\. In\[[40](https://arxiv.org/html/2606.24978#bib.bib27)\], Zeng et al\. introduced the STGODE\-M model, employing tensor\-based ordinary differential equations to capture spatial\-temporal dynamics and build deeper networks\. The model included air humidity as an auxiliary feature and used wind direction data for adjacency matrix construction\. Evaluated on a dataset for home\-based care parks, the STGODE\-M showed superior performance in capturing spatial\-temporal characteristics of PM2\.5, providing better guidance for elderly travel and reducing health risks\. Liu et al\.\[[20](https://arxiv.org/html/2606.24978#bib.bib28)\] presented a new benchmark task for graph\-based machine learning, focusing on predicting future PM2\.5 concentrations under distribution shift\. Their study revealed that GNN models suffer more from distribution shifts than non\-graph\-based models, emphasizing the need for special attention when deploying spatio\-temporal GNNs in practice\. In\[[28](https://arxiv.org/html/2606.24978#bib.bib42)\], Teng et al\. developed the GNN\_LSTM model, which captured spatiotemporal correlations among monitoring sites to improve long\-term PM2\.5 forecasting\. The model showed significant performance improvements in the Beijing\-Tianjin\-Hebei region, particularly during polluted episodes\. The inclusion of AOD features further enhanced prediction accuracy, emphasizing the importance of neighborhood site features for long\-term air quality forecasting\.
In another study, Zhang et al\.\[[41](https://arxiv.org/html/2606.24978#bib.bib43)\] proposed a novel method for long\-term PM2\.5 prediction using a spatiotemporal graph attention recurrent neural network combined with a Grey Wolf optimization algorithm\. The model effectively integrated spatial and temporal dependencies, providing improved prediction accuracy and robustness against variations in PM2\.5 concentrations\. In\[[24](https://arxiv.org/html/2606.24978#bib.bib39)\], Pei et al\. proposed the PMNet model, combining adaptive variational mode decomposition \(AVMD\) with a multivariate temporal graph neural network \(MtemGNN\)\. The model effectively extracted complex relationships between multivariate time series and demonstrated superior prediction performance compared to baseline models\. Ablation experiments highlighted the significant contributions of AVMD, GRU, and MtemGNN to reducing MAE and improving prediction accuracy\.
This work introduces a novel approach for predicting PM concentration across multiple monitoring sites\. The method leverages advanced graph analysis \(improved Graph Neural Network\) to capture the complex interplay between various factors \(multivariate\) that influence PM levels for several diameters, considering both their location \(spatial\) and how they change over time \(temporal\)\. This approach utilizes historical data \(time series\) from these sites to enhance prediction\. The fundamental elements of our work can be outlined as follows:
- This study introduces a novel method for constructing graph structures using a confusion matrix derived from a supervised learning process\. Traditional methods often rely on predefined graph structures that can limit a model’s adaptability and performance due to their static nature\. In contrast, our approach dynamically reflects the data’s inherent relationships, capturing inter\-class dependencies to provide a more adaptive and accurate representation of spatial and temporal dependencies\. This dynamic graph construction adapts more effectively to changes and variations in the data, leading to improved prediction accuracy\. By capturing subtle but crucial inter\-class relationships, the model’s predictive capabilities are significantly enhanced\.
- Furthermore, the proposed approach employs a hybrid loss function that combines energy distance and Huber loss to address the vanishing gradient problem, a major challenge in training deep neural networks\. By integrating these two loss functions, our method ensures more stable and efficient learning, resulting in better model performance\. This hybrid approach balances robustness and sensitivity, crucial for handling outliers and providing smooth gradients during training, thereby enhancing the model’s overall accuracy\.
- Moreover, the proposed approach is extensively evaluated using a real\-world dataset and five different GNN models: GCN, GAT, GIN, SGConv, and GraphSage\. Additionally, a comparison with traditional machine learning models, including k\-Nearest Neighbors \(kNN\), Random Forest \(RF\), Extra Trees \(ET\), Decision Tree \(DT\), and Gradient Boosting \(GB\), has been conducted to assess the relative performance of GNNs in PM concentration prediction\. To further strengthen the evaluation, deep learning models such as Prophet, Long Short\-Term Memory \(LSTM\), and Gated Recurrent Unit \(GRU\) have also been included to benchmark performance under similar forecasting conditions\. This evaluation includes both single and multi\-step prediction experiments, thoroughly comparing model performance across various scenarios\. Such a comprehensive evaluation provides robust validation of the approach, demonstrating its effectiveness\.
- Finally, we utilize two explainable AI techniques to interpret the model’s decision\-making process: GNNExplainer and PGExplainer\. These tools thoroughly analyze feature importance and graph structure, offering insights into how the model arrives at its predictions\. This enhances the transparency of the GNN models\.
The subsequent sections of this paper are structured as follows\. Section [2](https://arxiv.org/html/2606.24978#S2) briefly presents the basic concepts of the five GNN models considered in this study\. Section [3](https://arxiv.org/html/2606.24978#S3) outlines the main steps of the proposed GNN strategy\. Section [4](https://arxiv.org/html/2606.24978#S4) discusses the data used and presents the results of single\- and multi\-step PM pollution predictions\. Section [5](https://arxiv.org/html/2606.24978#S5) concludes the study and outlines future directions for improvement\.
## 2 Preliminary Materials
This section presents the materials used in the proposed approach, with a focus on GNNs and their variants\. The block diagram in Figure [1](https://arxiv.org/html/2606.24978#S2.F1) illustrates the proposed GNN\-based framework for predicting particulate matter \(PM\) concentrations across multiple monitoring stations\. The process begins with an air quality monitoring network, where raw PM data and environmental factors \(e\.g\., temperature, humidity\) are collected from weather stations distributed across an urban environment\. In the data preprocessing stage, the collected data undergoes cleaning, normalization, and imputation to ensure consistency and quality\. Next, a graph representation is constructed, where nodes represent monitoring stations and edges denote spatial or statistical relationships between them, determined using a confusion matrix\-based approach\. The constructed graph is then used to train multiple GNN models \(GCN, GAT, GIN, SGConv, and GraphSage\), which learn spatial\-temporal dependencies in PM concentrations\. Once trained, the PM concentration prediction module forecasts PM levels \(PM1, PM2\.5, PM10\) for all stations, using the graph structure to capture spatial correlations\. Finally, to ensure model transparency, GNNExplainer and PGExplainer are applied for explainability analysis, identifying key features and relationships that contribute to the model’s predictions\. This framework enables accurate, data\-driven forecasting while ensuring interpretability through explainability techniques\.
Figure 1: Block diagram of the proposed GNN\-based PM prediction framework\.
### 2\.1 Graph Neural Networks
GNNs are a type of neural network specifically designed for processing graph\-structured data\. They excel at capturing the connections and dependencies among nodes within a graph, rendering them ideal for tasks including node classification, link prediction, and graph clustering\. They leverage the connections between nodes \(data points\) to capture spatial relationships within the data\. Unlike traditional neural networks, GNNs excel at processing graph\-structured data by exploiting node connections to learn complex relationships\. The basic concept of GNNs involves iteratively aggregating information from a node’s neighbors to update its representation, allowing the network to learn from the structural information of the graph\.
Five well\-established GNN models are considered in this study: Graph Convolutional Networks \(GCNs\), Simple Graph Convolutional Networks \(SGConv\), Graph Isomorphism Networks \(GINs\), Graph Attention Networks \(GATs\), and GraphSage\.
##### GCN models
pioneered the field of GNNs with their use of spectral convolutions to aggregate information from neighboring nodes\[[16](https://arxiv.org/html/2606.24978#bib.bib22)\]\. GCNs are efficient and interpretable, but their reliance on spectral properties can limit their applicability to certain types of graphs\. The GCN operation is defined as follows:
$$\mathbf{H}^{(l+1)}=\sigma\left(\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(l)}\mathbf{W}\right),\tag{1}$$
where $\hat{\mathbf{A}}$ is the adjacency matrix with self\-connections added \($\hat{\mathbf{A}}=\mathbf{A}+\mathbf{I}$\), $\hat{\mathbf{D}}$ is the degree matrix of $\hat{\mathbf{A}}$, $\mathbf{H}^{(l)}$ is the node feature matrix at layer $l$ \(the input feature matrix for $l=0$\), $\mathbf{W}$ is the learnable weight matrix, and $\sigma$ is a non\-linear activation function \(e\.g\., ReLU\)\.
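To make Eq. (1) concrete, the following is a minimal sketch of a single propagation step in plain Python. It is an illustration only, not the authors' implementation: the helper names `matmul` and `gcn_layer`, the three-node path graph, and the identity weight are assumptions chosen so the arithmetic can be checked by hand.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One propagation step of Eq. (1): ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-connections: A_hat = A + I.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A_hat (at least 1 because of the self-loop).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(a_norm, feats), weight)
    # ReLU plays the role of the activation sigma.
    return [[max(0.0, v) for v in row] for row in out]

# Toy graph (assumed): three stations on a path 0 - 1 - 2, one feature per station.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [2.0], [3.0]]
weight = [[1.0]]  # identity weight, so the output shows the pure neighborhood mixing
print(gcn_layer(adj, feats, weight))
```

With the identity weight, each output value is a degree-weighted blend of a station's own feature and those of its neighbors, which is the smoothing behavior the GCN operation relies on.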
##### Simple Graph Convolutional Networks \(SGConv\)
In the SimpleGConv layers of GNNs, a node’s feature vector is enhanced by incorporating information from its neighbors\[[32](https://arxiv.org/html/2606.24978#bib.bib1)\]\. This is done by summing the feature vectors of neighboring nodes, each multiplied by a learned weight matrix\. Additionally, the node’s own feature vector undergoes a transformation using a separate weight matrix, and a bias vector is added for further customization\. An activation function is applied next to introduce non\-linearity, allowing the GNN to capture intricate relationships embedded within the graph structure\. By integrating information from both the node itself and its neighbors, GNNs can learn rich representations of nodes that reflect the broader context of the entire graph\. Simple GCNs do not involve message passing and aggregation in the usual GNN sense\. Instead, they use a simplified approach where node features are directly averaged with features from neighboring nodes in the convolution operation\. This averaging can be represented as:
$$H_{i}^{(l+1)}=\sigma\left(\sum_{j\in\mathcal{N}_{i}}W^{(l)}H_{j}^{(l)}+WH_{i}^{(l)}+b^{(l)}\right),\tag{2}$$
where $H_{i}^{(l)}$ is the feature vector of node $i$ at layer $l$, $W^{(l)}$ is the learnable weight matrix at layer $l$, $W$ is the separate weight matrix applied to the node’s own feature vector, $b^{(l)}$ is the bias vector, $\sigma$ is a non\-linear activation function \(e\.g\., ReLU\), $\mathcal{N}_{i}$ is the set of neighbor nodes of node $i$, and $|\mathcal{N}_{i}|$ is the degree \(number of neighbors\) of node $i$\.
##### Graph Isomorphism Networks \(GINs\)
are powerful for node classification tasks due to their message\-passing framework and permutation equivariance\[[33](https://arxiv.org/html/2606.24978#bib.bib48)\]\. They excel in distinguishing graph structures, although they might be less interpretable compared to GCNs\. The GIN framework is represented by:
$$m_{v}^{(l+1)}=\varphi\left(\sigma\left(W^{(l)}\,\text{AGG}\left(\{m_{u}^{(l)}\in M(u)\mid u\in N(v)\},h_{v}^{(l)}\right)\right)\right),\tag{3}$$
where $N(v)$ is the set of neighbor nodes of node $v$, $W^{(l)}$ is a learnable weight matrix at layer $l$, $\sigma$ is a non\-linear activation function \(e\.g\., ReLU\), and $\varphi$ is another learnable function that transforms the message after aggregation and activation\.
##### GATs
address the issue of uniform weighting in GCNs by introducing an attention mechanism, allowing the model to focus on informative neighbors\[[29](https://arxiv.org/html/2606.24978#bib.bib49)\]\. However, this can lead to higher computational costs\. The attention mechanism in GATs can be represented as:
$$a_{vu}^{(l)}=\sigma\left(a\left(W_{a}^{(l)}h_{v}^{(l)},W_{a}^{(l)}h_{u}^{(l)}\right)\right),\tag{4}$$
where $a_{vu}^{(l)}$ is the attention score for neighbor $u$ of node $v$ at layer $l$, and $a$ is an attention function that takes the features of node $v$ and its neighbor $u$ as input and outputs a raw attention score\.
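The attention score of Eq. (4) can be illustrated with a short sketch. The text leaves the attention function $a$ unspecified, so the code below assumes the form used in the original GAT formulation: a learned vector applied to the concatenated transformed features, a LeakyReLU, and a softmax over the neighborhood. The function names, the slope of 0.2, and all numeric values are hypothetical.

```python
import math

def linear(weight, vec):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_weights(h, neighbors, v, weight, att_vec):
    """Normalized attention of node v over its neighbors (assumed GAT-style form)."""
    zv = linear(weight, h[v])
    raw = {}
    for u in neighbors[v]:
        zu = linear(weight, h[u])
        # Raw score a(W h_v, W h_u): dot product of att_vec with the concatenation [zv, zu].
        raw[u] = leaky_relu(sum(a * z for a, z in zip(att_vec, zv + zu)))
    # Softmax over the neighborhood, shifted by the maximum for numerical stability.
    top = max(raw.values())
    exp = {u: math.exp(s - top) for u, s in raw.items()}
    total = sum(exp.values())
    return {u: e / total for u, e in exp.items()}

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}  # toy node features
neighbors = {0: [1, 2]}
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity transform, for readability
att_vec = [0.5, 0.5, 1.0, 1.0]     # hypothetical attention parameters
alpha = attention_weights(h, neighbors, 0, weight, att_vec)
print(alpha)
```

In this toy setting neighbor 2 receives the larger weight because its raw score (2.5) exceeds that of neighbor 1 (1.5); the weights sum to one, so they can be used directly to average neighbor features.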
##### GraphSage
explores inductive learning, making it suitable for large graphs and dynamic settings where unseen nodes might appear\[[3](https://arxiv.org/html/2606.24978#bib.bib50)\]\. Defining an effective aggregation function remains a key consideration\. The GraphSage aggregation process can be represented as:
$$m_{v}^{(l+1)}=\text{AGG}\left(\{m_{u}^{(l)}\in M(u)\mid u\in S_{v}^{(l)}\},h_{v}^{(l)}\right),\tag{5}$$
where $\text{AGG}$ is a learnable function that combines messages and the node’s own feature vector, $m_{v}^{(l+1)}$ is the aggregated message for node $v$ at layer $(l+1)$, $S_{v}^{(l)}$ is the set of sampled neighbors of node $v$ at layer $l$, $M(u)$ is the set of messages received by node $u$ at layer $l$, and $h_{v}^{(l)}$ is the feature vector of node $v$ at layer $l$\. Common aggregation functions include sum, mean, and user\-defined functions based on learnable neural networks\.
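As an illustration of Eq. (5), the sketch below uses the mean as the aggregation function and uniform sampling of a fixed number of neighbors. Both choices are assumptions (the text lists the mean only as one common option), and the subsequent weight matrix and non-linearity of a full GraphSage layer are omitted. The names `sage_mean_aggregate` and `num_samples` are hypothetical.

```python
import random

def sage_mean_aggregate(h, neighbors, v, num_samples, rng):
    """Concatenate node v's features with the mean of a sampled neighborhood."""
    nbrs = neighbors[v]
    # Sampled neighbor set: a fixed-size uniform sample (all neighbors if there are fewer).
    sampled = rng.sample(nbrs, num_samples) if len(nbrs) > num_samples else list(nbrs)
    dim = len(h[v])
    if sampled:
        mean = [sum(h[u][d] for u in sampled) / len(sampled) for d in range(dim)]
    else:
        mean = [0.0] * dim  # isolated node: no neighbor information available
    # Own features first, aggregated neighbor features second.
    return h[v] + mean

h = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0], 3: [7.0, 8.0]}
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
rng = random.Random(0)
print(sage_mean_aggregate(h, neighbors, 1, 2, rng))  # node 1 has a single neighbor
print(sage_mean_aggregate(h, neighbors, 0, 2, rng))  # two of three neighbors are sampled
```

Because the aggregation depends only on local neighborhoods and not on a fixed global adjacency matrix, the same function can be applied to nodes that were not present during training, which is the inductive property noted above.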
Overall, the five GNN models exhibit distinct strengths and trade\-offs in capturing spatial and temporal dependencies for PM concentration forecasting\. GCN effectively models local node relationships through spectral convolutions but may struggle with long\-range dependencies\. GAT improves upon this by incorporating an attention mechanism that assigns different importance weights to neighboring nodes, enhancing spatial feature extraction\. GIN, designed to match the Weisfeiler\-Lehman graph isomorphism test, provides strong graph representation learning but can be computationally demanding\. SGConv simplifies graph convolution by reducing redundant transformations and improving computational efficiency while still effectively capturing both spatial and temporal dependencies through smooth information propagation across layers\. This allows it to maintain structural coherence while preserving trends over time\. GraphSage, with its inductive learning approach, excels in handling dynamic graphs by leveraging neighborhood sampling, making it highly efficient for real\-world applications with evolving data\.
However, traditional GCNs rely on predefined graphs, which may not effectively capture dynamic relationships\. While GATs adaptively weigh node importance and GINs enhance feature aggregation, these methods still depend on static graph structures\. To address this limitation, an automatic graph construction approach based on a confusion matrix from supervised learning is introduced, enabling the model to better capture inter\-class relationships\. Additionally, a hybrid loss function is incorporated to enhance learning stability and mitigate vanishing gradient issues, ensuring more robust and accurate PM concentration forecasting\.
## 3 The Proposed Methodology
This section outlines the key steps of the proposed confusion matrix\-based explainable GNN approach for multi\-site pollution prediction, as illustrated in Figure \([2](https://arxiv.org/html/2606.24978#S3.F2)\)\. The methodology can be divided into several essential stages:
- Graph Construction: Utilizing a confusion matrix derived from supervised learning, we compute the adjacency matrix\. This matrix captures the relationships between different pollution monitoring sites, forming the foundation of our graph structure\.
- GNN Training and Optimization: We train the GNN using the constructed graph\. To address the vanishing gradient problem, we employ a hybrid loss function that combines the energy distance and Huber loss\. This optimization step ensures robust learning and model stability\.
- Prediction: Once the GNN is trained, it is used for multi\-site pollution prediction\. The model leverages the spatial and temporal correlations captured during the graph construction and training phases to make accurate predictions of pollution levels across different sites\.
- Explainability and Interpretation: To interpret and analyze the predictions, we apply a GNN explainer\. This tool helps to elucidate the model’s decision\-making process, providing insights into the factors influencing pollution levels and the interactions between different monitoring sites\.
Figure [2](https://arxiv.org/html/2606.24978#S3.F2) illustrates the overall workflow of our approach, detailing each step from graph construction to model explainability\.
Figure 2: Schematic representation of the main steps in the proposed GNN\-based approach\.
### 3\.1 Data Preprocessing
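The hybrid loss named in the training stage can be sketched as follows. This section does not state how the two terms are combined, so the code assumes a convex combination with a hypothetical weight `alpha`; the Huber threshold `delta` is likewise an assumed parameter. The energy distance is computed in its usual sample form, 2E|X-Y| - E|X-X'| - E|Y-Y'|, over scalar targets and predictions.

```python
def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for large ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def mean_abs_diff(xs, ys):
    """Average |x - y| over all pairs (x, y)."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(y_true, y_pred):
    """Sample energy distance between two sets of scalar values."""
    return (2.0 * mean_abs_diff(y_true, y_pred)
            - mean_abs_diff(y_true, y_true)
            - mean_abs_diff(y_pred, y_pred))

def hybrid_loss(y_true, y_pred, alpha=0.5, delta=1.0):
    """Assumed combination: alpha * Huber + (1 - alpha) * energy distance."""
    return alpha * huber(y_true, y_pred, delta) + (1.0 - alpha) * energy_distance(y_true, y_pred)

y_true = [0.0, 1.0, 2.0]
y_pred = [0.0, 1.0, 5.0]  # the last prediction is a large outlier
print(huber(y_true, y_pred), energy_distance(y_true, y_pred), hybrid_loss(y_true, y_pred))
```

The Huber term penalizes each prediction individually and grows only linearly for the outlier, while the energy distance compares the predicted and observed values as distributions. Combining the two is one plausible way to obtain the balance between robustness and sensitivity described above.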
Data preprocessing is a critical initial step to ensure the reliability, consistency, and accuracy of the predictive modeling pipeline\. The process begins with data cleaning, which includes a thorough completeness analysis to remove inconsistent, noisy, or missing entries\. Only monitoring stations with sufficient and continuous observations are retained to ensure that the input to the models reflects reliable spatial and temporal information\. To handle missing values, the K\-Nearest Neighbors \(KNN\) Imputer is employed\. This imputation technique estimates missing data points by referencing the feature values of the $k$ most similar \(nearest\) observations based on Euclidean distance\. This method preserves local data structure and correlation, making it suitable for environmental datasets with spatial dependencies\. After cleaning and imputation, the dataset undergoes feature scaling to ensure uniformity in input ranges\. All features are normalized using Min–Max normalization, which transforms the values of each feature into a common scale within the interval $[0,1]$\. This step is crucial for improving the convergence behavior and stability of gradient\-based optimization during neural network training\. The Min–Max normalization is defined by:
$$x_{i}^{\text{scaled}}=\frac{x_{i}-x_{i}^{\min}}{x_{i}^{\max}-x_{i}^{\min}},\tag{6}$$
where $x_{i}$ represents the original value of the feature, and $x_{i}^{\min}$ and $x_{i}^{\max}$ are the minimum and maximum values of that feature in the training data, respectively\. By applying this normalization technique, all variables are placed on a comparable scale, preventing features with larger numerical ranges from dominating model behavior\. This normalization is particularly important when using models such as neural networks and GNNs, which are sensitive to scale differences across input features\.
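A minimal sketch of Eq. (6) follows, with the minimum and maximum fitted on the training split and then reused for new data. The function names, the sample readings, and the handling of constant features (mapped to 0.0 to avoid a division by zero) are assumptions made for illustration.

```python
def fit_min_max(train_columns):
    """Return per-feature (min, max) computed on the training split only."""
    return [(min(col), max(col)) for col in train_columns]

def transform_min_max(columns, stats):
    """Scale each feature with Eq. (6); constant features map to 0.0 (assumed guard)."""
    scaled = []
    for col, (lo, hi) in zip(columns, stats):
        span = hi - lo
        scaled.append([(x - lo) / span if span > 0 else 0.0 for x in col])
    return scaled

# Hypothetical data: one varying feature (e.g. PM2.5 readings) and one constant feature.
train = [[10.0, 20.0, 30.0], [0.5, 0.5, 0.5]]
test = [[25.0, 40.0], [0.5, 0.5]]
stats = fit_min_max(train)
print(transform_min_max(train, stats))
print(transform_min_max(test, stats))
```

Because the statistics come from the training data, unseen values above the training maximum scale to more than 1 (here 40.0 maps to 1.5), so the interval $[0,1]$ is guaranteed only for the training split.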
### 3\.2 Graph Structure
This study tackles the challenge of predicting PM concentration across multiple sites by employing an innovative graph\-based method to analyze multivariate time series data\. The process begins with constructing a graph to represent the relationships between monitoring stations\. Traditionally, graphs are built using distance\-based approaches, utilizing the GPS locations of monitoring stations\. However, due to the absence of distance information in the dataset, an alternative method is necessary to establish connections within the graph\.
To address this limitation, a novel approach leveraging supervised learning is introduced for creating the adjacency matrix\. Specifically, a classification model predicts the device ID for each data point in the dataset using various features \(pollutants\) to distinguish between monitoring stations\. This classification task produces a confusion matrix, where frequently confused classes are identified as candidates for graph connections\. Essentially, classes with high misclassification rates are likely to be linked in the graph\.
#### 3\.2\.1 Confusion Matrix Calculation
The confusion matrix $\mathbf{C}\in\mathbb{R}^{N\times N}$, where $N$ is the number of classes \(monitoring stations\), is calculated based on a supervised classification task, where each data point is assigned a predicted class $\hat{y}$ and compared to the true class $y$\. The matrix records how often class $i$ is classified as class $j$, with the off\-diagonal elements counting misclassifications and the diagonal elements representing the correct classifications\. Mathematically, the elements of the confusion matrix are defined as:
$$C_{i,j}=\sum_{k=1}^{M}\mathbb{1}\left(y_{k}=i\land\hat{y}_{k}=j\right),\tag{7}$$
where $C_{i,j}$ represents the number of instances where class $i$ was classified as class $j$, $M$ is the total number of data samples, $\mathbb{1}(\cdot)$ is the indicator function that returns 1 if the condition inside holds true and 0 otherwise, and $y_{k}$ and $\hat{y}_{k}$ are the true and predicted labels of the $k$\-th data point\. The confusion matrix quantifies the misclassification patterns within the dataset, allowing the identification of inter\-class relationships that inform graph construction\.
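The counting rule of Eq. (7) translates directly into code. In the sketch below the labels are hypothetical device IDs for three stations; the classifier that would produce `y_pred` is not specified in this section and is not modeled here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """C[i][j] counts samples whose true class is i and predicted class is j."""
    c = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    return c

# Hypothetical device-ID labels for three stations (0, 1, 2).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 2, 2, 1]
for row in confusion_matrix(y_true, y_pred, 3):
    print(row)
```

Each row sums to the number of samples of that station, and the off-diagonal entries (for example, station 2 predicted as station 1) are the confusions that later become candidate edges.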
#### 3\.2\.2 Constructing the Adjacency Matrix from the Confusion Matrix
To transform the confusion matrix into an adjacency matrix $\mathbf{A}$, a threshold $\tau$ is applied to determine significant connections\. A connection \(edge\) is established between two nodes if their confusion value exceeds a predefined empirical threshold:
$$A_{i,j}=\begin{cases}1,&\text{if }i\neq j\text{ and }C_{i,j}>\tau,\\0,&\text{otherwise,}\end{cases}\tag{8}$$
where $A_{i,j}=1$ indicates the presence of an edge between node $i$ and node $j$, and $\tau$ is a tunable parameter empirically determined to maintain connectivity while minimizing excessive noise\.
This threshold ensures that only meaningful relationships based on frequent misclassification errors are used to form the graph structure while maintaining connectivity among all monitoring stations\. As a result, the adjacency matrix serves as a blueprint to build a comprehensive and cohesive graph, encoding the connections between nodes \(monitoring stations\)\.
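The thresholding rule of Eq. (8), together with the connectivity requirement on $\tau$, can be sketched as below. The confusion matrix values are assumed for illustration. The `symmetric` option and the `is_connected` helper are additions for this example and are not described in the text; Eq. (8) as written can yield a directed graph because the confusion matrix need not be symmetric.

```python
def adjacency_from_confusion(conf, tau, symmetric=False):
    """Apply Eq. (8): A[i][j] = 1 when i != j and C[i][j] > tau."""
    n = len(conf)
    adj = [[1 if i != j and conf[i][j] > tau else 0 for j in range(n)] for i in range(n)]
    if symmetric:
        # Optional step (an assumption, not from the text): keep an undirected edge
        # when either direction of confusion exceeds the threshold.
        adj = [[max(adj[i][j], adj[j][i]) for j in range(n)] for i in range(n)]
    return adj

def is_connected(adj):
    """Check that every node is reachable from node 0, ignoring edge direction."""
    n = len(adj)
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if (adj[i][j] or adj[j][i]) and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

# Illustrative 4-class confusion matrix (values are assumed).
conf = [[50, 4, 2, 1],
        [3, 45, 5, 2],
        [2, 3, 42, 3],
        [1, 2, 4, 45]]
for tau in (1, 2, 3, 4):
    adj = adjacency_from_confusion(conf, tau)
    print(tau, adj, is_connected(adj))
```

For these values the graph stays connected up to $\tau=3$ and splits at $\tau=4$, where only the edge from class 2 to class 3 (indices 1 and 2) survives. Scanning thresholds in this way is one simple procedure for picking the largest $\tau$ that still keeps all stations in one component.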
### 3\.3 Normalization of the Adjacency Matrix
To stabilize the graph learning process, the adjacency matrix is normalized:
$$\tilde{\mathbf{A}}=\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}} \tag{9}$$
where $\mathbf{D}$ is the degree matrix defined as $D_{ii}=\sum_{j}A_{ij}$, and $\tilde{\mathbf{A}}$ is the normalized adjacency matrix used in the GNN calculations. This normalization improves numerical stability and facilitates information propagation across the graph, which makes GNN training more efficient.
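A small numerical sketch of Equation 9 is given below. Leaving the scaling factor at zero for isolated nodes is an assumption added here to avoid division by zero; the text does not discuss that case.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (Eq. 9)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                  # D_ii = sum_j A_ij
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0                    # assumption: isolated nodes keep a zero factor
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Path graph on four nodes (illustration only); degrees are 1, 2, 2, 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
A_norm = normalize_adjacency(A)
print(np.round(A_norm, 4))
```

Each retained edge $(i,j)$ is rescaled by $1/\sqrt{d_i d_j}$, so edges attached to high-degree nodes contribute less to the aggregated signal.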
Figure[3](https://arxiv.org/html/2606.24978#S3.F3) illustrates the process of constructing a graph from a confusion matrix derived from a supervised classification task. The confusion matrix (left) represents the misclassification frequencies among four classes (Class 1 to Class 4). Each entry $C_{i,j}$ in this matrix indicates the number of times samples from Class $i$ were misclassified as Class $j$. Higher misclassification values suggest stronger similarity or dependency between the corresponding classes.
To construct the graph \(right\), a thresholding operation is applied to filter significant misclassification frequencies, ensuring that only meaningful relationships contribute to graph formation\. Nodes in the graph represent different classes, and edges between them reflect the misclassification relationships\. The weight of an edge corresponds to the frequency of misclassification between two classes\. Stronger connections \(solid edges\) represent higher misclassification frequencies, while weaker connections \(dashed edges\) indicate lower but still relevant relationships\.
Figure 3: Illustrating the transformation from a confusion matrix (left) into a graph structure (right). Solid edges indicate strong relationships based on misclassifications, while dashed edges represent weaker yet relevant connections.

This study focuses on establishing a robust graph structure based on the confusion matrix-derived adjacency matrix and does not consider edge features that could capture specific details about the relationships. The primary aim is to represent the interactions and dependencies among the monitoring stations effectively, facilitating accurate PM concentration predictions.
### 3.4 Graph-based regression
Graph\-based regression utilizes the structure and features of a graph to predict continuous values for nodes or the entire graph\[[27](https://arxiv.org/html/2606.24978#bib.bib11),[36](https://arxiv.org/html/2606.24978#bib.bib12),[12](https://arxiv.org/html/2606.24978#bib.bib13)\]\. Unlike graph classification, which assigns discrete labels to nodes or graphs, graph regression predicts continuous outcomes\[[42](https://arxiv.org/html/2606.24978#bib.bib9),[37](https://arxiv.org/html/2606.24978#bib.bib10)\]\. This method is akin to traditional multi\-step forecasting, which considers interwoven relationships and influences between multiple variables over time\.
This study addresses the challenge of predicting PM concentration levels (PM$_{1}$, PM$_{2.5}$, and PM$_{10}$) collected from several air quality monitoring stations. The approach utilizes six key features: temperature, humidity, MicsRED, MicsNOX, MicsHeater, and historical data on the target pollutant. This enables the Graph Neural Network (GNN) to capture how air pollution travels and influences surrounding areas. By iteratively exchanging information across the network (message passing), each station’s representation is enriched, accounting for the influence of its upwind and downwind neighbors. A traditional regression layer then uses this enriched representation to predict PM concentrations (PM$_{2.5}$, PM$_{1}$, and PM$_{10}$) at each station. This approach allows GNNs to outperform traditional methods by considering the crucial spatial relationships between monitoring stations in PM forecasting.
To address challenges such as the vanishing or exploding gradient problem due to potentially long paths of information travel through the graph, a batched-graph learning approach is employed for model training, leveraging the graph structure. The performance of five different GNN models (SimpleGCN, GCN, GIN, GAT, and GraphSage) is evaluated. During training, the vanishing gradient problem emerged as a recurring challenge, impacting model convergence. To mitigate this, a hybrid loss function $\mathcal{L}(y,\hat{y})$ is designed, combining Huber loss $\mathcal{H}$ and energy distance $\mathcal{E}$:
$$\mathcal{H}_{\delta}(y,\hat{y})=\begin{cases}\frac{1}{2}(y-\hat{y})^{2}&\text{for }|y-\hat{y}|\leq\delta\\ \delta|y-\hat{y}|-\frac{1}{2}\delta^{2}&\text{for }|y-\hat{y}|>\delta\end{cases} \tag{10}$$
$$\mathcal{E}(u,v)=\Bigg(2\int_{-\infty}^{\infty}\left(F_{u}(x)-F_{v}(x)\right)^{p}\,dx\Bigg)^{1/p} \tag{11}$$
$$\mathcal{L}(y,\hat{y})=\mathcal{E}(y,\hat{y})*\mathcal{H}_{\delta}(y,\hat{y}) \tag{12}$$
Here, $\mathcal{H}$ represents the Huber loss, $y$ denotes the true value, $\hat{y}$ the predicted value, and $\delta$ is a parameter controlling the transition between the quadratic and linear parts of the loss function. $\mathcal{E}$ is the energy distance between two distributions $u$ and $v$ with cumulative distribution functions $F_{u}$ and $F_{v}$, with $p$ influencing the sensitivity to outliers.
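To make Equations 10 to 12 concrete, the sketch below evaluates the three quantities numerically with NumPy. Several choices are assumptions rather than details confirmed by the text: the Huber term is averaged over samples, the energy distance is computed from empirical step-function CDFs of the targets and predictions, the absolute difference of the CDFs is used so that the integrand stays non-negative for any $p$, and the `*` in Equation 12 is read as an ordinary product. This version is for inspection only; training a GNN would need the same computation expressed in an automatic-differentiation framework.

```python
import numpy as np

def huber(y, y_hat, delta=1.0):
    """Eq. 10 applied element-wise, then averaged (mean reduction is an assumption)."""
    r = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * r - 0.5 * delta ** 2
    return float(np.mean(np.where(r <= delta, quadratic, linear)))

def energy_distance(u, v, p=2):
    """Eq. 11 for two empirical samples, using their step-function CDFs."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)  # lengths of the intervals between pooled sample points
    # Empirical CDF values, constant on each interval [grid[k], grid[k+1]).
    F_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    F_v = np.searchsorted(v, grid[:-1], side="right") / v.size
    # abs() keeps the integrand non-negative for any p (assumption for odd p).
    return float((2.0 * np.sum(np.abs(F_u - F_v) ** p * widths)) ** (1.0 / p))

def hybrid_loss(y, y_hat, delta=1.0, p=2):
    """Eq. 12, reading '*' as an ordinary product (assumption)."""
    return energy_distance(y, y_hat, p) * huber(y, y_hat, delta)

# Toy target and prediction (illustration only).
y = np.array([0.0])
y_hat = np.array([2.0])
print(huber(y, y_hat), energy_distance(y, y_hat), hybrid_loss(y, y_hat))
```

With a product form, the loss is zero whenever either factor is zero, so the two terms act jointly rather than as independent penalties. A weighted sum would behave differently, and the text does not discuss that alternative.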
Message passing is a fundamental concept in GNNs, enabling nodes to learn from the graph’s structure by aggregating information from neighboring nodes\. Through multiple message\-passing steps across layers, GNNs progressively build richer representations of each node, taking into account both the node’s features and the context provided by its connected neighbors\. This approach can be framed as a time series regression problem, where the model learns from past observations in the station’s data to predict future values\. The model effectively captures temporal and spatial dependencies by processing data at the node level and incorporating other relevant features\.
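One message-passing step can be illustrated with a mean aggregator in the spirit of GraphSage. This is a simplified sketch, not the DGL layer used in the study: the function name, the weight matrices `W_self` and `W_neigh`, and the toy features are hypothetical.

```python
import numpy as np

def mean_aggregate_step(A, H, W_self, W_neigh):
    """One layer: combine each node's features with the mean of its neighbors' features."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    # Mean over neighbors; nodes without neighbors receive a zero message.
    neigh_mean = (A @ H) / np.maximum(deg, 1.0)
    # Linear maps for self and neighbor information, followed by ReLU.
    return np.maximum(H @ W_self + neigh_mean @ W_neigh, 0.0)

# Three stations on a path graph with one scalar feature each (illustration only).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
H = np.array([[1.0], [2.0], [3.0]])
W_self = np.array([[1.0]])
W_neigh = np.array([[1.0]])
H_next = mean_aggregate_step(A, H, W_self, W_neigh)
print(H_next)
```

Stacking several such steps lets information from stations two or more hops away reach a node, which is the mechanism the paragraph above describes.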
The prediction performance of the five GNN architectures was compared on real-world datasets to assess the effectiveness of the PM spatiotemporal forecasting approach. The effectiveness of each prediction model was evaluated using three statistical measures: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination $R^{2}$.
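For reference, the three metrics can be computed as in the following sketch, which uses the standard definitions. The function name and the toy values are illustrative.

```python
import numpy as np

def regression_metrics(y, y_hat):
    """Return RMSE, MAE, and R^2 for observed values y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))               # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    return {"rmse": rmse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}

# Toy observations and predictions (illustration only).
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])
print(metrics)
```

RMSE and MAE are in the units of the target, while $R^{2}$ is unitless and equals 1 for a perfect fit.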
### 3.5 Graph regression Explainability
While GNNs excel at graph regression tasks, their "black\-box" nature can be problematic due to their complex message passing and non\-linear activation functions, which make it difficult to directly interpret their reasoning\. Despite their strong performance and wide applicability, GNN models lack transparency, making it challenging to understand the reasoning behind their outputs\.
Instance-level explanations are the dominant technique used in explainability analysis\[[39](https://arxiv.org/html/2606.24978#bib.bib16),[21](https://arxiv.org/html/2606.24978#bib.bib17),[1](https://arxiv.org/html/2606.24978#bib.bib18)\] to understand how models arrive at specific predictions. These methods provide input-specific explanations tailored to each graph, identifying the input features that most strongly affect the model’s output for that graph.
This study utilizes GNNExplainer\[[38](https://arxiv.org/html/2606.24978#bib.bib19)\], a post-hoc explanation method that focuses on individual instances (instance-based) and falls under the category of perturbation-based techniques. GNNExplainer helps elucidate the inner workings of GNNs by identifying the most relevant input features and connections that contribute to the model’s predictions, thereby enhancing the interpretability of graph regression models. The study also applies PGExplainer, a parameterized explainer for GNNs\[[22](https://arxiv.org/html/2606.24978#bib.bib20)\]. It is a learnable model designed to interpret GNN predictions by identifying key parts of the input graph that influence the model’s decisions. It involves training an additional neural network that takes the original graph and the trained GNN as inputs, outputting importance scores for nodes and edges. These scores highlight the most influential substructures within the graph. The training process ensures the explainer aligns with the GNN’s predictions, balancing sparsity and fidelity. This method enhances the transparency and trustworthiness of GNNs, making them more interpretable and suitable for critical applications.
## 4 Results and discussion
### 4.1 Data description
The dataset used in this study contains air pollution measurements from 25 monitors in Salt Lake City, Utah, collected by the AirU Pollution Monitoring Network from January 1, 2019, to May 19, 2021\[[2](https://arxiv.org/html/2606.24978#bib.bib2)\]. Each pollution monitor transmits data packets every minute and is equipped with a suite of environmental sensors: a Plantower PMS3003 for counting airborne particles, a Texas Instruments HDC1080 for measuring temperature and humidity, and an SGX SensorTech MiCS4514 for detecting oxidizing and reducing gases. The dataset includes basic weather data, such as temperature and humidity, alongside readings from specialized sensors (MicsRED and MicsNOX) capable of detecting various gases, including CO, H2S, Ethanol, Ammonia, Hydrogen, Methane, and Propane. There is also a column indicating the state of a heater for the oxidizing sensor (MicsHeater). Most importantly, the dataset contains measurements of particulate matter (PM) in different sizes (PM$_{1}$, PM$_{2.5}$, and PM$_{10}$). Each pollution monitor is identified by a unique Device ID.
### 4.2 Experiments and Settings
To construct the graph, the adjacency matrices must first be computed\. This is achieved by applying an XGBoost classifier\[[5](https://arxiv.org/html/2606.24978#bib.bib3)\]to predict the device ID for each data point in the dataset, using various features \(pollutants\) to distinguish between monitoring stations\.
For supervised learning with the XGBoost classifier, 80% of the data was used for training and validation, while the remaining 20% was reserved for testing\. The model was configured with 500 trees, a learning rate of 0\.1, 13 classes, and a maximum depth of 4\. The "multi:softprob" objective was selected to output a probability distribution across classes, ensuring accurate classification of data points to their respective device IDs\. This mapping is essential for constructing a robust adjacency matrix for the GNN\. The classification task produces a confusion matrix \(Figure[4](https://arxiv.org/html/2606.24978#S4.F4)\), where frequently misclassified classes indicate strong inter\-class relationships\. These relationships inform graph construction, ensuring that nodes \(monitoring stations\) are linked based on data\-driven spatial correlations, rather than arbitrary predefined connections\.
Figure 4: Confusion matrix resulting from supervised learning with an XGBoost classifier, where data points in the dataset are mapped to device IDs as the target classes.

The inclusion of data from multiple locations plays a crucial role in building the GNN by enabling the model to effectively learn spatial dependencies across diverse environments. As shown in Figure[4](https://arxiv.org/html/2606.24978#S4.F4), the confusion matrix generated from the XGBoost classifier guides the graph construction process, ensuring that the resulting graph structure is data-driven rather than predefined. This approach captures meaningful relationships between different monitoring stations, allowing the model to dynamically reflect real-world spatial correlations.
Next, adjacency matrices are constructed using thresholded confusion matrices (refer to Equation[8](https://arxiv.org/html/2606.24978#S3.E8)). This approach takes advantage of the inherent relationships within the confusion matrix to encode the graph structure. The threshold is chosen so that all monitoring stations in the network remain connected, providing a comprehensive and cohesive graph structure (Figure[5](https://arxiv.org/html/2606.24978#S4.F5)). The resulting adjacency matrix serves as a blueprint for constructing the graph, encoding the connections between nodes (monitoring stations). The empirical threshold $\tau$ is selected by gradually increasing the value from the minimum non-zero entry of the confusion matrix, checking at each step whether the resulting graph $G$ is fully connected. This ensures that all classes are represented in the graph without excessive noise from low-frequency confusion. The value of $\tau$ is thus the smallest threshold that maintains graph connectivity. This data-driven selection process avoids arbitrary cutoff values and adapts to the structure of the supervised confusion matrix. In cases where no single threshold maintains connectivity, the fallback is a minimal spanning structure based on the strongest connections. This strategy balances interpretability, sparsity, and robustness in graph construction.
Figure 5: The graph constructed based on a thresholded confusion matrix.

$$\tau=\begin{cases}\min\{i\in\{0,\dots,N-1\}\mid\text{connected}(G)\}&\text{if }\exists i\text{ where }\text{connected}(G),\\ -1&\text{otherwise.}\end{cases} \tag{13}$$
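The connectivity scan behind the threshold selection can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure. Candidate thresholds are taken to be the distinct non-zero off-diagonal entries of the confusion matrix, edges are treated as undirected when testing connectivity, and a plain breadth-first search stands in for the NetworkX connectivity check mentioned in the text so that the example has no extra dependency. The helper names and the 4-class matrix are hypothetical.

```python
import numpy as np
from collections import deque

def is_connected(A):
    """Breadth-first search on the undirected version of A (assumption: direction ignored)."""
    A = np.asarray(A)
    undirected = (A + A.T) > 0
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in np.flatnonzero(undirected[u]):
            v = int(v)
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == undirected.shape[0]

def connected_thresholds(C):
    """Scan candidate thresholds in ascending order and keep those giving a connected graph."""
    C = np.asarray(C)
    off_diag = C[~np.eye(C.shape[0], dtype=bool)]
    candidates = np.unique(off_diag[off_diag > 0])  # sorted, starts at the minimum non-zero entry
    kept = []
    for tau in candidates:
        A = (C > tau).astype(int)                   # Eq. 8
        np.fill_diagonal(A, 0)
        if is_connected(A):
            kept.append(int(tau))
    return kept

# Hypothetical 4-class confusion matrix (illustration only).
C = np.array([[50, 4, 2, 1],
              [3, 45, 5, 2],
              [2, 3, 42, 3],
              [1, 2, 4, 45]])
print(connected_thresholds(C))
```

Raising $\tau$ can only remove edges, so the thresholds that keep the graph connected always form a prefix of the candidate list. The first element of the returned list is the smallest such threshold and the last one gives the sparsest graph that is still connected. An empty list corresponds to the fallback case.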
After constructing the GNN graph based on the computed confusion matrix, the prediction performance of five GNN models \(i\.e\., SimpleGCN, GCN, GIN, GAT, and GraphSage\) in forecasting PM levels was assessed\. Two series of experiments were conducted for this evaluation\. The first series focused on one\-hour\-ahead multivariate forecasting using the GNN models\. The second series evaluated univariate forecasting of PM concentrations for short to medium\-term horizons, specifically 3, 6, 9, and 12 hours ahead\. For all GNN models employed in this study, the dataset was split into training and testing sets\. In particular, 85% of the data was dedicated to training and validation, while the remaining 15% was allocated for testing\. The training period spanned from January 1, 2019, to January 9, 2021, and the testing period ran from January 10, 2021, to May 19, 2021\. This division ensured a robust evaluation of the models’ predictive performance over different time horizons and spatiotemporal configurations\.
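The chronological split described above can be sketched with the standard library alone. The cutoff date follows the text (testing starts on January 10, 2021), while the function name and the four hourly timestamps are illustrative.

```python
from datetime import datetime, timedelta

def chronological_split(timestamps, cutoff):
    """Indices strictly before `cutoff` go to training, the rest to testing."""
    train_idx = [i for i, t in enumerate(timestamps) if t < cutoff]
    test_idx = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train_idx, test_idx

# Four hourly timestamps straddling the boundary (illustration only).
start = datetime(2021, 1, 9, 22)
timestamps = [start + timedelta(hours=h) for h in range(4)]
train_idx, test_idx = chronological_split(timestamps, cutoff=datetime(2021, 1, 10))
print(train_idx, test_idx)
```

Splitting by time rather than at random keeps all test observations strictly later than the training observations, which avoids leaking future information into training.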
The GNNs investigated in this study were implemented using the Deep Graph Library \(DGL\)\[[30](https://arxiv.org/html/2606.24978#bib.bib7)\], a Python framework designed for efficient and flexible development of graph\-based models\. Based on PyTorch, DGL provides fine\-grained control over message passing operations and supports performance optimizations such as auto\-batching and sparse matrix kernels\. These features allow for scalable and efficient training across multiple CPU and GPU environments\. DGL’s modular design facilitated the implementation of all five GNN architectures considered in this work, including GCN, GAT, GIN, SGConv, and GraphSage\. In addition, the NetworkX library\[[9](https://arxiv.org/html/2606.24978#bib.bib8)\]was used alongside DGL to construct and verify graph connectivity, leveraging its utilities for graph structure analysis and preprocessing\. The hyperparameters for the GNN models adopted in the study were determined through a grid search approach\. Grid search is a foundational technique for hyperparameter optimization in deep learning, facilitating a structured exploration of the hyperparameter space and leading to the identification of well\-performing parameter configurations\. The hyperparameters identified through this method are detailed in Table[1](https://arxiv.org/html/2606.24978#S4.T1)\. Each model configuration includes layers specific to the GNN type followed by a fully connected layer, and common hyperparameters for optimization and training\.
Table 1: Hyperparameters used for the GNN models in the study.

During training, hyperparameters were selected to ensure stable loss function convergence, preventing overfitting and optimizing predictive accuracy. The tuning process involved iterative adjustments until the model demonstrated consistent performance across multiple training runs. The primary criterion was the smooth and rapid convergence of the loss function, indicating effective learning. For illustration, Figure[6](https://arxiv.org/html/2606.24978#S4.F6)(a-c) presents the training loss curves for the GraphSage model across different pollutant concentration predictions. The curves show that the model achieved stable convergence within a few epochs, confirming the effectiveness of the selected hyperparameters in preventing issues such as exploding or vanishing gradients.



Figure 6: Loss function curves for GSage training across different pollutant concentration predictions. (a) Training loss for PM$_{1}$ prediction, (b) Training loss for PM$_{2.5}$ prediction, and (c) Training loss for PM$_{10}$ prediction. The loss function exhibits smooth convergence, ensuring stable learning for all pollutant concentrations.

This study compares the performance of GNN-based models with traditional machine learning (ML) approaches for PM pollutant concentration forecasting. To ensure a rigorous evaluation, five widely used ML regression models, including k-Nearest Neighbors (kNN), Random Forest (RF), Extra Trees (ET), Decision Tree (DT), and Gradient Boosting (GB), were considered alongside the investigated GNN models. These ML models were selected based on their effectiveness in capturing nonlinear trends in time series forecasting applications. The hyperparameters for the ML models (Table[2](https://arxiv.org/html/2606.24978#S4.T2)) were optimized to enhance predictive accuracy. Aligning training and evaluation strategies ensures a comprehensive assessment of their effectiveness in capturing air pollution patterns. Additionally, to provide a broader benchmark, deep learning (DL) models, Prophet, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), were also included. These models are widely used for univariate and multivariate time series forecasting due to their ability to capture temporal dependencies. Their inclusion allows for a more complete comparison against both GNN and traditional ML models. For the deep learning baselines, the LSTM and GRU models were implemented with two hidden layers, each containing 64 units. The ReLU activation function was used, with a batch size of 64 and the mean squared error (MSE) as the loss function. The Adam optimizer was employed to train the models efficiently.
For the Prophet model, automatic configuration was used for yearly, weekly, and daily seasonality, with a changepoint prior scale set to 0\.01 to ensure conservative trend estimation\. The seasonality mode was set to multiplicative, making the model suitable for short\-term pollutant concentration forecasting\. All experiments in this study were conducted on a personal laptop equipped with an Intel Core i7 8th Generation CPU, 16 GB of RAM, and an NVIDIA GeForce GTX 1050 GPU with 4 GB of VRAM\. This configuration was sufficient to train and evaluate all machine learning, deep learning, and graph\-based models used in this research without requiring access to high\-performance computing infrastructure\.
Table 2:Hyperparameter configurations for the machine learning models used in PM concentration forecasting\.
### 4.3 Prediction results
This section evaluates the predictive performance of GNN, ML, and DL models \(Prophet, LSTM, GRU\) for PM concentration forecasting\. The assessment covers both single and multi\-step predictions, analyzing how well each model captures temporal and spatial dependencies in air pollution data\. It is structured into two main parts: the first examines the models’ accuracy in forecasting pollutant concentrations for the next immediate time step, while the second extends the evaluation to longer forecasting horizons \(3, 6, 9, and 12 hours ahead\)\. The comparative analysis provides a comprehensive evaluation of each approach’s effectiveness and limitations in modeling air pollution dynamics\.
#### 4.3.1 Single-Step PM prediction using GNN models
The trained GNN models were evaluated on the testing dataset to predict PM$_{1}$, PM$_{2.5}$, and PM$_{10}$ concentrations. Table[3](https://arxiv.org/html/2606.24978#S4.T3) presents the results of this hourly-based forecasting, comparing the performance of different GNN models using three statistical metrics: $R^{2}$, MAE, and RMSE.
The results in Table[3](https://arxiv.org/html/2606.24978#S4.T3) demonstrate the predictive capabilities of five GNN models (GAT, GCN, SGConv, GSage, and GIN) in forecasting PM$_{1}$, PM$_{2.5}$, and PM$_{10}$ concentrations. GSage consistently outperformed other models across all pollutants, achieving the highest $R^{2}$ values and the lowest MAE and RMSE values.
For PM$_{1}$ predictions, GSage recorded an $R^{2}$ of 0.9980, along with the lowest MAE (0.1641) and RMSE (0.2120), indicating superior accuracy. SGConv followed with an $R^{2}$ of 0.9901, while GIN, GAT, and GCN showed relatively lower performance. Similarly, for PM$_{10}$ predictions, GSage again achieved the highest accuracy with an $R^{2}$ of 0.9970, MAE of 0.3346, and RMSE of 0.4527. SGConv and GCN performed well, while GIN exhibited the highest errors. For PM$_{2.5}$, GSage maintained its top performance with an $R^{2}$ of 0.9973, MAE of 0.3065, and RMSE of 0.3780. SGConv and GAT produced competitive results, but GIN had the highest MAE and RMSE, indicating a lower predictive capacity.
Table 3: Performance comparison of GNN models for pollutant concentration prediction (PM$_{1}$, PM$_{2.5}$, and PM$_{10}$).

Table[4](https://arxiv.org/html/2606.24978#S4.T4) presents the predictive performance of classical machine learning (ML), time-series (Prophet), and deep learning (DL) models (LSTM and GRU) across PM$_{1}$, PM$_{2.5}$, and PM$_{10}$. Ensemble-based ML models (GB, ET, RF) consistently outperformed simpler models like DT and kNN across all pollutants, with GB and ET achieving the highest $R^{2}$ values (up to 0.892 for PM$_{2.5}$ and PM$_{10}$, and 0.886 for PM$_{1}$), demonstrating strong generalization capabilities. Prophet, while designed for time-series forecasting, showed lower accuracy compared to the top-performing ML models, particularly for PM$_{2.5}$ ($R^{2}$ = 0.669), indicating its limited suitability for this multivariate setup. Similarly, deep learning models such as LSTM and GRU underperformed relative to GB and ET, with LSTM reaching $R^{2}$ values of 0.680 for PM$_{2.5}$ and 0.689 for PM$_{1}$, while GRU exhibited slightly lower performance. These results suggest that although DL models are effective in capturing temporal dynamics, they may require larger datasets or architectural tuning to match ensemble ML accuracy in this context. Compared to the GNN-based results in Table[3](https://arxiv.org/html/2606.24978#S4.T3), which achieved $R^{2}$ above 0.997 across all pollutants, both ML and DL approaches lagged behind. This underscores the advantage of GNNs in modeling spatial dependencies among monitoring stations, an aspect that standard ML and DL models do not explicitly leverage.
Table 4: Performance of ML and DL models for predicting PM$_{1}$, PM$_{10}$, and PM$_{2.5}$.
Figure[7](https://arxiv.org/html/2606.24978#S4.F7) provides a comparative evaluation of the averaged prediction performance across GNN models, traditional ML models, and DL models (Prophet, LSTM, GRU) for PM concentration prediction.
Figure 7: Averaged prediction performance ($R^{2}$ and MAE) comparison among GNN models, traditional ML models, and DL models for PM concentration prediction.

Results indicate that GNN models significantly outperform both ML and DL models. GSage demonstrates the highest predictive accuracy with an $R^{2}$ of 0.9974 and the lowest MAE of 0.2684. SGConv and GAT also show strong performance, with $R^{2}$ scores exceeding 0.98. In comparison, traditional ML models such as kNN and DT yield notably lower $R^{2}$ values (0.4370) and higher MAEs (2.459 and 2.222, respectively). Among DL models, LSTM achieves better performance than GRU and Prophet, with an average $R^{2}$ of 0.6807 and MAE of 2.277. However, these results remain inferior to those of GNN-based models, demonstrating the effectiveness of graph-based learning in modeling spatial and temporal dependencies in air pollution prediction.
The superior performance of GNN models over traditional machine learning and deep learning methods can be attributed to their ability to explicitly model spatial dependencies among air quality monitoring stations\. While conventional ML models such as kNN and decision trees treat each observation independently, failing to account for spatial interactions, and DL models like LSTM and GRU primarily focus on capturing temporal patterns, GNNs leverage graph structures to integrate both spatial and contextual relationships\. Specifically, GSage aggregates information from neighboring nodes through adaptive message passing, enabling it to learn richer and more context\-aware spatial representations\. In contrast, models like kNN, DT, and Prophet operate in isolation and lack mechanisms to incorporate spatially structured information, resulting in lower prediction accuracy\. This spatial\-awareness, enabled through automatically constructed graphs derived from confusion matrices, empowers GNNs to model the complex interplay between monitoring stations and leads to their consistently superior predictive performance\.
#### 4.3.2 Multi-Step Prediction of PM Levels Using GNNs
The performance of the investigated GNN models for the multi-step prediction of PM concentrations (PM$_{1}$, PM$_{2.5}$, and PM$_{10}$) was assessed. The GNN-based forecasting approach in this study encompasses multiple horizons, ranging from 3 to 12 hours ahead, with predictions made on an hourly basis. To capture trends at different time granularities, data is aggregated into hourly bins of varying lengths: 3, 6, 9, and 12 hours. Table[5](https://arxiv.org/html/2606.24978#S4.T5) presents the results based on the testing data, evaluating each GNN model (SimpleGCN, GCN, GIN, GAT, and GraphSage) for short- to medium-term PM concentration forecasting (3, 6, 9, and 12 hours ahead).
Table 5: Performance of GNN models in predicting pollutant concentrations (PM$_{1}$, PM$_{2.5}$, and PM$_{10}$) for 3-hour, 6-hour, 9-hour, and 12-hour ahead predictions.

The results in Table[5](https://arxiv.org/html/2606.24978#S4.T5) show that all GNN models perform well in predicting PM concentrations, with the $R^{2}$ scores indicating high predictive accuracy across different time horizons. For short-term predictions (3 and 6 hours ahead), the GSage model consistently achieves the highest $R^{2}$ scores across all pollutants, demonstrating superior performance. In medium-term predictions (9 and 12 hours ahead), the GSage model also shows strong performance, but other models like GAT and GCN exhibit competitive results.

Across both experimental series, GraphSage emerged as the top performer for PM prediction. GraphSage excels due to its efficiency, flexibility, and generalizability. Unlike GNNs limited to static graphs, GraphSage supports inductive learning, allowing predictions on new nodes and graphs, ideal for dynamic datasets. Its neighborhood sampling technique reduces computational complexity and memory requirements, making it efficient for massive graphs. GraphSage’s customizable message-passing scheme allows for defining specific aggregation functions suitable for various tasks such as link prediction, graph clustering, and anomaly detection. It also provides interpretability through learnable aggregation functions, offering insights into how the model uses information from neighboring nodes. GraphSage’s efficient message-passing and sampling techniques also result in faster training times. In contrast, GATs focus on the relative importance of neighboring nodes, providing a powerful approach for tasks where this is critical. While offering interpretability through attention mechanisms, GATs require more computational resources than GraphSage.
To comprehensively assess model performance, this section also includes a comparative analysis between GNN-based, DL-based, and traditional ML-based approaches for multi-step PM forecasting. By evaluating all methodologies under the same predictive settings, we aim to highlight their respective strengths and limitations in handling the spatial and temporal complexities of air pollution data. Table[6](https://arxiv.org/html/2606.24978#S4.T6) shows the performance of ML and DL models in multi-step PM forecasting across different prediction horizons (3, 6, 9, and 12 hours). The DL models include Prophet, LSTM, and GRU, which are widely used for temporal sequence modeling and provide a useful benchmark for evaluating the added value of graph-based learning.
Table 6: Performance of ML and DL models in predicting PM$_{1}$, PM$_{2.5}$, and PM$_{10}$ concentrations for 3-hour, 6-hour, 9-hour, and 12-hour forecasts.
The results in Table[6](https://arxiv.org/html/2606.24978#S4.T6) reveal that ML models generally exhibit lower $R^{2}$ values across all pollutants and prediction horizons compared to GNNs. For short-term predictions (3-hour and 6-hour ahead), the best-performing ML models (RF, ET, and GB) achieve $R^{2}$ values ranging from 0.705 to 0.892, which is notably lower than GNN models such as GSage and GAT, which exceed 0.99. DT and kNN consistently underperform, particularly for PM$_{1}$ and PM$_{2.5}$, with $R^{2}$ values below 0.7, indicating a weaker ability to capture spatiotemporal dependencies. Incorporating DL models, Prophet, LSTM, and GRU, into the evaluation provides further insight into their predictive capabilities. While Prophet achieves moderate short-term accuracy (e.g., $R^{2}$ around 0.79–0.80), its performance declines over longer horizons. LSTM and GRU models show stronger results than traditional ML methods in some cases, particularly at the 3-hour mark (with $R^{2}$ above 0.80), but their accuracy degrades significantly as the prediction window extends, with $R^{2}$ dropping below 0.65 in most 12-hour scenarios. For longer prediction horizons (9-hour and 12-hour ahead), the gap between ML and DL models and GNN models further widens. GNNs maintain strong predictive accuracy, while ML and DL models show a more significant decline in $R^{2}$, particularly for PM$_{10}$. For instance, the best non-GNN models (RF and GB) achieve an $R^{2}$ of 0.819–0.825, and LSTM and GRU fall below 0.65, whereas GraphSage maintains values above 0.97, showing superior generalization over extended forecasting windows.
Figure[8](https://arxiv.org/html/2606.24978#S4.F8) presents a comparative analysis of the predictive performance of the considered models using the $R^{2}$ metric. The results demonstrate that GNN-based models consistently outperform traditional ML models in predicting particulate matter concentrations. Among the GNNs, GSage achieves the highest $R^{2}$ score (0.9814), indicating its superior ability to capture spatial dependencies within the air pollution dataset. The strong performance of GSage can be attributed to its inductive learning capability and neighborhood sampling approach, which enhances generalization and computational efficiency. Other GNN models, including GAT (0.9690), GIN (0.9600), SGConv (0.9598), and GCN (0.9512), also exhibit high predictive accuracy, reinforcing the effectiveness of graph-based learning in air quality forecasting. These models effectively capture complex spatial correlations among monitoring stations, leading to more robust predictions. Conversely, traditional ML models show lower $R^{2}$ scores, with kNN (0.5742) and DT (0.6358) performing the worst. The relatively poor performance of these models highlights their limitations in handling spatial dependencies and complex relationships within the dataset. Among ML approaches, GB (0.8137), ET (0.8110), and RF (0.8063) achieve moderate $R^{2}$ scores, benefiting from ensemble learning techniques that improve stability and predictive power. In addition, deep learning models were included to provide a broader benchmark. Prophet achieved an average $R^{2}$ score of 0.7131, followed by LSTM (0.6828) and GRU (0.6761). Although these models capture temporal dependencies effectively, their inability to model spatial interactions limits their performance compared to GNNs.
Figure 8: Comparison of the averaged $R^{2}$ scores for different models in predicting PM concentrations\. Green bars represent GNN models, blue bars denote traditional machine learning models, and orange bars correspond to deep learning \(DL\) models\. Higher $R^{2}$ values indicate better predictive performance\.

Figure [9](https://arxiv.org/html/2606.24978#S4.F9)\(a\-c\) compares the predicted and observed pollutant concentrations for PM1, PM2\.5, and PM10, respectively, using the best\-performing model, GraphSage\. The predicted values closely follow the observed trends, demonstrating the model’s ability to capture temporal variations and fluctuations in pollutant concentrations\. The consistency between the two curves indicates that GraphSage effectively learns spatial and temporal dependencies, leading to accurate short\-term forecasts\. These results highlight the effectiveness of GNN\-based approaches in air pollution forecasting\.



Figure 9: Comparison of predicted and observed pollutant concentrations using the GraphSage model for \(a\) PM1, \(b\) PM2\.5, and \(c\) PM10\. The predictions closely follow the observed trends, demonstrating the model’s ability to capture temporal variations in pollutant concentrations\.

To evaluate computational efficiency for real\-time forecasting, we measured the average execution time required for making predictions with each trained GNN model\. Since training is performed offline, the reported values reflect only the online inference time per sample\. The results show that the proposed models are computationally lightweight\. Specifically, GraphSage achieved the fastest execution time of 0\.0017 seconds, followed by GIN \(0\.0018 s\), SGConv \(0\.0053 s\), and GCN \(0\.0075 s\)\. GAT, due to its attention mechanism, recorded the highest execution time of 0\.0234 seconds\. These fast inference times are partially attributed to the relatively small graph used in this study, with only 13 monitoring stations, and to the fact that predictions are made at an hourly resolution\. The proposed approach targets short\-term hourly prediction rather than real\-time streaming, so its computational requirements are modest\. Overall, the GNN models, especially GraphSage and GIN, offer a strong balance between accuracy and efficiency, making them suitable for near\-real\-time air quality forecasting applications\.
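The per\-sample inference time described above can be measured along the following lines\. This is a minimal sketch, not the authors’ benchmarking code: `toy_predict` is a hypothetical stand\-in for a trained GNN \(a single mean\-aggregation layer with random weights\), the feature count of 8 is an assumption, and only the graph size of 13 stations comes from the text\.

```python
import time
import numpy as np

def mean_inference_time(predict_fn, samples, warmup=5):
    """Average wall-clock seconds per call of predict_fn over the samples."""
    for x in samples[:warmup]:          # warm-up calls are not timed
        predict_fn(x)
    start = time.perf_counter()
    for x in samples:
        predict_fn(x)
    elapsed = time.perf_counter() - start
    return elapsed / len(samples)

# Hypothetical stand-in for a trained model on a 13-station graph.
rng = np.random.default_rng(0)
num_nodes, num_features = 13, 8         # 8 features is an assumption
adj = (rng.random((num_nodes, num_nodes)) < 0.3).astype(float)
np.fill_diagonal(adj, 1.0)              # self-loops keep every row non-empty
adj_norm = adj / adj.sum(axis=1, keepdims=True)
weights = rng.normal(size=(num_features, 1))

def toy_predict(x):
    """One mean-aggregation step followed by a linear readout per node."""
    return adj_norm @ x @ weights

samples = [rng.normal(size=(num_nodes, num_features)) for _ in range(200)]
seconds_per_sample = mean_inference_time(toy_predict, samples)
print(f"{seconds_per_sample:.6f} s per sample")
```

Using `time.perf_counter` and discarding a few warm\-up calls keeps one\-off setup costs out of the average; absolute timings depend on hardware and will not match the values reported above\.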
### 4\.4 Explainable GNN
This section presents the findings related to the explainability of the experimental results\. Two GNN explanation methods were employed: the Graph Neural Network Explainer \(GNNExplainer\) for analyzing feature importance \[[38](https://arxiv.org/html/2606.24978#bib.bib19)\] and the Parameterized Explainer for GNNs \(PGExplainer\) for analyzing the importance of graph structure \[[22](https://arxiv.org/html/2606.24978#bib.bib20)\], specifically the edges\. The GNNExplainer method was used to identify the most influential features for predicting PM1 concentration\. As illustrated in Figure [10](https://arxiv.org/html/2606.24978#S4.F10), the lagged PM1 feature and humidity emerged as the most influential features of the GraphSage model\. During the graph regression task, these features were consistently important across all monitoring stations \(nodes\)\. This insight highlights the significance of temporal dependencies and environmental factors in the model’s predictive performance\.
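The idea behind mask\-based feature importance can be illustrated with a deliberately simplified example\. The sketch below is not GNNExplainer itself and does not use any GNN library: it assumes a frozen linear model in place of the trained GraphSage network and learns a soft feature mask by gradient descent, so that the masked input reproduces the original prediction while a sparsity penalty pushes unneeded mask entries toward zero\. The feature names `temperature` and `pressure`, the weights, and all hyperparameters are hypothetical\.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_feature_mask(x, w, sparsity=0.2, lr=0.5, steps=500):
    """Learn a soft mask s in (0, 1)^d for the frozen model f(x) = x @ w.

    Minimizes mean((f(x * s) - f(x))^2) + sparsity * sum(s) over mask logits.
    """
    logits = np.zeros(x.shape[1])
    target = x @ w                                   # predictions on unmasked input
    for _ in range(steps):
        s = sigmoid(logits)
        residual = (x * s) @ w - target              # shape (n_samples,)
        grad_s = 2.0 * (residual @ (x * w)) / len(x) + sparsity
        logits -= lr * grad_s * s * (1.0 - s)        # chain rule through the sigmoid
    return sigmoid(logits)

# Toy setup (assumed): only the first two features affect the frozen model.
rng = np.random.default_rng(0)
feature_names = ["pm1_lag", "humidity", "temperature", "pressure"]
x = rng.normal(size=(256, 4))
w = np.array([2.0, 1.0, 0.0, 0.0])

mask = learn_feature_mask(x, w)
for name, score in zip(feature_names, mask):
    print(f"{name}: {score:.2f}")
```

In this toy setting the mask stays close to 1 for the features the model actually uses and decays toward 0 for the others, which is the qualitative behavior a feature\-importance plot summarizes\.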
Figure 10: GSage feature importance per device using GNNExplainer\.

PGExplainer, on the other hand, was used to analyze the importance of edges within the graph structure\. By examining the connections between nodes, PGExplainer provided insights into how the relationships between different monitoring stations influence the model’s predictions\. Understanding edge importance helps in interpreting the model’s reliance on specific inter\-device relationships, further clarifying the underlying dynamics captured by the GNN\.
Figure [11](https://arxiv.org/html/2606.24978#S4.F11) visualizes the importance of edges in our graph using a heatmap derived from the PGExplainer explanation\. Darker regions represent edges that significantly influence the model’s predictions\. The first row depicts the influence of edges in one direction, while the second row shows the influence in the opposite direction\. This visualization shows the directional significance of certain edges in the graph, providing a deeper understanding of how the structure and connectivity within the graph contribute to the model’s predictive performance\.
Figure 11: Explaining edge importance using PGExplainer\.

These explainability methods enhance the transparency of GNN models by shedding light on the critical features and structural elements that drive their predictions\. This improved understanding can inform model refinement and deployment strategies, ensuring more reliable and interpretable forecasting outcomes\.
Figure [12](https://arxiv.org/html/2606.24978#S4.F12) presents subgraphs extracted by GNNExplainer, emphasizing the importance of neighboring nodes for monitoring stations 2 and 6\. In Figure [5](https://arxiv.org/html/2606.24978#S4.F5), which shows the original graph, there is a bidirectional connection between monitoring stations 6 and 3\. However, the GNNExplainer analysis reveals that, for the device 6 regression task, device 3 relied on a reduced set of neighboring nodes \(see Figure [12](https://arxiv.org/html/2606.24978#S4.F12), right panel\) compared to the original graph\. According to the explainer, this reduced set was still relevant for the prediction, indicating that not all original connections were necessary for accurate forecasting\.
A similar behavior is observed between monitoring stations 2 \(Figure [12](https://arxiv.org/html/2606.24978#S4.F12), left panel\) and 7\. While Figure [5](https://arxiv.org/html/2606.24978#S4.F5) shows their connection in the original graph, the GNNExplainer analysis \(Figure [12](https://arxiv.org/html/2606.24978#S4.F12), left panel\) reveals that device 7 relied on a more compact set of neighboring nodes for the regression task\. This suggests that a full connection was not necessary for accurate prediction, and that the model identified the most critical relationships within the graph\.


Figure 12: Example of two sub\-graphs used to highlight the neighboring importance of nodes 2 and 6, respectively\.

These findings underscore the effectiveness of GNNExplainer in highlighting the essential nodes and connections that contribute most significantly to the model’s predictions\. By focusing on the most influential neighbors, the explainer provides insights into the simplified yet impactful structure used by the GNN for its regression tasks, enhancing our understanding of the model’s decision\-making process\.
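The step from edge\-importance scores to a reduced neighbor set can be sketched as a simple thresholding operation\. The example below is a minimal illustration under stated assumptions: the function names are hypothetical, the scores and the threshold of 0\.5 are made up, and the edge list is not the graph of Figure 5\. It only shows how an explanation subgraph for a target station can end up with fewer neighbors than the original graph\.

```python
def original_neighbors(edge_scores, target):
    """All source nodes with a directed edge into the target node."""
    return sorted(src for (src, dst) in edge_scores if dst == target)

def explanation_neighbors(edge_scores, target, threshold=0.5):
    """Source nodes whose edge into the target has importance >= threshold."""
    return sorted(
        src
        for (src, dst), score in edge_scores.items()
        if dst == target and score >= threshold
    )

# Hypothetical directed edge importances, keyed by (source, destination).
edge_scores = {
    (3, 6): 0.91,
    (6, 3): 0.40,
    (5, 6): 0.12,
    (9, 6): 0.77,
    (7, 2): 0.08,
    (4, 2): 0.85,
}

for station in (2, 6):
    print(station, original_neighbors(edge_scores, station),
          explanation_neighbors(edge_scores, station))
```

Because the scores are stored per directed edge, the two directions of a bidirectional connection can receive different importances, so an edge may survive the threshold in one direction only\.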
## 5 Conclusion
This paper presents an approach to predicting particulate matter \(PM\) concentrations across multiple monitoring sites using GNNs\. The proposed method integrates automatic graph construction, a hybrid loss function, and explainability techniques to enhance both accuracy and interpretability in PM forecasting\. A graph construction method based on a confusion matrix from supervised learning enables the model to build graphs automatically and enhances predictive performance\. Additionally, a hybrid loss function combining energy distance and Huber loss mitigates vanishing gradient issues, ensuring stable and efficient learning\. Extensive experiments were conducted on real\-world data to validate the effectiveness of the approach, comparing five GNN architectures: GCN, GAT, GIN, SGConv, and GraphSage\. Both single\-step and multi\-step forecasting results demonstrated that GraphSage achieved the highest accuracy, followed by GAT and SGConv\. To ensure transparency, GNNExplainer and PGExplainer were used to analyze feature importance and graph structure\.

Beyond evaluating GNN models, a comparative analysis with traditional ML models, including kNN, RF, ET, DT, and GB, was performed\. The results highlighted the superior performance of GNN\-based approaches, particularly in capturing spatial dependencies and adapting to dynamic patterns in air pollution data\. While some ML models, such as Gradient Boosting and Extra Trees, showed competitive performance in short\-term predictions, they struggled to maintain accuracy over extended forecasting horizons\. In addition, this study included a comparison with three deep learning models: Prophet, LSTM, and GRU\. While these models demonstrated reasonable accuracy in short\-term forecasting scenarios, they exhibited higher error rates and lower $R^{2}$ scores than GNNs, particularly for longer forecasting horizons\.
While this study focused on data from a single urban area \(Salt Lake City\), extending the proposed framework to other geographical regions with different pollution dynamics represents an important direction for future research\. The data\-driven nature of the approach, particularly the automatic graph construction based on confusion matrices, enables it to adapt to diverse spatial and temporal patterns without relying on predefined structures\. Future work will evaluate the framework across multiple cities and pollution contexts to assess its generalizability and scalability\. Furthermore, although this study focuses on spatial and short\-to\-medium\-term temporal dependencies using historical PM measurements, future work can enhance the proposed model by incorporating additional environmental variables such as wind, rainfall, temperature, and urban morphology\. These factors can be encoded as node or edge attributes to better capture the complex dynamics of pollution diffusion\. Additionally, extending the temporal scope of the analysis to cover seasonal and yearly trends would provide deeper insights into long\-term air quality patterns and improve the robustness of the forecasting framework\.
## References
- \[1\] \(2023\) Evaluating explainability for graph neural networks\. Scientific Data 10\(1\), pp\. 144\. Cited by: [§3\.5](https://arxiv.org/html/2606.24978#S3.SS5.p2.1)\.
- \[2\] T\. Becnel, K\. Kelly, and P\. Gaillardon \(2022\) University of utah airu pollution monitoring network \- salt lake city ut \- 2019\-07\-26 to 2021\-05\-14\. IEEE Dataport\. Note: Dataset available at IEEE Dataport\. External Links: [Document](https://dx.doi.org/10.21227/aeh2-a413), [Link](https://dx.doi.org/10.21227/aeh2-a413)\. Cited by: [§4\.1](https://arxiv.org/html/2606.24978#S4.SS1.p1.3)\.
- \[3\] U\. A\. Bhatti, H\. Tang, G\. Wu, S\. Marjan, and A\. Hussain \(2023\) Deep learning with graph convolutional networks: an overview and latest applications in computational intelligence\. International Journal of Intelligent Systems 2023\(1\), pp\. 8342104\. Cited by: [§2\.1](https://arxiv.org/html/2606.24978#S2.SS1.SSS0.Px5.p1.1)\.
- \[4\] S\. Bloemheuvel, J\. van den Hoogen, D\. Jozinović, A\. Michelini, and M\. Atzmueller \(2023\) Graph neural networks for multivariate time series regression with application to seismic data\. International Journal of Data Science and Analytics 16\(3\), pp\. 317–332\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[5\] T\. Chen and C\. Guestrin \(2016\) Xgboost: a scalable tree boosting system\. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp\. 785–794\. Cited by: [§4\.2](https://arxiv.org/html/2606.24978#S4.SS2.p1.1)\.
- \[6\] A\. Dairi, F\. Harrou, S\. Khadraoui, and Y\. Sun \(2021\) Integrated multiple directed attention\-based deep learning for improved air pollution forecasting\. IEEE Transactions on Instrumentation and Measurement 70, pp\. 1–15\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p1.1)\.
- \[7\] P\. S\. S\. Ejurothu, S\. Mandal, and M\. Thakur \(2022\) Forecasting pm2\.5 concentration in india using a cluster based hybrid graph neural network approach\. Asia\-pacific Journal of Atmospheric Sciences\. External Links: [Document](https://dx.doi.org/10.1007/s13143-022-00291-4)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[8\] K\. Gu, Z\. Xia, and J\. Qiao \(2019\) Stacked selective ensemble for pm 2\.5 forecast\. IEEE Transactions on Instrumentation and Measurement 69\(3\), pp\. 660–671\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p1.1)\.
- \[9\] A\. Hagberg, P\. J\. Swart, and D\. A\. Schult \(2008\) Exploring network structure, dynamics, and function using networkx\. Technical report, Los Alamos National Laboratory \(LANL\), Los Alamos, NM \(United States\)\. Cited by: [§4\.2](https://arxiv.org/html/2606.24978#S4.SS2.p7.1)\.
- \[10\] F\. Harrou, L\. Fillatre, M\. Bobbia, and I\. Nikiforov \(2013\) Statistical detection of abnormal ozone measurements based on constrained generalized likelihood ratio test\. In 52nd IEEE Conference on Decision and Control, pp\. 4997–5002\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p2.1)\.
- \[11\] W\. Hernandez, A\. Mendez, R\. Zalakeviciute, and A\. M\. Diaz\-Marquez \(2020\) Analysis of the information obtained from pm 2\.5 concentration measurements in an urban park\. IEEE Transactions on Instrumentation and Measurement 69\(9\), pp\. 6296–6311\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p1.1)\.
- \[12\] W\. Huang, C\. Chen, C\. Lee, F\. Kuo, and S\. Huang \(2023\) Attentive gated graph sequence neural network\-based time\-series information fusion for financial trading\. Information Fusion 91, pp\. 261–276\. Cited by: [§3\.4](https://arxiv.org/html/2606.24978#S3.SS4.p1.1)\.
- \[13\] G\. Jin, Y\. Liang, Y\. Fang, Z\. Shao, J\. Huang, J\. Zhang, and Y\. Zheng \(2023\) Spatio\-temporal graph neural networks for predictive learning in urban computing: a survey\. IEEE Transactions on Knowledge and Data Engineering 36\(10\), pp\. 5388–5408\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[14\] G\. Jin, C\. Liu, Z\. Xi, H\. Sha, Y\. Liu, and J\. Huang \(2022\) Adaptive dual\-view wavenet for urban spatial–temporal event prediction\. Information Sciences 588, pp\. 315–330\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[15\] D\. Kim, D\. Jin, and H\. Suk \(2023\) Spatiotemporal graph neural networks for predicting mid\-to\-long\-term pm2\.5 concentrations\. Journal of Cleaner Production\. External Links: [Document](https://dx.doi.org/10.1016/j.jclepro.2023.138880)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[16\] T\. N\. Kipf and M\. Welling \(2017\) Semi\-supervised classification with graph convolutional networks\. In International Conference on Learning Representations, pp\. 1–14\. Cited by: [§2\.1](https://arxiv.org/html/2606.24978#S2.SS1.SSS0.Px1.p1.8)\.
- \[17\] M\. H\. Lee, N\. H\. Rahman, M\. T\. Latif, M\. E\. Nor, and N\. A\. Kamisan \(2012\) Seasonal arima for forecasting air pollution index: a case study\. American Journal of Applied Sciences 9\(4\), pp\. 570\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p2.1)\.
- \[18\] F\. Li, H\. Yan, G\. Jin, Y\. Liu, Y\. Li, and D\. Jin \(2022\) Automated spatio\-temporal synchronous modeling with multiple graphs for traffic prediction\. In Proceedings of the 31st ACM international conference on information & knowledge management, pp\. 1084–1093\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[19\] S\. Lin, J\. Zhao, J\. Li, X\. Liu, Y\. Zhang, S\. Wang, Q\. Mei, Z\. Chen, and Y\. Gao \(2022\) A spatial–temporal causal convolution network framework for accurate and fine\-grained pm2\.5 concentration prediction\. Entropy\. External Links: [Document](https://dx.doi.org/10.3390/e24081125)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[20\] Y\. Liu, J\. Ma, P\. Dhillon, and Q\. Mei \(2021\) A new benchmark of graph learning for pm 2\.5 forecasting under distribution shift\. In ACM, pp\. 6\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[21\] A\. Longa, S\. Azzolin, G\. Santin, G\. Cencetti, P\. Liò, B\. Lepri, and A\. Passerini \(2022\) Explaining the explainers in graph neural networks: a comparative study\. arXiv preprint arXiv:2210\.15304\. Cited by: [§3\.5](https://arxiv.org/html/2606.24978#S3.SS5.p2.1)\.
- \[22\] D\. Luo, W\. Cheng, D\. Xu, W\. Yu, B\. Zong, H\. Chen, and X\. Zhang \(2020\) Parameterized explainer for graph neural network\. Advances in neural information processing systems 33, pp\. 19620–19631\. Cited by: [§3\.5](https://arxiv.org/html/2606.24978#S3.SS5.p3.1), [§4\.4](https://arxiv.org/html/2606.24978#S4.SS4.p1.2)\.
- \[23\] S\. Mandal and M\. Thakur \(2023\) Corrigendum to “a city\-based pm2\.5 forecasting framework using spatially attentive cluster\-based graph neural network model” \[j\. clean\. prod\. 405 \(2023\) 137036\]\. Journal of Cleaner Production\. External Links: [Document](https://dx.doi.org/10.1016/j.jclepro.2023.137905)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[24\] Y\. Pei, C\. Huang, Y\. Shen, and Y\. Ma \(2022\) An ensemble model with adaptive variational mode decomposition and multivariate temporal graph neural network for pm2\.5 concentration forecasting\. Sustainability\. External Links: [Document](https://dx.doi.org/10.3390/su142013191)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[25\] G\. Polezer, Y\. S\. Tadano, H\. V\. Siqueira, A\. F\. Godoi, C\. I\. Yamamoto, P\. A\. de André, T\. Pauliquevis, M\. de Fatima Andrade, A\. Oliveira, P\. H\. Saldiva, et al\. \(2018\) Assessing the impact of pm2\.5 on respiratory disease using artificial neural networks\. Environmental Pollution 235, pp\. 394–403\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p1.1)\.
- \[26\] Y\. Qi, Q\. Li, H\. Karimian, and D\. Liu \(2019\) A hybrid model for spatiotemporal forecasting of pm2\.5 based on graph convolutional neural network and long short\-term memory\. Science of The Total Environment\. External Links: [Document](https://dx.doi.org/10.1016/j.scitotenv.2019.01.333)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[27\] L\. Ruiz, F\. Gama, and A\. Ribeiro \(2020\) Gated graph recurrent neural networks\. IEEE Transactions on Signal Processing 68, pp\. 6303–6318\. Cited by: [§3\.4](https://arxiv.org/html/2606.24978#S3.SS4.p1.1)\.
- \[28\] M\. Teng, S\. Li, J\. Xing, C\. Fan, J\. Yang, S\. Wang, G\. Song, Y\. Ding, J\. Dong, and S\. Wang \(2023\) 72\-hour real\-time forecasting of ambient pm2\.5 by hybrid graph deep neural network with aggregated neighborhood spatiotemporal information\. Environment International 176, pp\. 107971\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[29\] P\. Veličković, G\. Cucurull, A\. Casanova, A\. Romero, P\. Lio, and Y\. Bengio \(2017\) Graph attention networks\. arXiv preprint arXiv:1710\.10903\. Cited by: [§2\.1](https://arxiv.org/html/2606.24978#S2.SS1.SSS0.Px4.p1.8)\.
- \[30\] M\. Y\. Wang \(2019\) Deep graph library: towards efficient and scalable deep learning on graphs\. In ICLR workshop on representation learning on graphs and manifolds, pp\. 1–18\. Cited by: [§4\.2](https://arxiv.org/html/2606.24978#S4.SS2.p7.1)\.
- \[31\] A\. Wu, F\. Harrou, A\. Dairi, and Y\. Sun \(2022\) Machine learning and deep learning\-driven methods for predicting ambient particulate matters levels: a case study\. Concurrency and Computation: Practice and Experience 34\(19\), pp\. e7035\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p2.1)\.
- \[32\] F\. Wu, A\. Souza, T\. Zhang, C\. Fifty, T\. Yu, and K\. Weinberger \(2019\) Simplifying graph convolutional networks\. In International conference on machine learning, pp\. 6861–6871\. Cited by: [§2\.1](https://arxiv.org/html/2606.24978#S2.SS1.SSS0.Px2.p1.1)\.
- \[33\] K\. Xu, W\. Hu, J\. Leskovec, and S\. Jegelka \(2018\) How powerful are graph neural networks?\. arXiv preprint arXiv:1810\.00826\. Cited by: [§2\.1](https://arxiv.org/html/2606.24978#S2.SS1.SSS0.Px3.p1.1)\.
- \[34\] S\. Yang, D\. Fang, and B\. Chen \(2019\) Human health impact and economic effect for pm2\.5 exposure in typical cities\. Applied Energy 249, pp\. 316–325\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p1.1)\.
- \[35\] X\. Yang, Y\. Zheng, Y\. Zhang, D\. S\. Wong, and W\. Yang \(2022\) Bearing remaining useful life prediction based on regression shapelet and graph neural network\. IEEE Transactions on Instrumentation and Measurement 71, pp\. 1–12\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p3.12)\.
- \[36\] S\. Yao, H\. Zhang, C\. Wang, D\. Zeng, and M\. Ye \(2024\) GSTGAT: gated spatiotemporal graph attention network for traffic demand forecasting\. IET Intelligent Transport Systems 18\(2\), pp\. 258–268\. Cited by: [§3\.4](https://arxiv.org/html/2606.24978#S3.SS4.p1.1)\.
- \[37\] Y\. Yin, M\. Liu, Q\. Zhu, S\. Zhang, N\. A\. Hussien, and Y\. Fan \(2023\) Multi\-branch attention graph convolutional networks for 3d human pose estimation\. IEEE Transactions on Instrumentation and Measurement\. Cited by: [§3\.4](https://arxiv.org/html/2606.24978#S3.SS4.p1.1)\.
- \[38\] Z\. Ying, D\. Bourgeois, J\. You, M\. Zitnik, and J\. Leskovec \(2019\) Gnnexplainer: generating explanations for graph neural networks\. Advances in neural information processing systems 32\. Cited by: [§3\.5](https://arxiv.org/html/2606.24978#S3.SS5.p3.1), [§4\.4](https://arxiv.org/html/2606.24978#S4.SS4.p1.2)\.
- \[39\] H\. Yuan, H\. Yu, S\. Gui, and S\. Ji \(2022\) Explainability in graph neural networks: a taxonomic survey\. IEEE transactions on pattern analysis and machine intelligence 45\(5\), pp\. 5782–5799\. Cited by: [§3\.5](https://arxiv.org/html/2606.24978#S3.SS5.p2.1)\.
- \[40\] Q\. Zeng, C\. Wang, G\. Chen, H\. Duan, and S\. Wang \(2022\) For the aged: a novel pm2\.5 concentration forecasting method based on spatial\-temporal graph ordinary differential equation networks in home\-based care parks\. Frontiers in Environmental Science\. External Links: [Document](https://dx.doi.org/10.3389/fenvs.2022.956020)\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[41\] C\. Zhang, S\. Wang, Y\. Wu, X\. Zhu, and W\. Shen \(2024\) A long\-term prediction method for pm2\.5 concentration based on spatiotemporal graph attention recurrent neural network and grey wolf optimization algorithm\. Journal of Environmental Chemical Engineering 12\(1\), pp\. 111716\. Cited by: [§1](https://arxiv.org/html/2606.24978#S1.p4.6)\.
- \[42\] T\. Zhang, C\. Liu, Z\. Liu, J\. Tan, and M\. Ahmat \(2023\) Temporal double graph convolutional network for co and co 2 prediction in blast furnace gas\. IEEE Transactions on Instrumentation and Measurement\. Cited by: [§3\.4](https://arxiv.org/html/2606.24978#S3.SS4.p1.1)\.