GPU-Accelerated Deep Learning for Heatwave Prediction and Urban Heat Risk Assessment

arXiv cs.LG Papers

Summary

This paper presents a GPU-accelerated deep learning framework using ConvLSTM models to predict next-day urban thermal conditions and generate heat risk maps, achieving strong performance (R²=0.8877) on Sarajevo data.

arXiv:2605.16435v1 Announce Type: new

# GPU-Accelerated Deep Learning for Heatwave Prediction and Urban Heat Risk Assessment
Source: [https://arxiv.org/html/2605.16435](https://arxiv.org/html/2605.16435)
Adis Alihodzic (ORCID: 0000-0003-0761-1667)

Department of Mathematical and Computer Sciences, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina

###### Abstract

Heatwaves are an important problem in cities, and climate change makes this problem more difficult. In this paper, we present a GPU-based deep learning framework for next-day prediction of urban thermal conditions and for heat risk assessment. The study was carried out in Sarajevo by using MODIS land surface temperature data and Open-Meteo forecast data. We tested several models, including convolutional models and spatiotemporal models. Among them, ConvLSTM with a mixed loss function gave the best results. The obtained values were MAE = 0.2293, RMSE = 0.3089, and $R^{2} = 0.8877$. The experiments also showed that results can be improved by using longer temporal series and additional meteorological variables. Since the framework was implemented on a GPU and trained with mixed precision, the execution time was reduced. Based on the predicted temperature fields, it was also possible to combine hazard information with exposure and vulnerability data in order to generate city heat risk maps. The proposed framework can be used as a practical basis for city heat analysis.

###### keywords:

heatwave prediction, urban heat risk, deep learning, GPU acceleration, spatiotemporal models, ConvLSTM, climate data analysis

## 1 Introduction

Heatwaves are among the most impactful climate extremes today. They have become more frequent, more severe, and longer in many regions, and their effects can be seen in public health, energy demand, labor productivity, and urban infrastructure. Recent studies have shown that heat-related mortality is strongly associated with extremely hot periods, especially in cities, where the urban environment can further increase thermal stress \[CHEVAL2024100603,CUERDOVILCHES2023164412,Huang2023\]. Urban areas are particularly sensitive to extreme heat, as dense construction, impervious materials, reduced vegetation, and limited air circulation often make cities warmer than surrounding rural areas. This phenomenon is known as the urban heat island. During heatwaves, the combined effects of climate change and the urban heat island may aggravate thermal conditions. Therefore, predicting urban heat conditions and estimating urban heat risk have become essential topics in environmental modeling, public health, and urban planning \[CHEVAL2024100603,CUERDOVILCHES2023164412,Hsu2021\].

Another crucial point is that urban heat does not affect all parts of a city equally. Some areas have more vegetation and better ventilation, while others are characterized by dense built-up structure and large impervious surfaces. Population exposure is also not uniform across locations. Recent studies have shown that urban heat exposure may be spatially uneven and closely linked to demographic and socioeconomic patterns. Because of this, urban heat analysis should not be based solely on temperature predictions but should also incorporate exposure and vulnerability factors \[Hsu2021,Huang2023,Pan2024,DAmbrosio2023\]. Satellite products provide spatially detailed land surface temperature information, while gridded population datasets can be used to estimate population exposure \[YE2025100870,Tatem2017\]. MODIS land surface temperature products are widely used in thermal remote sensing and have been extensively validated in the literature \[WAN200859,DUAN201916\]. Population data from the WorldPop project are also often used in spatial exposure analysis \[Tatem2017,Lloyd2017\].

Heatwave analysis has commonly relied on statistical methods, physical modeling, and conventional machine learning approaches \[Geophysics2023,su17083747\]. Although these methods are appropriate, they may be limited when the process exhibits strong spatial and temporal dependencies simultaneously. For that reason, deep learning methods have received increasing attention in recent years because they have shown good results in spatiotemporal prediction problems \[BOUDREAULT2025109965,atmos16010082\]. Convolutional neural networks are suitable for learning spatial patterns \[10.1007/978-981-19-1122-4-47\], while ConvLSTM architectures are designed to model spatial and temporal structure jointly. Besides prediction accuracy, computational efficiency is also important in practice. High-resolution urban heat modeling involves large spatiotemporal datasets and repeated model training, which may require substantial computational time \[ijgi4042306,hydrology11080127\]. Therefore, GPU acceleration plays an important practical role, enabling faster training and inference in modern deep learning workflows \[Pandey2022,s24020514\]. This is especially useful when the forecast results need to be transformed into urban heat risk layers for practical decision support \[LiWang2021,Pan2024\].

In this chapter, heatwave prediction and urban heat risk assessment are considered from a practical perspective. The main idea is to combine satellite-derived land surface temperatures with daily meteorological forcing within a GPU-accelerated deep learning framework. Two model families are considered, namely a CNN baseline and a ConvLSTM model. In addition, a simplified urban heat risk layer is constructed by combining predicted thermal intensity with exposure and vulnerability information. In that way, this chapter connects temperature prediction with a risk-based interpretation that can be useful in climate adaptation and urban planning studies \[Pan2024,DAmbrosio2023\].

The main contributions of this chapter are as follows. First, a practical framework for combining satellite-derived thermal data and daily meteorological forcing is presented for short-term urban heat prediction. Second, the role of GPU acceleration in training and evaluation of deep learning models is discussed. Third, the influence of dataset design, temporal coverage, and multi-location meteorological forcing on predictive performance is experimentally analyzed. Fourth, a simplified methodology for generating urban heat risk maps from predicted thermal conditions, exposure factors, and vulnerability factors is outlined.

The remainder of this chapter is organized as follows. Section 2 presents related work on heatwave prediction, urban heat analysis, and deep learning approaches. Section 3 describes the study area, data sources, and preprocessing steps. Section 4 introduces the proposed GPU-accelerated framework and the prediction models considered. Section 5 explains the construction of the urban heat risk layer. Section 6 presents the experimental results and performance analysis. Finally, Section 7 gives the concluding remarks and possible directions for future work.

## 2 Related Work

### 2.1 Deep learning for heatwave and urban thermal prediction

Recent studies have shown growing interest in deep learning for heat\-related prediction tasks\. In the broader area of extreme heat forecasting, deep learning models have already been applied to predict extreme heat events from meteorological variables, and the reported results indicate that such data\-driven approaches can achieve good predictive performance\[Shafiq2025\]\. Deep learning has also been used in urban thermal studies, especially for land surface temperature prediction and urban heat island analysis, including frameworks based on multisensor data and machine learning methods for prediction and assessment\[10938603,Wang2025\]\. However, there are still relatively few studies that combine satellite\-derived thermal fields with daily meteorological forcing within a practical framework for next\-day urban thermal prediction\.

### 2.2 Urban heat risk assessment and vulnerability mapping

Another important direction is the transformation of thermal information into risk-oriented spatial analysis. Recent studies demonstrate that urban heat assessment should not be limited to temperature mapping but should also incorporate exposure, vulnerability, and population-related factors \[rs16163032\]. In this sense, GIS- and remote-sensing-based frameworks are often used to combine thermal indicators with environmental and demographic layers to produce urban heat vulnerability or risk maps \[Pan2024,DAmbrosio2023\]. However, many of these studies are mostly descriptive or retrospective, and there are still few practical strategies that directly connect short-term thermal prediction with simplified urban heat risk mapping.

### 2.3 GPU acceleration in environmental deep learning

The third important direction is computational efficiency. Deep learning workflows in environmental applications usually involve large spatiotemporal datasets, repeated model training, and computationally demanding mapping procedures, which makes acceleration important in practice \[Pandey2022\]. In urban heat applications, GPU-based computation has already been used to accelerate heat-exposure mapping, and some studies documented reductions in computation time of more than 99% for specific spatial estimations \[LiWang2021\]. In a broader sense, fast inference is also essential for near-real-time prediction and repeated evaluation \[Kalfarisi2022\]. However, GPU acceleration is still often treated only as an implementation detail in the background, and less often as an explicit part of an integrated framework for urban thermal prediction and risk-oriented mapping. In general, the literature suggests that deep learning-based heat prediction \[Ge2025,Lyu2022\], urban heat risk assessment \[atmos14020343\], and GPU-accelerated computation \[Kalfarisi2022\] have mostly been examined separately.

## 3 Study Area, Data Sources, and Preprocessing

### 3.1 Study area: Sarajevo

![Refer to caption](https://arxiv.org/html/2605.16435v1/img/sarajevo_study_area.png) (a) Study area: Sarajevo Canton.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/sarajevo_terrain.png) (b) Terrain-based spatial context of Sarajevo.

Figure 1: Spatial context of the study area. Panel (a) shows the geographic extent of Sarajevo used in this chapter, while panel (b) illustrates the surrounding topographic structure that contributes to spatial variability in urban thermal behavior.

The case study covers Sarajevo and its surrounding neighborhoods, which were selected because their thermal behavior is influenced by a dense urban structure, valley topography, heterogeneous land cover, and ongoing urban development. As shown in Fig. [1](https://arxiv.org/html/2605.16435#S3.F1)(a), the study area combines urban zones, residential districts with different construction densities, central transport corridors, and surrounding peri-urban space. Sarajevo and its surrounding urban areas are suitable for studying heat-related phenomena, as summer heat may be further intensified by limited air circulation and local atmospheric conditions. This is also supported by the terrain context presented in Fig. [1](https://arxiv.org/html/2605.16435#S3.F1)(b), where the city is located within a valley system and influenced by the surrounding elevated terrain. Such a spatial configuration makes Sarajevo a useful example for studying the relationship between broader meteorological forcing and local urban thermal response.

### 3.2 Satellite-derived thermal data: MODIS LST

![Refer to caption](https://arxiv.org/html/2605.16435v1/img/modis_lst_32x32_a.png) (a) MODIS LST field over Sarajevo on 2022-07-14.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/modis_lst_32x32_b.png) (b) MODIS LST field over Sarajevo on 2017-07-31.

Figure 2: Examples of MODIS-derived summer land surface temperature fields over Sarajevo. The panels show representative daily thermal maps on a $32 \times 32$ grid, highlighting the spatial variability of urban surface temperature patterns across different dates.

The main thermal target considered in this paper is land surface temperature obtained from the MODIS sensor. MODIS LST products are widely used in urban climate and environmental studies because they provide spatially explicit thermal observations with regular temporal coverage. In this study, MODIS-derived thermal maps were used to represent day-to-day urban thermal conditions over Sarajevo. Some representative examples of summer thermal fields extracted from the prepared dataset are shown in Fig. [2](https://arxiv.org/html/2605.16435#S3.F2), where the spatial temperature variability over the study area is clearly visible across different dates. Each valid MODIS scene was transformed into a spatial temperature field over Sarajevo and stored in a standardized form suitable for predictive modeling. After cropping to the study area and removing invalid observations where necessary, all target fields were arranged on a common $32 \times 32$ grid. In this way, a compact but still spatially informative thermal representation was obtained, suitable for deep learning and able to preserve the main spatial differences important for next-day urban thermal prediction.

### 3.3 Meteorological forcing data: Open-Meteo

![Refer to caption](https://arxiv.org/html/2605.16435v1/img/openmeteo_points.png)

Figure 3: Representative spatial distribution of the Sarajevo locations used to obtain daily meteorological forcing variables from the Open-Meteo service. Using multiple locations provides a more representative characterization of atmospheric conditions across the study area than relying on a single-point record.

In addition to thermal imagery, the proposed predictive framework uses daily meteorological forcing variables from the Open-Meteo service. These variables were included because urban thermal behavior depends not only on land-surface conditions but also on surrounding atmospheric conditions. To obtain a more representative description of daily forcing over the city, meteorological data were collected from several locations in Sarajevo rather than from a single point. Their spatial distribution is illustrated in Fig. [3](https://arxiv.org/html/2605.16435#S3.F3). The final set of daily variables includes temperature-related indicators, humidity, precipitation, radiation, cloudiness, and wind characteristics. Hence, the predictive framework merges spatial thermal information from satellite measurements with time-aligned daily meteorological forcing. The use of multiple sampling locations was important because Sarajevo has spatially distinct urban and topographic characteristics, and a single meteorological record could not properly capture the variability in forcing conditions across the wider study area.

### 3.4 Data preprocessing and dataset construction

Flowchart (Figure 4), by stage:

- Input data: MODIS LST scenes; Open-Meteo daily variables; study area definition.
- Preprocessing: spatial cropping to Sarajevo; invalid-pixel filtering; temporal alignment of thermal and meteorological data; normalization / scaling; sequence construction for temporal models.
- Model-ready outputs: CNN input; ConvLSTM input sequences; next-day $32 \times 32$ thermal target.

Figure 4: Preprocessing pipeline from raw inputs to model-ready outputs.

A preprocessing stage was necessary in order to transform the heterogeneous raw data into a consistent dataset suitable for deep learning. Since the considered data sources differ in spatial resolution, temporal frequency, and format, all inputs had to be organized within a common spatial and temporal framework. The overall workflow is summarized in Fig. [4](https://arxiv.org/html/2605.16435#S3.F4), which shows the main steps from raw data acquisition to the construction of samples ready for model training. For the thermal component, invalid MODIS observations were removed as needed, and the remaining scenes were converted into standardized daily raster fields over Sarajevo. For the meteorological component, daily forcing variables were synchronized with the dates of the available thermal observations. This temporal alignment step was important because each training sample had to connect predictor variables from a given temporal window with the target thermal field of the next day. After synchronization, the dataset was divided into input-output pairs for supervised learning. In the case of sequence-based models such as ConvLSTM, the data were further arranged into short temporal sequences so that the network could learn both spatial structure and temporal evolution. Additional preprocessing steps, such as normalization and scaling, were also applied to improve training and numerical stability. As shown in Fig. [4](https://arxiv.org/html/2605.16435#S3.F4), the final result of preprocessing was not only a collection of aligned rasters and tabular variables, but a structured dataset prepared for the considered predictive architectures. The resulting dataset construction pipeline provided the basis for the experiments presented later in this chapter.
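The sequence-construction step can be illustrated with a minimal NumPy sketch. It assumes the LST maps and meteorological rows have already been aligned by date, and it treats consecutive rows as consecutive time steps; how gaps between valid MODIS scenes are handled is not stated in the text. The function name `build_sequences` and the array layout are hypothetical.

```python
import numpy as np

def build_sequences(lst_maps, met_vars, seq_len=3):
    """Pair each window of `seq_len` consecutive rows with the following LST map.

    lst_maps: (N, H, W) daily LST rasters, date-aligned with met_vars (assumed)
    met_vars: (N, M) daily meteorological variables
    Returns X_lst (S, seq_len, H, W), X_met (S, seq_len, M) and Y (S, H, W),
    where S = N - seq_len.
    """
    n_days = lst_maps.shape[0]
    x_lst, x_met, y = [], [], []
    for start in range(n_days - seq_len):
        stop = start + seq_len
        x_lst.append(lst_maps[start:stop])   # thermal history of the window
        x_met.append(met_vars[start:stop])   # forcing over the same window
        y.append(lst_maps[stop])             # target: the map after the window
    return np.stack(x_lst), np.stack(x_met), np.stack(y)
```

With `seq_len=3`, ten aligned days yield seven samples, and the first target is the map at index 3.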

## 4 Proposed GPU-Accelerated Framework

### 4.1 Problem formulation

The objective of the proposed framework is to predict urban thermal conditions for the next day over the Sarajevo study area and to provide outputs that can later be used for simplified urban heat risk assessment. Let $X_{t}$ denote the information available at day $t$, including the recent thermal state of the city represented by spatial temperature fields and the corresponding meteorological forcing variables. The forecasting task is to learn a mapping

$$f: X_{t} \mapsto \hat{Y}_{t+1},$$

where $\hat{Y}_{t+1}$ is the prediction for day $t+1$. In other words, the model output is a raster-like temperature map representing the expected spatial distribution of thermal intensity across the study area for the following day. Depending on the considered architecture, the input may be given either as a channel-stacked spatial representation or as a short temporal sequence of previous observations. In all cases, the goal is not only to predict a single scalar temperature value but to reconstruct the complete spatial thermal pattern, which makes the problem inherently spatiotemporal. To achieve the same forecast goal, two model families were assessed: the first is a CNN-based model used as a baseline predictor of spatial thermal patterns, while the second is a ConvLSTM model designed to capture both spatial and temporal dependencies. The CNN and ConvLSTM models were trained on the same multi-source dataset and evaluated by metrics such as mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination ($R^{2}$).
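For reference, the three evaluation metrics can be written in a few lines of NumPy. This is a sketch using the standard definitions, pooled over all pixels and samples; whether the reported values were pooled this way or averaged per map is an assumption here.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over all elements."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error over all elements."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Because the targets are normalized LST maps, MAE and RMSE computed this way are in normalized units rather than in kelvin or degrees Celsius.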

### 4.2 Prediction models

We first describe the CNN model in detail. This model was introduced as a simple deep learning predictor for spatial thermal patterns. The main reason for using it is its compact, fully convolutional architecture, which can process raster-like inputs while remaining simple enough for stable training and direct implementation. We assume that the input tensor is of size $B \times T \times C \times H \times W$, where $B$, $T$, and $C$ are the batch size, sequence length, and the number of channels for one day, respectively. Inside the CNN model, the temporal and channel dimensions are merged before convolutional processing, so that the input is reshaped to $B \times (T \cdot C) \times H \times W$. The network then applies three $3 \times 3$ convolutional layers with padding $1$, each followed by a ReLU activation, and a final $1 \times 1$ convolution that produces the predicted single-channel thermal map. For a hidden width of $64$, the channel progression is

$$(T \cdot C) \rightarrow 64 \rightarrow 64 \rightarrow 32 \rightarrow 1.$$

Therefore, the CNN baseline follows the structure:

- Conv2D$(T \cdot C, 64, 3 \times 3, \text{padding}=1)$ + ReLU,
- Conv2D$(64, 64, 3 \times 3, \text{padding}=1)$ + ReLU,
- Conv2D$(64, 32, 3 \times 3, \text{padding}=1)$ + ReLU,
- Conv2D$(32, 1, 1 \times 1)$.
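The layer list above maps directly onto a small PyTorch module. The following is a sketch under stated assumptions rather than the author's code: the class name `CNNBaseline` is hypothetical, and setting the third layer's width to half the hidden width is only one way to reproduce the stated 64, 64, 32 progression.

```python
import torch
from torch import nn

class CNNBaseline(nn.Module):
    """Fully convolutional baseline: (T*C) -> 64 -> 64 -> 32 -> 1 channels."""

    def __init__(self, seq_len, channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(seq_len * channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            # hidden // 2 gives 32 for hidden=64 (assumed generalization)
            nn.Conv2d(hidden, hidden // 2, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden // 2, 1, kernel_size=1),
        )

    def forward(self, x):
        # x: (B, T, C, H, W); merge time and channel axes before convolution
        b, t, c, h, w = x.shape
        return self.net(x.reshape(b, t * c, h, w))
```

Since every $3 \times 3$ layer uses padding 1 and there is no pooling or striding, an input of shape $B \times T \times C \times 32 \times 32$ produces an output of shape $B \times 1 \times 32 \times 32$.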

Because the architecture has no pooling layers and every $3 \times 3$ convolution uses padding $1$, the spatial resolution is preserved throughout the network, and the output is a $32 \times 32$ thermal matrix. However, temporal development is not explicitly modeled, except through the implicit information provided by channel stacking.

The main predictive model used in this chapter is a ConvLSTM architecture. Unlike the CNN baseline, this model keeps the temporal structure explicitly and processes the input in its original sequential form, that is, as a tensor of size $B \times T \times C \times H \times W$. At each time step, the current input and the previous hidden state are concatenated and passed through a $3 \times 3$ convolution with padding $1$. In that way, the input, forget, output, and candidate gates of the ConvLSTM cell are generated. In the final experiments, the hidden dimension was set to $32$. After the last time step, the final hidden state is passed to a convolutional prediction head. This head consists of one convolutional layer of size $3 \times 3$ with $32$ output channels, followed by a ReLU activation function, and a final convolutional layer of size $1 \times 1$ that produces the predicted single-channel thermal map. Therefore, the prediction head has the following form:

- Conv2D$(32, 32, 3 \times 3, \text{padding}=1)$ + ReLU,
- Conv2D$(32, 1, 1 \times 1)$.
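A compact PyTorch sketch of the described ConvLSTM follows. Several details are assumptions not stated in the text: a single recurrent layer, zero initial hidden and cell states, the gate ordering in the split, and the common variant without peephole connections. The class names are hypothetical.

```python
import torch
from torch import nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell: a single 3x3 convolution produces all four gates."""

    def __init__(self, in_channels, hidden_dim, kernel_size=3):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.gates = nn.Conv2d(in_channels + hidden_dim, 4 * hidden_dim,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        # Concatenate current input and previous hidden state along channels
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class ConvLSTMPredictor(nn.Module):
    """Runs the cell over T steps, then maps the last hidden state to one map."""

    def __init__(self, in_channels, hidden_dim=32):
        super().__init__()
        self.cell = ConvLSTMCell(in_channels, hidden_dim)
        self.head = nn.Sequential(
            nn.Conv2d(hidden_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, x):
        # x: (B, T, C, H, W); states start at zero (assumption)
        b, t, _, height, width = x.shape
        h = x.new_zeros(b, self.cell.hidden_dim, height, width)
        c = torch.zeros_like(h)
        for step in range(t):
            h, c = self.cell(x[:, step], (h, c))
        return self.head(h)
```

Computing all four gates with one convolution over the concatenated tensor is equivalent to separate input-to-state and state-to-state convolutions, and it matches the description of concatenating the input with the previous hidden state.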

Such a design allows the ConvLSTM model to capture both the spatial arrangement of urban heat patterns and their short\-term temporal evolution\. This is important in the case of persistent heatwave conditions\. The structure of the CNN baseline is shown in Fig\.[5](https://arxiv.org/html/2605.16435#S4.F5)\. The model first merges the temporal and channel dimensions, then applies a fully convolutional mapping to obtain the predicted thermal field for the next day\. The ConvLSTM architecture used in the final experiments is shown in Fig\.[6](https://arxiv.org/html/2605.16435#S4.F6)\. In contrast to the CNN baseline, it preserves the temporal structure and processes the input sequence recurrently before generating the final thermal map by a convolutional prediction head\.

Figure 5: CNN baseline architecture used for next-day urban thermal prediction. Diagram flow: Input $B \times T \times C \times 32 \times 32$ → Reshape $B \times (T \cdot C) \times 32 \times 32$ → Conv2D $(T \cdot C) \rightarrow 64$, $3 \times 3$ + ReLU → Conv2D $64 \rightarrow 64$, $3 \times 3$ + ReLU → Conv2D $64 \rightarrow 32$, $3 \times 3$ + ReLU → Conv2D $32 \rightarrow 1$, $1 \times 1$ → Output $B \times 1 \times 32 \times 32$.

Figure 6: ConvLSTM architecture used in the final experiments. Diagram flow: Input sequence $B \times T \times C \times 32 \times 32$ → Temporal processing $x_{1}, \dots, x_{T}$ → ConvLSTM cell (input channels $= C$, hidden dim $= 32$, $3 \times 3$ gates) → Final hidden state $B \times 32 \times 32 \times 32$ → Conv2D $32 \rightarrow 32$, $3 \times 3$ + ReLU → Conv2D $32 \rightarrow 1$, $1 \times 1$ → Output $B \times 1 \times 32 \times 32$.

### 4.3 GPU acceleration strategy

Because the proposed framework involves repeated training and evaluation of multiple deep learning architectures on multi\-source spatiotemporal data, computational efficiency becomes an important practical issue\. For that reason, all experiments in this chapter are designed around GPU\-accelerated execution\. The experiments were carried out on a workstation equipped with an AMD64 processor, 16 GB of RAM, and Windows 11\. Training was performed on CUDA\-enabled hardware, and GPU acceleration was used in order to reduce computation time and allow repeated model evaluation under different dataset configurations\. The implementation relies on CUDA\-enabled tensor computation, which allows convolutions and recurrent updates to be executed in parallel\. This substantially reduces the training time compared with a purely CPU\-based workflow\. In addition, mixed\-precision training is employed where appropriate in order to further improve computational efficiency and reduce memory usage\. This is particularly useful when working with image\-like spatial data and sequence\-based models, since both can lead to significant memory demands during training\. From a methodological perspective, GPU acceleration is not treated only as a technical implementation detail, but as an integral part of the proposed practical framework\. Without accelerated training, repeated experimentation with CNN and ConvLSTM models would be significantly more difficult, especially when multiple input configurations and loss formulations must be tested\. The use of GPU resources therefore directly supports the experimental design of the chapter and helps make the proposed urban thermal prediction workflow computationally viable in practice\.
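A mixed-precision training step of the kind described here is commonly written in PyTorch with `torch.autocast` and a gradient scaler. The sketch below is illustrative only: the stand-in model, the Adam optimizer, the learning rate, and the L1 loss are all assumptions (the mixed loss used in the study is not specified in this section), and the code falls back to full precision when no CUDA device is present.

```python
import torch
from torch import nn

def train_step(model, batch_x, batch_y, optimizer, scaler, device):
    """One optimization step; autocast and loss scaling are active only on CUDA."""
    use_amp = device.type == "cuda"
    model.train()
    batch_x, batch_y = batch_x.to(device), batch_y.to(device)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        # Placeholder loss; the paper's mixed loss is not reproduced here
        loss = nn.functional.l1_loss(model(batch_x), batch_y)
    scaler.scale(loss).backward()   # scales the loss to avoid fp16 underflow
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()
    return loss.item()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Conv2d(4, 1, kernel_size=3, padding=1).to(device)  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # assumed settings
scaler = torch.cuda.amp.GradScaler(enabled=(device.type == "cuda"))
loss = train_step(model, torch.randn(8, 4, 32, 32), torch.randn(8, 1, 32, 32),
                  optimizer, scaler, device)
```

With the scaler disabled, `scale` returns the loss unchanged and `step` simply calls the optimizer, so the same function runs on CPU-only machines.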

## 5 Urban Heat Risk Assessment Layer

Although the primary focus of this chapter is next\-day urban thermal prediction, the obtained predictive maps can also be used as the basis for a simplified urban heat risk assessment framework\. In climate\-risk literature, risk is commonly interpreted through the interaction of hazard, exposure, and vulnerability, and this perspective has also been adopted in urban heat studies\[doi:10\.1177/23998083241280746,doi:10\.1371/journal\.pone\.0127277\]\. The main idea here is therefore to extend temperature prediction toward a more decision\-oriented spatial interpretation by combining the predicted thermal field with additional exposure and vulnerability information\. In this chapter, this risk\-oriented extension is intentionally kept simple and transparent\. Its purpose is not to provide a fully operational urban risk system, but rather to illustrate how GPU\-accelerated thermal prediction can support practical urban heat analysis and spatial screening of potentially critical urban zones\[ZHANG2019852\]\.

### 5.1 Hazard layer

The hazard layer represents the spatial intensity of predicted thermal stress over the study area. In the proposed framework, it is derived directly from the predicted next-day thermal field produced by the deep learning models. Such an interpretation is consistent with earlier urban heat risk studies in which thermal indicators, including land surface temperature or related heat-stress measures, are used as the hazard component of the risk framework \[ZHANG2019852,doi:10.1177/23998083241280746\]. Locations with higher predicted temperature values are interpreted as having higher thermal hazard, while locations with lower values correspond to lower hazard. For integration with the remaining components of the framework, the predicted field can be normalized to a common interval such as $[0,1]$, which is a standard step when combining heterogeneous indicators with different units and scales \[10.3389/fpubh.2022.989963,ZHANG2019852\]. In this way, the hazard layer serves as the predictive component of the urban heat risk assessment, indicating where thermally critical conditions are expected to occur on the following day.
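One simple way to obtain such a $[0,1]$ hazard layer is per-map min-max scaling, sketched below in NumPy. This is an assumption about the normalization; the text does not say whether the bounds come from each predicted map or from fixed reference temperatures. The helper name `minmax_normalize` is hypothetical.

```python
import numpy as np

def minmax_normalize(field, eps=1e-12):
    """Rescale a 2-D field to [0, 1]; NaN pixels are ignored when finding bounds."""
    lo, hi = np.nanmin(field), np.nanmax(field)
    if hi - lo < eps:
        # A constant field carries no spatial contrast; return zeros
        return np.zeros_like(field, dtype=float)
    return (field - lo) / (hi - lo)
```

Per-map scaling highlights the relatively hottest cells on a given day, whereas fixed bounds would keep hazard values comparable between days.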

### 5.2 Exposure layer

The exposure layer is introduced to represent the spatial distribution of people or urban elements that may be affected by elevated thermal conditions\. Its role is important because high thermal intensity alone does not necessarily indicate a high practical risk, unless the affected area also includes a considerable human presence or urban activity\. In urban heat risk mapping, exposure is often represented by population density, pedestrian presence, residential concentration, or other indicators of human activity\[ZHANG2019852,doi:10\.1177/23998083241280746\]\. In a simplified setting, the exposure layer can therefore be constructed from gridded demographic data or from proxy indicators such as residential density or built\-up intensity\. After alignment with the spatial grid of the hazard layer, the obtained surface is normalized so that higher values indicate stronger exposure\[10\.3389/fpubh\.2022\.989963\]\. In this way, it becomes possible to distinguish between areas with low exposure to heat and those where heat may affect a larger number of people\.
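Aligning a finer population raster with the hazard grid can be sketched as a block-sum aggregation. The example assumes the fine raster is already cropped to the study area and that its dimensions are integer multiples of the target grid size; real data would normally need reprojection and resampling with a GIS library first. The function name `aggregate_to_grid` is hypothetical.

```python
import numpy as np

def aggregate_to_grid(fine, out_size=32):
    """Sum population counts of a fine raster into an out_size x out_size grid."""
    rows, cols = fine.shape
    if rows % out_size or cols % out_size:
        raise ValueError("fine raster dimensions must be multiples of out_size")
    fr, fc = rows // out_size, cols // out_size
    # Split each axis into (coarse index, within-block index) and sum the blocks
    return fine.reshape(out_size, fr, out_size, fc).sum(axis=(1, 3))
```

Summing rather than averaging preserves total population counts; the aggregated surface can then be rescaled to $[0,1]$ in the same way as the hazard layer.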

### 5.3 Vulnerability layer

The vulnerability layer describes how sensitivity to thermal stress differs across parts of the city. In urban heat research, vulnerability is commonly associated with environmental, demographic, social, and urban-form characteristics that increase sensitivity or reduce coping capacity under hot conditions \[ijerph15040640,10.3389/fpubh.2022.989963,su15031820\]. In practice, these characteristics may include limited vegetation, dense built-up structure, impervious surfaces, low ventilation potential, disadvantaged socioeconomic conditions, or the presence of particularly sensitive population groups. Several recent studies have shown that vulnerability-oriented heat assessments benefit from integrating both social indicators and spatial characteristics of the urban fabric \[ijerph15040640,su15031820\]. In the present chapter, vulnerability is represented in a simplified manner through spatial indicators that align with the study grid. As in the case of hazard and exposure, the resulting layer is normalized before integration \[10.3389/fpubh.2022.989963\]. Higher values indicate locations where predicted heat is expected to have stronger adverse implications due to less favorable local conditions.

### 5.4 Risk index generation

After the hazard, exposure, and vulnerability layers have been prepared, they can be combined into one simplified urban heat risk index. This follows the general idea of climate- and urban heat-risk assessment, where the most critical areas are those in which a strong thermal hazard co-occurs with high exposure and increased vulnerability \[10.3389/fpubh.2022.989963,ZHANG2019852\]. One simple and transparent formulation is

$$R(i,j) = H(i,j) \cdot E(i,j) \cdot V(i,j),$$

where $H(i,j)$, $E(i,j)$, and $V(i,j)$ denote the normalized hazard, exposure, and vulnerability values at spatial position $(i,j)$, respectively. Multiplicative and similar composite formulations based on normalized indicators are often used in heat-vulnerability and heat-risk studies because they emphasize the joint effects of multiple factors, rather than relying solely on temperature \[10.3389/fpubh.2022.989963,ZHANG2019852\]. The final risk surface can then be presented as a raster map or divided into ordinal categories such as low, moderate, high, and very high risk. Although this formulation is simplified, it is sufficient to show how next-day thermal forecasting can be extended towards interpretable urban heat risk assessment products \[10.3389/fpubh.2022.989963,doi:10.1177/23998083241280746\].
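The multiplicative index and its ordinal classification can be sketched in NumPy as follows. The class thresholds are hypothetical placeholders, since the text does not give break values; because a product of three $[0,1]$ layers tends to be small, quantile-based breaks may be a more practical choice.

```python
import numpy as np

def risk_index(hazard, exposure, vulnerability):
    """Cell-wise product R = H * E * V of three normalized layers."""
    return hazard * exposure * vulnerability

def classify_risk(risk, thresholds=(0.25, 0.5, 0.75)):
    """Map risk values to classes 0..3 (low, moderate, high, very high).

    The default thresholds are illustrative assumptions only.
    """
    return np.digitize(risk, thresholds)
```

A cell is assigned class 0 when its risk is below the first threshold and class 3 when it reaches or exceeds the last one. Note that a zero in any layer, for example an uninhabited cell, forces the risk to zero regardless of the predicted temperature.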

## 6 Experimental Results

The experimental part of this article was designed as a practical case study for next-day urban thermal prediction over Sarajevo. The goal was to predict the next daily land surface temperature (LST) map from a short temporal sequence of previous observations and meteorological forcing variables. All experiments were performed on CUDA-enabled hardware, and GPU acceleration was used during training in order to reduce training time and allow repeated model evaluation under different dataset configurations. The target output in all experiments was a normalized MODIS LST map of spatial size $32 \times 32$. Hence, each target map contains $1024$ pixel values. The input tensor for one sample had the general form

$$X \in \mathbb{R}^{T \times C \times H \times W},$$

where $T$ denotes the sequence length, $C$ denotes the number of channels per day, and $H = W = 32$. In the final experiments, the sequence length was fixed to $T = 3$. Two main dataset configurations were considered. The first configuration combined daily MODIS LST maps with daily meteorological forcing extracted from a single Open-Meteo location over Sarajevo. The second configuration extended both the temporal coverage and the number of meteorological inputs by using four representative Sarajevo locations together with a longer time interval. The final configuration of the best-performing ConvLSTM experiment is summarized in Table [1](https://arxiv.org/html/2605.16435#S6.T1). It includes the main characteristics of the input representation, the temporal setup, the training configuration, and the evaluation protocol used in the final stage of the study. All experiments were carried out on a workstation equipped with an AMD64 processor, 16 GB of RAM, and Windows 11.
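One possible way to assemble such samples is a sliding window over a gap-free daily stack, sketched below in NumPy. The function name `make_sequences`, the convention that the LST map sits in channel 0, and the random input are assumptions made for illustration. The actual pipeline also removes invalid days, which this sketch does not handle.

```python
import numpy as np

def make_sequences(frames, seq_len=3):
    """Build (input, target) pairs from a daily stack.

    frames: array of shape (D, C, H, W), one multi-channel frame per day,
            with the LST map assumed to be stored in channel 0.
    Returns X of shape (N, seq_len, C, H, W) and y of shape (N, H, W),
    where N = D - seq_len and y[n] is the LST map of the day after window n.
    """
    frames = np.asarray(frames)
    n_samples = frames.shape[0] - seq_len
    if n_samples <= 0:
        raise ValueError("need more than seq_len days")
    X = np.stack([frames[i:i + seq_len] for i in range(n_samples)])
    y = frames[seq_len:, 0]
    return X, y

# Ten synthetic days with C = 8 channels on a 32 x 32 grid.
days = np.random.default_rng(0).normal(size=(10, 8, 32, 32)).astype(np.float32)
X, y = make_sequences(days, seq_len=3)
print(X.shape, y.shape)  # (7, 3, 8, 32, 32) (7, 32, 32)
```

Each row of `X` then has the form $T \times C \times H \times W$ described above, and the matching entry of `y` is the next-day target map.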

Table 1: Final experimental setup used for the best-performing ConvLSTM model.

### 6.1 Dataset Construction

The first dataset configuration was obtained by combining MODIS MOD11A1 daily daytime LST data with Open\-Meteo daily meteorological variables for Sarajevo\. In the case of one\-location forcing, the following meteorological variables were used:

- temperature_2m_mean,
- temperature_2m_max,
- temperature_2m_min,
- dew_point_2m_mean,
- precipitation_sum,
- wind_speed_10m_mean,
- shortwave_radiation_sum.

These variables were combined with one MODIS LST channel, so that the total number of channels per day was equal to $8$. For a temporal sequence length of $3$, one input sample had the form

$$3 \times 8 \times 32 \times 32.$$

After preprocessing and removal of invalid cases, this baseline configuration contained $667$ valid samples.

In the second configuration, the dataset was extended in several ways. First, the MODIS period was expanded to 2010–2025, while the initial configuration covered the period 2019–2024. Second, meteorological forcing was collected not from a single point but from four Sarajevo locations: Sarajevo Center, Ilidža, Dobrinja, and Vogošča. Third, the number of meteorological variables was increased to include:

- temperature_2m_mean,
- temperature_2m_max,
- temperature_2m_min,
- dew_point_2m_mean,
- precipitation_sum,
- shortwave_radiation_sum,
- relative_humidity_2m_mean,
- cloud_cover_mean,
- wind_direction_10m_dominant,
- wind_speed_10m_mean.

Since these variables were taken for four locations, the meteorological forcing contributed $40$ channels per day. Together with the MODIS LST channel, the total number of channels per day was $41$. Therefore, one input sample had the form

$$3 \times 41 \times 32 \times 32.$$

After alignment in time and removal of invalid cases, the final extended dataset contained $1871$ valid samples.
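The channel count can be checked with a small NumPy sketch. It assumes that each point-wise meteorological value is expanded to a spatially constant $32 \times 32$ plane before stacking. This is one common way to combine scalar forcing with raster input, but it is not stated explicitly in the text. The helper `build_daily_frame` is hypothetical and all values are random.

```python
import numpy as np

H = W = 32
N_LOCATIONS = 4    # Sarajevo Center, Ilidža, Dobrinja, Vogošča
N_VARIABLES = 10   # daily Open-Meteo variables per location

def build_daily_frame(lst_map, met_values):
    """Stack one LST map with spatially constant meteorological planes.

    lst_map:    array (H, W), normalized LST.
    met_values: array (N_LOCATIONS, N_VARIABLES) of normalized daily values.
    Returns an array of shape (1 + N_LOCATIONS * N_VARIABLES, H, W).
    """
    lst_map = np.asarray(lst_map, dtype=np.float32)
    met = np.asarray(met_values, dtype=np.float32).reshape(-1)
    # Each scalar becomes a constant H x W plane (an assumption of this sketch).
    met_planes = np.broadcast_to(met[:, None, None], (met.size, *lst_map.shape))
    return np.concatenate([lst_map[None], met_planes], axis=0)

rng = np.random.default_rng(0)
frame = build_daily_frame(rng.normal(size=(H, W)),
                          rng.normal(size=(N_LOCATIONS, N_VARIABLES)))
sample = np.stack([frame] * 3)  # T = 3 days (same frame repeated for brevity)
print(frame.shape, sample.shape)  # (41, 32, 32) (3, 41, 32, 32)
```

Setting `N_LOCATIONS = 1` and `N_VARIABLES = 7` reproduces the 8-channel layout of the first configuration in the same way.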

### 6.2 Models and Training Procedure

Two deep learning architectures were considered during the experiments: a CNN baseline and a ConvLSTM model. The CNN was used as a simpler reference model, while the ConvLSTM was expected to better capture spatiotemporal dependencies. In the final stage of the experiments, the ConvLSTM model performed clearly better and was therefore selected as the main model for detailed analysis. The training and validation sets were obtained using an $80\%/20\%$ split. Thus, for the extended dataset configuration, the model was trained on $1538$ samples and validated on $333$ samples. In all experiments, training was performed on a GPU, and mixed precision was used in order to improve computational efficiency. At first, a standard mean squared error loss was used. However, the predicted maps were often too smooth, even when the quantitative metrics were acceptable. For this reason, a hybrid loss was introduced:

$$\mathcal{L} = \alpha \mathcal{L}_{1} + (1 - \alpha) \mathcal{L}_{2},$$

where $\mathcal{L}_{1}$ denotes the mean absolute error loss, $\mathcal{L}_{2}$ denotes the mean squared error loss, and $\alpha = 0.7$ in our experiments. This modification led to both better numerical results and visually more realistic thermal maps.
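A minimal NumPy sketch of this hybrid loss is given below, assuming both terms are averaged over all pixels. In an actual training loop the same expression would be written with the tensors of the deep learning framework in use. The function name and the test arrays are illustrative.

```python
import numpy as np

def hybrid_loss(pred, target, alpha=0.7):
    """alpha * MAE + (1 - alpha) * MSE, averaged over all pixels."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mae = np.mean(np.abs(diff))
    mse = np.mean(diff ** 2)
    return alpha * mae + (1.0 - alpha) * mse

target = np.zeros((32, 32))
small = np.full((32, 32), 0.1)   # uniform error of 0.1
large = np.full((32, 32), 2.0)   # uniform error of 2.0

loss_small = hybrid_loss(small, target)  # 0.7 * 0.1 + 0.3 * 0.01, about 0.073
loss_large = hybrid_loss(large, target)  # 0.7 * 2.0 + 0.3 * 4.0, about 2.6
```

For small errors the absolute-error term dominates the total, whereas for large errors the squared term grows faster, so the combination keeps a penalty on outliers without relying on the squared error alone.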

### 6.3 Evaluation Metrics

The models were evaluated using three standard regression metrics:

- mean absolute error (MAE),
- root mean squared error (RMSE),
- coefficient of determination ($R^{2}$).

These metrics were computed over the predicted and reference LST maps\.
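The three metrics can be sketched in NumPy as follows. The sketch assumes that all pixels of all evaluated maps are pooled into one vector and that $R^{2}$ is defined as $1 - SS_{\mathrm{res}}/SS_{\mathrm{tot}}$. The text does not state whether the metrics were pooled or averaged per map, so this is only one plausible reading, and the function name is hypothetical.

```python
import numpy as np

def regression_metrics(pred, ref):
    """Return (MAE, RMSE, R^2) over all pixels of the given maps."""
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    err = pred - ref
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)                 # residual sum of squares
    ss_tot = np.sum((ref - ref.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy 2 x 2 reference map and a prediction with a constant offset of 0.5.
ref = np.array([[0.0, 1.0], [2.0, 3.0]])
pred = ref + 0.5
mae, rmse, r2 = regression_metrics(pred, ref)
print(mae, rmse, r2)  # 0.5 0.5 0.8
```

In the toy case the residual sum of squares is $1$ and the total sum of squares is $5$, which gives $R^{2} = 0.8$.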

### 6.4 Quantitative Results

The most important quantitative results are summarized in Table [2](https://arxiv.org/html/2605.16435#S6.T2). The first strong result was obtained on dataset 1, the MODIS + Open-Meteo single-location configuration with $667$ samples. Using the ConvLSTM model with the hybrid loss, the method achieved MAE = 0.2543, RMSE = 0.3349, and $R^{2} = 0.8606$. After extending the temporal range and replacing the single-location forcing with a multi-location meteorological representation, the results improved further. On the final dataset 2 with $1871$ samples, the ConvLSTM model achieved MAE = 0.2293, RMSE = 0.3089, and $R^{2} = 0.8877$. These results show that both a larger dataset with longer temporal coverage and a richer meteorological representation improve next-day urban thermal prediction.
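The size of the improvement can be made explicit with a short calculation on the reported values. The percentages below are derived here and are not quoted from the paper. Because each configuration appears to be validated on a split of its own dataset, the comparison should be read as indicative rather than strictly controlled.

```python
# Reported ConvLSTM results for the two dataset configurations.
baseline = {"MAE": 0.2543, "RMSE": 0.3349, "R2": 0.8606}   # dataset 1, 667 samples
extended = {"MAE": 0.2293, "RMSE": 0.3089, "R2": 0.8877}   # dataset 2, 1871 samples

def relative_reduction(before, after):
    """Percentage by which an error metric decreased."""
    return 100.0 * (before - after) / before

mae_drop = relative_reduction(baseline["MAE"], extended["MAE"])
rmse_drop = relative_reduction(baseline["RMSE"], extended["RMSE"])
r2_gain = extended["R2"] - baseline["R2"]

print(f"MAE reduced by {mae_drop:.1f}%")    # MAE reduced by 9.8%
print(f"RMSE reduced by {rmse_drop:.1f}%")  # RMSE reduced by 7.8%
print(f"R2 increased by {r2_gain:.4f}")     # R2 increased by 0.0271
```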

Table 2: Main quantitative results obtained with the ConvLSTM model.

![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig1.png) (a) First reference map.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig3.png) (b) Second reference map.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig5.png) (c) Third reference map.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig2.png) (d) First predicted map.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig4.png) (e) Second predicted map.
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig6.png) (f) Third predicted map.

Figure 7: Qualitative comparison between reference and predicted thermal maps for three representative examples.

![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig7.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.16435v1/img/fig8.png)\(b\)

Figure 8: (a) Distribution of per-sample MAE values. (b) Per-sample MAE across all evaluated samples.

### 6.5 Qualitative Results

In addition to the global metrics, the predicted maps were inspected qualitatively. In general, the ConvLSTM model recovered the main large-scale thermal structure of the target maps. The warm regions and the main spatial gradients were usually well reconstructed, especially in representative medium-complexity cases. However, the predictions were still smoother than the reference MODIS maps, which means that the model captured coarse spatial patterns more successfully than very fine local anomalies. Such behavior is expected, since the target maps contain sharp local irregularities, while the model tries to learn stable spatiotemporal structure from limited and partially noisy observations.

Figure [7](https://arxiv.org/html/2605.16435#S6.F7) presents three representative qualitative examples. The first row contains the reference thermal maps, while the second row shows the corresponding predictions produced by the ConvLSTM model. In the first two examples, the model follows the main spatial structure of the reference maps reasonably well and produces relatively small prediction errors: MAE = 0.1548 and RMSE = 0.2387 for the first predicted map, and MAE = 0.1319 and RMSE = 0.1868 for the second. In the third example, however, the model fails to reproduce a more difficult and strongly localized thermal pattern, which results in substantially larger errors (MAE = 1.1496, RMSE = 1.1693). Overall, the qualitative analysis confirms that the model generally captures the dominant thermal structure of the target maps, although the predicted maps remain smoother than the corresponding reference MODIS maps.

To better understand the global behavior of the model over the full evaluation set, the per-sample MAE values were analyzed in two complementary ways. Figure [8](https://arxiv.org/html/2605.16435#S6.F8)(a) shows the distribution of per-sample MAE values. Most samples are concentrated in the low-to-moderate error range, while only a relatively small number of cases produce large errors. This indicates that the overall performance of the model is generally stable, although a limited number of difficult outliers remain. Figure [8](https://arxiv.org/html/2605.16435#S6.F8)(b) presents the MAE value for each evaluated sample. For most samples the error remains relatively low, while occasional peaks correspond to more challenging thermal patterns that are harder for the model to reproduce accurately.
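A per-sample analysis of this kind can be sketched in NumPy as follows. The outlier rule (MAE above three times the median) and both function names are illustrative assumptions, and the error values are synthetic rather than taken from the evaluation set.

```python
import numpy as np

def per_sample_mae(pred, ref):
    """MAE of each map: inputs of shape (N, H, W) give an output of shape (N,)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return np.mean(np.abs(pred - ref), axis=(1, 2))

def flag_outliers(mae_values, factor=3.0):
    """Indices whose MAE exceeds factor * median (an illustrative rule)."""
    mae_values = np.asarray(mae_values)
    return np.flatnonzero(mae_values > factor * np.median(mae_values))

# Five synthetic samples: four with small uniform errors and one difficult case.
ref = np.zeros((5, 32, 32))
pred = ref + np.array([0.15, 0.13, 0.20, 0.18, 1.15])[:, None, None]

mae = per_sample_mae(pred, ref)
print(flag_outliers(mae))  # [4]
```

The vector `mae` is the quantity that would be plotted per sample, and a histogram of the same vector gives the error distribution.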

### 6.6 Discussion

The experiments clearly indicate that dataset quality plays a major role in the final predictive performance. The initial experiments, which relied on a smaller and less informative dataset, produced smoother predictions and less stable quantitative results. After increasing the number of samples, extending the temporal coverage, and enriching the meteorological forcing with data from multiple Sarajevo locations, the overall performance improved substantially. The best results were achieved by the final ConvLSTM model trained with the hybrid loss function. Its $R^{2}$ score of $0.8877$ indicates a strong agreement between the predicted and reference thermal maps. At the same time, the qualitative analysis shows that some limitations still remain. In particular, the model tends to smooth out very fine local thermal details, and a limited number of challenging cases still produce relatively large errors. Nevertheless, the obtained results are strong enough to confirm the usefulness of the proposed practical framework. They show that a GPU-accelerated ConvLSTM model, combined with satellite-derived thermal observations and enriched daily meteorological forcing, can provide meaningful next-day urban thermal predictions for a real urban environment.

## 7 Conclusion

This chapter presented a practical GPU-accelerated deep learning framework for next-day urban thermal prediction and simplified urban heat risk assessment. The proposed approach combined MODIS land surface temperature maps with daily meteorological forcing variables to model short-term thermal dynamics over Sarajevo. Special attention was given to multi-source dataset construction, spatiotemporal deep learning models, and the use of GPU acceleration to enable efficient repeated training and evaluation. The results showed that dataset design strongly affects predictive performance. While early experiments based on smaller datasets produced smoother and less stable predictions, substantial improvements were achieved after extending the temporal range, increasing the number of valid samples, and introducing multi-location meteorological forcing. The best performance was obtained with the ConvLSTM model trained using a hybrid loss function, reaching an MAE of 0.2293, an RMSE of 0.3089, and an $R^{2}$ score of 0.8877. These results indicate strong agreement between predicted and reference thermal maps, although some fine local details remain difficult to reconstruct. Overall, the findings confirm that GPU-accelerated deep learning provides a practical basis for urban thermal prediction in real environments. The proposed framework is flexible enough to support further methodological improvements and application-oriented extensions related to urban climate adaptation, early warning, and heat risk assessment.

## References
