Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery
Summary
This paper applies topological data analysis to flood detection by extracting topological features from satellite imagery and incorporating them into neural networks, demonstrating improved robustness and interpretability over conventional methods.
# Topology-Informed Neural Networks for Flood Detection in Optical and Synthetic Aperture Radar Imagery
Source: [https://arxiv.org/html/2606.26204](https://arxiv.org/html/2606.26204)
Max Zhao (Stanford University, Stanford, CA, USA), Raghu G. Raj (US Naval Research Laboratory, Washington, DC, USA), Tianyu Chen (US Naval Research Laboratory, Washington, DC, USA)
Further author information: send correspondence to Max Zhao, e-mail: max.zhao@stanford.edu
###### Abstract
Floods frequently impact regions around the world. Rapid and accurate flood detection is crucial for emergency response and timely mitigation of human and economic loss. The expanding availability of satellite data and advances in artificial intelligence have enhanced monitoring of environmental hazards, but many flood events remain challenging to detect because cloud cover obscures optical satellite imagery. Rambour et al. introduced the SEN12-FLOOD dataset and extracted per-image features using a ResNet-50 convolutional neural network backbone, then fed these features into a gated recurrent unit network to show that temporal information can substantially improve accuracy compared to single-image baselines. More recently, Chamatidis et al. showed that a vision transformer can achieve strong performance alongside popular convolutional architectures. However, these models typically function as opaque black boxes, making it difficult to interpret their decision boundaries, learned features, and internal reasoning, especially in safety-critical domains like remote sensing. In contrast, topological data analysis (TDA) provides a mathematically grounded framework for capturing global structural features of data. TDA has emerged as a powerful tool for analyzing complex imagery, especially imagery with geometrically interpretable structures, of which floods are a prime candidate. In this work, we systematically evaluate topological descriptors for flood detection using the open-source SEN12-FLOOD dataset. By extracting topological features from each image and incorporating them into neural networks, we demonstrate that topological descriptors carry meaningful flood signals independently and complement existing networks to yield more robust and interpretable flood detection systems.
###### keywords:
Flood detection, topological data analysis, persistent homology, synthetic aperture radar, optical satellite imagery, SEN12\-FLOOD, gated recurrent units, remote sensing
## 1 Introduction
Recently, the expanding availability of satellite data and rapid advancements in artificial intelligence have significantly enhanced our capability for monitoring environmental hazards\[[27](https://arxiv.org/html/2606.26204#bib.bib1)\]\. Nevertheless, the detection of many flood events remains challenging due to persistent cloud cover obscuring optical satellite imagery\[[2](https://arxiv.org/html/2606.26204#bib.bib2)\]\. Synthetic Aperture Radar \(SAR\) systems, such as the Sentinel\-1 satellite, can penetrate clouds, but inherently contain speckle\[[21](https://arxiv.org/html/2606.26204#bib.bib3)\]\. As such, models that leverage both optical and SAR data can overcome individual sensor limitations\.
A major issue in flood detection with SAR data is that many studies are tested on author\-made datasets, which lack baselines and limit reproducibility\[[2](https://arxiv.org/html/2606.26204#bib.bib2)\]\. To address this issue, Rambour et al\.\[[24](https://arxiv.org/html/2606.26204#bib.bib4)\]developed the publicly available SEN12\-FLOOD dataset, offering coregistered time series data from both Sentinel\-1 \(S1\) SAR and Sentinel\-2 \(S2\) multispectral imagery covering locations near 2018 and 2019 floods in Africa, Iran, and Australia\. Both rural and urban areas are captured, as rural communities often lack resilient infrastructure and relief resources, while urban centers have concentrated populations and high\-value assets\. Each image in the sequence has a binary label specifying if a flood event has happened in the location of the sequence\. This dataset leverages the two mentioned modalities of remote sensing data, and serves to assess the important role of raising flood alarms prompting human review and usage of more expensive flood segmentation algorithms, a crucial first step in the disaster relief process\. As a baseline, Rambour et al\. extracted per\-image features using a ResNet\-50 Convolutional Neural Network \(CNN\)\[[15](https://arxiv.org/html/2606.26204#bib.bib5)\]backbone, then fed features into a Gated Recurrent Unit \(GRU\)\[[8](https://arxiv.org/html/2606.26204#bib.bib6)\]for flood classification\. They demonstrated that temporal information substantially improves accuracy compared to single\-image binary classifier baselines, with improvements up to 10%\.
Various experiments comparing different computer vision architectures have been conducted on this dataset. Mittal et al.\[[20](https://arxiv.org/html/2606.26204#bib.bib7)\] surveyed the performance of various CNN architectures for binary classification, and Chamatidis et al.\[[7](https://arxiv.org/html/2606.26204#bib.bib8)\] showed that a Vision Transformer (ViT) can also achieve good performance. Nevertheless, the original findings of Rambour et al. highlight that time-series change detection is particularly well suited for flood detection tasks. For example, a river that appears narrow in one image but widens significantly in subsequent images may indicate flooding, information that cannot be reliably captured by static binary classifiers, especially given the existence of non-water frames in the dataset. Consequently, methods relying solely on binary classification of single images are unsuitable for practical deployment.
Recently, Topological Data Analysis (TDA) has shown great promise as a tool in a variety of fields. Because TDA is built on algebraic topology, it can capture properties of complex data that traditional machine learning methods cannot\[[11](https://arxiv.org/html/2606.26204#bib.bib9)\]. Topological features capture global structural characteristics of a dataset that are unchanged under deformation and are provably stable under perturbation. As such, TDA shows promise for noisy and incomplete datasets. Another advantage of such a mathematically grounded framework is that it contains human-understandable features, as opposed to most deep neural networks, whose features usually function as opaque black boxes. Such understanding is beneficial for safety-critical fields (e.g. disasters) where lives are at risk. Previous studies have demonstrated the success of combining convolutional and topological features for image classification across various domains, including medical images\[[23](https://arxiv.org/html/2606.26204#bib.bib10)\], handwriting\[[19](https://arxiv.org/html/2606.26204#bib.bib11)\], and land data captured by optical remote sensing satellites\[[26](https://arxiv.org/html/2606.26204#bib.bib12)\]. Such success implies that the convolutional features (local textures) and topological features (global structures) of complex datasets complement each other.
In this work, we first improve on temporal\-classification GRU baselines for SEN12\-FLOOD by introducing transfer learning\. Then, we propose a lightweight variant of a Gaussian topological embedding that achieves stable training convergence and improved performance when topological features are fed into a GRU, compared to the results in \[4\]\. Finally, we demonstrate improvement when convolutional and topological features are both used, achieving a 98\.9% accuracy compared to a baseline of 95\.7%\.
## 2 Background
### 2.1 Dataset
The SEN12-FLOOD dataset comprises a total of 335 sequences, each corresponding to a unique geographic location in Africa, Iran, or Australia. Each sequence contains multiple frames from S1 SAR and S2 multispectral optical sensors. On average, there are days or weeks between the capture of each frame, and the full sequences can span months. The S1 frames were provided as two polarization channels, Vertical transmit-Vertical receive (VV) and Vertical transmit-Horizontal receive (VH), in linear backscatter intensity, while the S2 frames included twelve spectral bands (B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12 with pixel sizes 60m, 10m, 10m, 10m, 20m, 20m, 20m, 10m, 20m, 60m, 20m, 20m respectively). The S1 SAR dataset includes image frames acquired by both Sentinel-1A (S1A) and Sentinel-1B (S1B) satellites. As the two satellites follow distinct orbital paths, they observe the same regions from slightly different incidence angles, which may result in geometric variations within the imagery\[[21](https://arxiv.org/html/2606.26204#bib.bib3)\]. The S2 data was inherited from the MediaEval 2019 dataset\[[3](https://arxiv.org/html/2606.26204#bib.bib13)\]. Each acquisition is annotated with a binary flood status label where $0$ denotes non-flooded and $1$ denotes flooded conditions. The dataset follows the hypothesis that flooded features persist in all post-flood observations of an area.
### 2.2 Topological Data Analysis
We first describe the necessary topological constructions for a two-dimensional grayscale image and then introduce persistent homology, the main tool of TDA, which tracks the birth and death of features such as connected components and holes. These operations were implemented using the CubicalRipser Python library\[[17](https://arxiv.org/html/2606.26204#bib.bib14)\].
#### Grid, hypercubes, and cubical complexes\.
Intuition. While TDA, especially on point clouds, conventionally uses simplices (triangles, tetrahedra, and higher-dimensional analogs), it is more convenient for us to consider squares and cubes as our building blocks. Consider a $D$-dimensional integer grid. A cube is built by choosing, in each coordinate, either a unit interval (extended) or a single grid point (fixed). Faces result from turning an extended coordinate into a fixed one.
Definition. Fix $D\in\mathbb{N}$ and grid sizes $N_1,\dots,N_D\in\mathbb{N}$. A (unit) hypercube in $\mathbb{R}^D$ is specified by a pair $(v,\sigma)$ with $v\in\mathbb{Z}^D$ and $\sigma\in\{e,f\}^D$, represented by the product
$$Q(v,\sigma)\;=\;\prod_{i=1}^{D}\begin{cases}[v_i,\,v_i+1],&\sigma_i=e,\\ \{v_i\},&\sigma_i=f.\end{cases}\tag{1}$$
Its dimension is $\dim Q=\#\{i:\sigma_i=e\}$. For any index $k$ with $\sigma_k=e$, $Q$ has two codimension-1 faces in the $k$th direction:
$$\text{lower face: }Q\bigl(v,\,\sigma\text{ with }\sigma_k\leftarrow f\bigr),\qquad\text{upper face: }Q\bigl(v+e_k,\,\sigma\text{ with }\sigma_k\leftarrow f\bigr)\tag{2}$$
A cubical complex, $\mathcal{C}$, is a finite set of such hypercubes closed under faces, meaning that whenever a cube, $Q$, belongs to $\mathcal{C}$, all of its faces (as in ([2](https://arxiv.org/html/2606.26204#S2.E2))) must also belong to $\mathcal{C}$.
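To make the pair notation concrete, the following is a minimal pure-Python sketch of a hypercube as a pair of tuples, with its dimension and its codimension-1 faces. The names `Cube`, `cube_dim`, and `codim1_faces` are illustrative assumptions and do not come from the paper or from CubicalRipser.

```python
from typing import List, Tuple

# A cube is (v, sigma): integer corner v and a flag per coordinate,
# "e" for an extended coordinate and "f" for a fixed one.
Cube = Tuple[Tuple[int, ...], Tuple[str, ...]]


def cube_dim(cube: Cube) -> int:
    """Dimension of the cube: the number of extended coordinates."""
    _, sigma = cube
    return sum(1 for s in sigma if s == "e")


def codim1_faces(cube: Cube) -> List[Cube]:
    """Lower and upper faces in every extended direction."""
    v, sigma = cube
    faces: List[Cube] = []
    for k, s in enumerate(sigma):
        if s != "e":
            continue
        fixed = sigma[:k] + ("f",) + sigma[k + 1:]
        shifted = v[:k] + (v[k] + 1,) + v[k + 1:]
        faces.append((v, fixed))        # lower face: same corner
        faces.append((shifted, fixed))  # upper face: corner moved by e_k
    return faces


# The unit square with corner (0, 0) has four edges and four vertices.
square: Cube = ((0, 0), ("e", "e"))
edges = codim1_faces(square)
vertices = {face for edge in edges for face in codim1_faces(edge)}
```

Closing a set of cubes under `codim1_faces` repeatedly is one way to obtain a cubical complex in the sense defined above.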
#### 2D grayscale maps as cubical complexes \(V\-construction\)\.
Intuition. Each pixel is treated as a vertex, with edges and squares included as needed. Collectively, these vertices, edges, and squares form the cubical complex.
Definition. For $D=2$, let the vertex index set be $\{0,\dots,N_1-1\}\times\{0,\dots,N_2-1\}$. A grayscale image is a function on vertices
$$v:\ \{\text{0-cubes}\}\to\mathbb{R}_{\geq 0},\qquad v(i,j)=\text{intensity at vertex }(i,j)\tag{3}$$
The cubical complex consists of all unit edges and unit squares whose vertices lie in this index set; that is, each unit $k$-cube is included precisely when all of its vertices are contained in the index set.
Remark. The alternative T-construction assigns intensities to top cells and extends them to faces via the $\min$ operation. The V- and T-constructions are related by duality. In our case, the particular choice of construction does not substantially affect the results.
#### Lower\-star extension and filtration \(vertex\-based\)\.
Intuition. Turn on all cells whose vertices are at most threshold $t$; as $t$ increases, the complex grows.
Definition. Extend $v$ from vertices to all cubes by
$$f(c)\;=\;\max\bigl\{\,v(p)\;:\;p\text{ is a vertex of }c\,\bigr\}\tag{4}$$
Then $f$ is monotone under inclusion: if $c\subseteq c'$ then $f(c)\leq f(c')$. For $t\in\mathbb{R}$, define the sublevel-set filtration
$$C_t\;=\;\bigl\{\,c\in\mathcal{C}\ :\ f(c)\leq t\,\bigr\}\tag{5}$$
This yields a nested sequence, $t\leq s\Rightarrow C_t\subseteq C_s$, that grows from a low-intensity structure to the full complex.
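The lower-star extension and the sublevel sets can be sketched in a few lines of NumPy for the two-dimensional V-construction. This is only an illustration: `lower_star` and `sublevel_counts` are hypothetical helper names, and the paper itself relies on CubicalRipser for these operations.

```python
import numpy as np


def lower_star(v: np.ndarray):
    """Extend vertex values to edges and unit squares by taking the max."""
    h_edges = np.maximum(v[:, :-1], v[:, 1:])   # edge (i, j) to (i, j+1)
    v_edges = np.maximum(v[:-1, :], v[1:, :])   # edge (i, j) to (i+1, j)
    # A square's value is the max over its four corner vertices.
    squares = np.maximum(h_edges[:-1, :], h_edges[1:, :])
    return h_edges, v_edges, squares


def sublevel_counts(v: np.ndarray, t: float):
    """Number of vertices, edges, and squares present in C_t."""
    h_edges, v_edges, squares = lower_star(v)
    n0 = int((v <= t).sum())
    n1 = int((h_edges <= t).sum() + (v_edges <= t).sum())
    n2 = int((squares <= t).sum())
    return n0, n1, n2


img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
counts_low = sublevel_counts(img, 1.0)   # two vertices joined by one edge
counts_full = sublevel_counts(img, 3.0)  # the whole 2x2 complex
```

Because a cell enters only once all of its vertices have entered, the counts never decrease as $t$ grows, which is the nesting property stated above.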
#### Persistent homology\.
Let $\Bbbk$ be a field (e.g. $\mathbb{Z}_2$). Given the sublevel-set filtration $\{C_t\}_{t\in\mathbb{R}}$ from the previous section, the homology groups $H_k(C_t;\Bbbk)$ summarize $k$-dimensional topological features present at threshold $t$: connected components ($k=0$), loops ($k=1$), and higher-dimensional voids for $k\geq 2$. In our two-dimensional case, we only track $k=0$ and $k=1$. As $t$ increases, a feature is born when it first appears in $H_k(C_t)$ and dies when it merges into an older feature (for $k=0$) or becomes filled (for $k=1$). Persistent homology records these events as birth–death pairs $(b_i,d_i)$. Remark. For $k=0$, it matters which component is considered older, because by convention, the younger component (with the larger birth time) is always the one that dies. This convention ensures that each connected component has a unique birth and death time, making the pairing well-defined.
##### Persistence diagrams \(PDs\)\.
For each $k\geq 0$, the $k$-dimensional persistence diagram is the multiset
$$\mathrm{Dgm}_k\;=\;\bigl\{(b_i,d_i)\in\mathbb{R}^2\cup\{\infty\}\bigr\}\tag{6}$$
with one point per $k$-feature. All points must lie above the diagonal $b=d$, as features cannot die before they are born. Points far from the diagonal correspond to persistent features with long lifetimes, $d_i-b_i$, reflecting stable, meaningful topological structures. Points near the diagonal have short lifetimes and are typically sensitive to small perturbations, representing noise rather than significant structures. Points with death threshold infinity, or features that persisted through the filtration, are nonnegligible. Persistence diagrams are stable: small perturbations in the input function result in only small changes in the diagrams under the bottleneck distance\[[10](https://arxiv.org/html/2606.26204#bib.bib15)\].
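For $k=0$, the birth–death pairing can be illustrated with a small union-find pass over pixels sorted by intensity. The sketch below is an illustrative stand-in, not the CubicalRipser implementation used in the paper. It assumes 4-neighbour connectivity (the edges of the V-construction), drops zero-lifetime pairs, and uses the hypothetical function name `h0_persistence`.

```python
import math
import numpy as np


def h0_persistence(img: np.ndarray):
    """0-dimensional sublevel-set persistence pairs of a 2D image."""
    rows, cols = img.shape
    order = np.argsort(img, axis=None, kind="stable")
    parent = {}  # union-find forest over flat pixel indices
    birth = {}   # value at which each pixel entered the filtration

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for flat in order:
        p = int(flat)
        i, j = divmod(p, cols)
        t = float(img[i, j])
        parent[p], birth[p] = p, t
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if not (0 <= ni < rows and 0 <= nj < cols):
                continue
            q = ni * cols + nj
            if q not in parent:      # neighbour not yet in the sublevel set
                continue
            young, old = find(p), find(q)
            if young == old:
                continue
            if birth[young] < birth[old]:
                young, old = old, young
            # Elder rule: the component born later dies at threshold t.
            if t > birth[young]:
                pairs.append((birth[young], t))
            parent[young] = old      # the older root stays the root
    roots = {find(p) for p in parent}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)


signal = np.array([[0.0, 3.0, 1.0, 4.0, 2.0]])
diagram = h0_persistence(signal)
```

In this one-row example the three local minima give three components; the two younger ones die when they merge with the component born at 0, which never dies.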
## 3 Methods
### 3.1 Preprocessing
For Sentinel-1, VV and VH polarization channels were initially supplied in linear units after radiometric calibration and range-Doppler terrain correction. Valid (positive, finite) pixels were converted into decibel (dB) scale using a logarithmic transformation,
$$\sigma^0_{\mathrm{dB}}=10\cdot\log_{10}\left(\max(\sigma^0,\epsilon)\right)\tag{7}$$
with $\epsilon=10^{-6}$ to avoid numerical underflow. Then, a log-scaled grayscale map was generated per acquisition by aggregating both channels as
$$I_{\mathrm{gray}}=10\cdot\log_{10}(VV+VH)\tag{8}$$
Acquisitions with missing bands, insufficient valid pixels ($<75\%$), or negligible intensity standard deviation ($<0.001$) were discarded.
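The Sentinel-1 steps can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' pipeline: the function names are hypothetical, and the choice to fill invalid pixels with $\epsilon$ and to measure the standard deviation over valid pixels only is an assumption not stated in the text.

```python
import numpy as np

EPS = 1e-6  # floor from Eq. (7)


def to_db(sigma0):
    """Eq. (7): linear backscatter to decibels with a small floor."""
    return 10.0 * np.log10(np.maximum(sigma0, EPS))


def s1_gray(vv, vh, min_valid=0.75, min_std=1e-3):
    """Eq. (8) grayscale map, or None if the acquisition is discarded."""
    valid = np.isfinite(vv) & np.isfinite(vh) & (vv > 0) & (vh > 0)
    if valid.mean() < min_valid:       # fewer than 75% valid pixels
        return None
    # Assumption: invalid pixels are filled with EPS before the log.
    total = np.where(valid, vv + vh, EPS)
    gray = 10.0 * np.log10(np.maximum(total, EPS))
    if gray[valid].std() < min_std:    # nearly constant image
        return None
    return gray


vv = np.full((4, 4), 0.1)
vh = np.full((4, 4), 0.01)
vv[0, 0] = 0.2                         # add some contrast
gray = s1_gray(vv, vh)
```

The missing-band check mentioned above is not shown, since it depends on how the files are read.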
For Sentinel-2, images had level 2A atmospheric correction. We first radiometrically scaled stacks by dividing all values by $10{,}000$, yielding normalized surface reflectance values in the range $[0,1]$. To ensure geometric consistency, each band was resampled by bilinear interpolation to match the spatial resolution of the reference band (B03 or Green, 10m size). Then, a grayscale map on the Normalized Difference Water Index (NDWI)\[[12](https://arxiv.org/html/2606.26204#bib.bib16)\] was generated using the Green (B03) and Near-Infrared (B08) bands
$$NDWI=\frac{B03-B08}{B03+B08+\epsilon}\tag{9}$$
clipped to $[-1,1]$ and negated for watered regions to have lower intensity values (as consistent with the S1 grayscale maps). Acquisitions with missing bands, insufficient valid pixels ($<75\%$), or negligible intensity standard deviation ($<0.001$) were discarded.
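A minimal NumPy sketch of the Sentinel-2 grayscale map is given below. It assumes the two bands are already resampled to a common grid, and it reuses $\epsilon=10^{-6}$ from the Sentinel-1 step, since the value used in the NDWI formula is not stated. The function name `negated_ndwi` is hypothetical.

```python
import numpy as np


def negated_ndwi(b03_dn, b08_dn, eps=1e-6):
    """Negated, clipped NDWI from raw Green (B03) and NIR (B08) values."""
    green = np.asarray(b03_dn, dtype=np.float64) / 10_000.0  # reflectance
    nir = np.asarray(b08_dn, dtype=np.float64) / 10_000.0
    ndwi = (green - nir) / (green + nir + eps)
    # Negate so that water (high NDWI) gets low intensity, as in the S1 maps.
    return -np.clip(ndwi, -1.0, 1.0)


# Water reflects more green than NIR; vegetation does the opposite.
water = negated_ndwi(800, 200)
land = negated_ndwi(500, 3000)
```

With this sign convention, water pixels enter a sublevel-set filtration before land pixels.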
Stacked bands were saved and resized to $120\times 120$ for CNN transfer learning, and grayscale maps were used for topological operations. Following the pretrained models’ convention, we dropped the 60m resolution bands (B01 and B09) for S2 acquisitions, saving a 10-channel stack. After preprocessing, there were 3418 S1 images, 1325 of which were flooded, and 1983 S2 images, 446 of which were flooded. 192 out of the 335 sequences contained a flood event. We used the given split of 267 train and 68 test sequences. Note that, due to preprocessing, some sequences had zero valid S1 acquisitions, so S1-only models were trained and tested on 250 train and 62 test sequences. On average, there were 10.20 S1 acquisitions per sequence and 5.92 S2 acquisitions per sequence. Preprocessing was done using the RasterIO\[[13](https://arxiv.org/html/2606.26204#bib.bib17)\] Python library.
### 3.2 Gaussian Embedding of Persistence Diagrams
As part of the intended raw nature of the dataset, optical images sometimes contain cloud cover and vary in atmospheric conditions across days, while SAR images contain speckle and geometric distortions due to varying incidence angles. Thus, when using raw persistence diagrams, equivalent representations (Betti curves, persistence barcodes), or scalar summaries (amplitude, entropy, Wasserstein distances), training on our topological features did not converge, and the features did not carry consistent meaning across different days. We introduce a lightweight Gaussian embedding of persistence diagrams that allows convergence and fast training.
#### Vectorizations of persistence diagrams
Persistence diagrams are finite multisets in the birth–death plane and do not naturally inhabit a Hilbert space\. Many learning algorithms, however, require fixed\-length, differentiable, and stable features\. Several vectorization schemes have been proposed to bridge this gap\. Persistence landscapes map PDs to functional representations that can be discretized on a grid\[[4](https://arxiv.org/html/2606.26204#bib.bib18)\]\. Persistence images smooth PDs with Gaussian kernels and integrate the resulting surface over pixels to obtain a finite\-dimensional vector\[[1](https://arxiv.org/html/2606.26204#bib.bib19)\]\. Kernel methods operate directly on diagrams, including the persistence scale\-space kernel\[[25](https://arxiv.org/html/2606.26204#bib.bib20)\], the persistence\-weighted Gaussian kernel\[[18](https://arxiv.org/html/2606.26204#bib.bib21)\], and the sliced Wasserstein kernel\[[6](https://arxiv.org/html/2606.26204#bib.bib22)\]\. All of these approaches build on the fundamental stability of persistence diagrams under perturbations\[[10](https://arxiv.org/html/2606.26204#bib.bib15)\], and Gaussian or heat\-kernel\-based embeddings, in particular, inherit Lipschitz\-type stability with respect to bottleneck and Wasserstein distances\.
##### The embedding
Given a PD $D=\{(b_i,d_i)\}$ with lifetimes $\ell_i=\max(d_i-b_i,0)$, for each homology dimension, $k$, and modality, $m$, we define a grid of $G\times G$ centers $\mathcal{C}^{(m)}_k=\{c_j\}_{j=1}^{G^2}$ by empirical birth and death quantiles computed on the training split. With a bandwidth $\sigma^{(m)}_k>0$ estimated from pooled birth and death scales, we define
$$\phi^{(m,k)}_j(D)=\sum_{(b_i,d_i)\in D_k}\ell_i\,\exp\Big(-\tfrac{1}{2(\sigma^{(m)}_k)^2}\,\|(b_i,d_i)-c_j\|_2^2\Big),\quad\Phi^{(m,k)}(D)=(\phi^{(m,k)}_j)_{j=1}^{G^2}.$$
We concatenate $\Phi$ for $k=0$ and $k=1$ to form a vector of size $2G^2$. We trim and only keep the top points by lifetime per $k$ to reduce cost and only retain the most important features.
This construction is equivalent to persistence images\[[1](https://arxiv.org/html/2606.26204#bib.bib19)\]but uses quantile\-adaptive grids to help mitigate scale shifts\. As with Gaussian heat\-based summaries\[[25](https://arxiv.org/html/2606.26204#bib.bib20)\], the map is differentiable and inherits Lipschitz\-type stability from PD stability\[[10](https://arxiv.org/html/2606.26204#bib.bib15)\]\. We avoid kernels such as\[[25](https://arxiv.org/html/2606.26204#bib.bib20),[18](https://arxiv.org/html/2606.26204#bib.bib21),[6](https://arxiv.org/html/2606.26204#bib.bib22)\]because they are contingent on a pairwise similarity assumption across the whole dataset: as mentioned before, frames showing water aren’t necessarily flooded, so an explicit vectorization is better for the change detection task\.
##### Practical procedure
For each homology dimension $k\in\{0,1\}$ and modality $m$, we collect births and deaths from the training split, $\mathcal{T}$, and form a $G\times G$ grid of centers $\mathcal{C}^{(m)}_k=\{c^{(m,k)}_j\}_{j=1}^{G^2}$ using empirical quantiles along birth and death. A bandwidth, $\sigma^{(m)}_k$, is set from pooled birth and death scales on $\mathcal{T}$. Given a diagram, $D$, with lifetimes $\ell_i=\max(d_i-b_i,0)$, we compute for each grid center
$$\phi^{(m,k)}_j(D)=\sum_{(b_i,d_i)\in D_k}\ell_i\,\exp\Big(-\tfrac{\|(b_i,d_i)-c^{(m,k)}_j\|_2^2}{2(\sigma^{(m)}_k)^2}\Big),$$
assemble $\Phi^{(m,k)}(D)=(\phi^{(m,k)}_j)_{j=1}^{G^2}$, and concatenate $k=0,1$ to obtain a $2G^2$-dimensional vector. In practice, we keep the top points by lifetime per $k$ to reduce cost. Centers and bandwidths are fitted once on $\mathcal{T}$ and reused for validation and test.
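The procedure can be sketched in NumPy for one homology dimension and one modality. Several details are assumptions made for illustration only, because the text does not fix them: the quantile levels (evenly spaced between 0.05 and 0.95), the bandwidth rule (standard deviation of the pooled births and deaths), and the removal of points with infinite death. The names `fit_grid` and `embed` are hypothetical.

```python
import numpy as np


def fit_grid(train_diagrams, G=10):
    """Fit G*G centers and one bandwidth on training diagrams."""
    pts = np.vstack(train_diagrams)             # rows are (birth, death)
    pts = pts[np.isfinite(pts).all(axis=1)]     # assumption: drop infinite deaths
    levels = np.linspace(0.05, 0.95, G)         # assumed quantile levels
    bq = np.quantile(pts[:, 0], levels)
    dq = np.quantile(pts[:, 1], levels)
    centers = np.array([(b, d) for b in bq for d in dq])  # shape (G*G, 2)
    sigma = float(pts.std()) or 1.0             # assumed bandwidth rule
    return centers, sigma


def embed(diagram, centers, sigma, top_k=200):
    """Lifetime-weighted Gaussian responses at every grid center."""
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    d = d[np.isfinite(d).all(axis=1)]
    life = np.maximum(d[:, 1] - d[:, 0], 0.0)
    keep = np.argsort(-life)[:top_k]            # top points by lifetime
    d, life = d[keep], life[keep]
    sq = ((d[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return (life[:, None] * np.exp(-sq / (2.0 * sigma ** 2))).sum(axis=0)


rng = np.random.default_rng(0)
births = rng.uniform(0.0, 1.0, size=300)
train = [np.column_stack([births, births + rng.exponential(0.2, size=300)])]
centers, sigma = fit_grid(train, G=10)
phi = embed(train[0], centers, sigma)           # one block of size G*G
```

Running `fit_grid` and `embed` once for $k=0$ and once for $k=1$, then joining the two outputs with `np.concatenate`, gives the $2G^2$-dimensional vector described above.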
### 3.3 Feature Extraction and Training
Figure 1: Diagram of ResNet50-GRU model, Topological (Topo)-GRU model and Joint (Fusion-GRU) model.
We propose and evaluate three sequence models that map variable-length sequences to per-frame flood probabilities: (i) Topological (Topo)-GRU, (ii) ResNet50-GRU, and (iii) Joint (Fusion-GRU) Model, as in Figure [1](https://arxiv.org/html/2606.26204#S3.F1). All pipelines are implemented in Python with PyTorch\[[22](https://arxiv.org/html/2606.26204#bib.bib23)\], and are trained three times: on S1 data only, S2 data only, and the dual dataset that merges S1 and S2 as input.
#### Topological Features Only \(Topo\-GRU\)
##### Feature extraction (sublevel-set cubical filtration → PDs → Gaussian embedding, single modality).
Given grayscale frames from modality S1 or S2, we construct PDs as in Section 2.2 and retain the top $K=200$ points per homology dimension by lifetime. Each diagram is mapped to a fixed-length vector via the Gaussian grid embedding defined as:
$$\Phi(D)\;=\;\operatorname{concat}\big(\Phi^{(0)}(D),\,\Phi^{(1)}(D)\big),\tag{10}$$
with $\phi^{(k)}_j$ computed from the centers and bandwidths fitted once on $\mathcal{T}$ and reused unchanged for validation and test. We use $G=10$ and $\lambda=0.1$, yielding output dimension $2G^2=200$. PDs are computed with CubicalRipser\[[17](https://arxiv.org/html/2606.26204#bib.bib14)\], and parameters are maintained separately for S1 and S2.
##### Model\.
A GRU with hidden size $H=256$ consumes $\Phi(D)$ at each time step. Unlike previous work that used a binary classification head, our last layer (size one) produces a per-step probability. Specifically, the scalar logit, $z_t$, is an affine transformation of the hidden state, and a sigmoid activation, $\sigma(z_t)$, returns the frame-level flood probability.
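A per-step GRU head of this kind might look as follows in PyTorch. This is a sketch under assumptions: a single unidirectional GRU layer is assumed because the text does not state depth or direction, and the class name `SeqFloodGRU` is hypothetical.

```python
import torch
from torch import nn


class SeqFloodGRU(nn.Module):
    """GRU over a sequence of frame descriptors with one logit per frame."""

    def __init__(self, in_dim: int = 200, hidden: int = 256):
        super().__init__()
        # Assumption: one unidirectional layer.
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # affine map to the scalar logit z_t

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, time, in_dim)
        hidden_states, _ = self.gru(x)
        return self.head(hidden_states).squeeze(-1)  # logits, (batch, time)


model = SeqFloodGRU()
frames = torch.randn(2, 5, 200)        # 2 sequences, 5 frames, 200 features
logits = model(frames)
probs = torch.sigmoid(logits)          # frame-level flood probabilities
```

Returning logits and applying the sigmoid outside the module keeps the model compatible with losses that operate on logits.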
#### Convolutional Features Only \(ResNet50\-GRU\)
##### Feature extraction\.
As per suggestion from the dataset authors\[[24](https://arxiv.org/html/2606.26204#bib.bib4)\], we initialize from ResNet-50 encoders pretrained on BigEarthNet\[[14](https://arxiv.org/html/2606.26204#bib.bib24),[9](https://arxiv.org/html/2606.26204#bib.bib25)\] S1 and S2 land classification datasets and discard any task-specific heads from pretraining. The first convolution was already modified to accept 2 or 10 channels. Given a $2\times 120\times 120$ or $10\times 120\times 120$ frame, the encoder applies global average pooling to the final feature maps, producing a 2048-dimensional descriptor, $\psi_t$.
##### Model\.
The GRU head structure is the same as above for Topo-GRU. For the dual dataset, a 16-dimensional, learned embedding representing the modality is appended to $\psi_t$ before the GRU, resulting in vector size $2064$. The encoder is unfrozen during training.
#### Fusion of Features \(Fusion\-GRU\)
##### Feature extraction \(late concatenation\)\.
For each frame, we obtain the topological descriptor, $\Phi(D)$, and the ResNet-50 descriptor, $\psi_t$, the latter by passing the frame through the trained encoder for the frame’s modality. Concatenating the two yields a 2248-dimensional descriptor ($200+2048$). The encoder is frozen during this process, and only the GRU is trained.
##### Model\.
The GRU head structure is the same as for Topo-GRU. For the dual dataset, a 16-dimensional, learned embedding representing the modality is appended to $\Phi$ before the GRU, resulting in vector size $2264$.
#### Common training protocol
##### Training\.
We use class-weighted logistic loss
$$\mathcal{L}=-\frac{1}{N}\sum_{i=1}^{N}\Big[w_+\,y_i\log\sigma(z_i)+(1-y_i)\log\big(1-\sigma(z_i)\big)\Big],\qquad w_+=\frac{1-\pi}{\pi}\tag{11}$$
where $\pi$ is the positive rate over valid training frames, $y_i$ is the true label, and $\sigma(z_i)$ is the output probability. The classifier bias is initialized to $\log\frac{\pi}{1-\pi}$. All models are trained with the Adam optimizer, learning rate $0.001$, batch size $8$, and a maximum of $200$ epochs. To handle varying sequence length when batching, sequences are padded to the longest length in the batch; masks exclude padding from both the loss and all evaluation metrics.
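The masked, class-weighted loss and the bias initialization can be written out explicitly. The NumPy sketch below mirrors Eq. (11) term by term for readability; the function names are hypothetical, and averaging over the unpadded frames of a batch is an assumption about how $N$ is counted.

```python
import numpy as np


def masked_weighted_bce(logits, labels, mask, pos_rate):
    """Eq. (11) averaged over unpadded frames only.

    logits, labels, mask: arrays of shape (batch, time); mask is 1 for
    real frames and 0 for padding. pos_rate is the positive rate pi
    measured on valid training frames.
    """
    m = np.asarray(mask, dtype=bool)
    z = np.asarray(logits, dtype=float)[m]
    y = np.asarray(labels, dtype=float)[m]
    w_pos = (1.0 - pos_rate) / pos_rate
    log_p = -np.logaddexp(0.0, -z)       # log sigma(z), numerically stable
    log_1mp = -np.logaddexp(0.0, z)      # log (1 - sigma(z))
    return float(-np.mean(w_pos * y * log_p + (1.0 - y) * log_1mp))


def initial_bias(pos_rate):
    """Bias that makes the initial predicted probability equal to pi."""
    return float(np.log(pos_rate / (1.0 - pos_rate)))


logits = np.array([[0.0, 0.0, 0.0]])
labels = np.array([[1, 0, 0]])
mask = np.array([[1, 1, 0]])             # last frame is padding
loss = masked_weighted_bce(logits, labels, mask, pos_rate=0.25)
```

In PyTorch, the `pos_weight` argument of `torch.nn.functional.binary_cross_entropy_with_logits` plays the role of $w_+$, with masking applied to the unreduced per-frame losses.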
##### Evaluation Metrics\.
During training, model performance is assessed using standard classification metrics derived from the confusion matrix using probability threshold $\tau=0.5$: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Standard evaluation metrics are listed:
- Accuracy: $Acc=\dfrac{TP+TN}{TP+TN+FP+FN}$
- Precision: $P=\dfrac{TP}{TP+FP}$
- Recall: $R=\dfrac{TP}{TP+FN}$
- F1-score: $F_1=\dfrac{2PR}{P+R}$
- Fβ-score: $F_\beta=\dfrac{(1+\beta^2)PR}{\beta^2P+R}$
##### Model selection and early stopping\.
Accuracy by itself is not an adequate metric for assessing deployability, particularly when using imbalanced datasets where non-flooded frames outnumber flooded ones. Given class imbalance, we do not select best training models on accuracy, as done by Rambour et al.\[[24](https://arxiv.org/html/2606.26204#bib.bib4)\]. Instead, we monitor a recall-tilted $F_\beta$ with $\beta=\sqrt{2}$, which upweights recall twofold relative to precision. In other words, we consider missing a flood twice as bad as raising a false alarm. Early stopping halts training after 20 epochs without validation $F_\beta$ improvement.
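The effect of $\beta=\sqrt{2}$ can be seen in a small worked example. The sketch below computes $F_\beta$ from confusion-matrix counts; the function name `f_beta` and the convention of returning 0 when a denominator vanishes are assumptions.

```python
import math


def f_beta(tp: int, fp: int, fn: int, beta: float = math.sqrt(2.0)) -> float:
    """F-beta from counts; beta**2 = 2 weights recall twice as much."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1.0 + beta ** 2) * precision * recall / denom if denom else 0.0


# Two models with the same F1 but opposite error profiles.
high_precision = f_beta(tp=8, fp=2, fn=4)   # P = 0.80, R = 0.67
high_recall = f_beta(tp=8, fp=4, fn=2)      # P = 0.67, R = 0.80
```

Both models have the same $F_1$, but the recall-tilted score prefers the one that misses fewer floods, which is the selection behaviour described above.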
##### Model deployment threshold\.
Post-training, we sweep the positive decision threshold $\tau\in[0,1]$ for probabilities and choose $\tau^\star$ for maximal recall subject to precision $\geq 0.90$. If no $\tau$ satisfies this constraint, we fall back to the threshold with maximum $F_1$.
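The threshold sweep can be sketched as a simple grid search. The grid of 101 evenly spaced thresholds, the use of `>=` for the positive decision, and breaking ties in favour of the smallest threshold are all assumptions; the function name `pick_threshold` is hypothetical.

```python
import numpy as np


def pick_threshold(probs, labels, min_precision=0.90, grid=None):
    """Max-recall threshold with precision >= min_precision, else max F1."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    best_constrained = None            # (recall, tau) meeting the constraint
    best_f1 = (-1.0, 0.5)              # (f1, tau) fallback
    for tau in grid:
        pred = probs >= tau
        tp = int((pred & (labels == 1)).sum())
        fp = int((pred & (labels == 0)).sum())
        fn = int((~pred & (labels == 1)).sum())
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2.0 * p * r / (p + r) if p + r else 0.0
        if p >= min_precision and (best_constrained is None or r > best_constrained[0]):
            best_constrained = (r, float(tau))
        if f1 > best_f1[0]:
            best_f1 = (f1, float(tau))
    return best_constrained[1] if best_constrained else best_f1[1]


# Separable scores: a threshold above the highest negative meets the constraint.
tau_star = pick_threshold([0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 1, 1, 1])
# Scores where precision never reaches 0.90: falls back to the best F1.
tau_fallback = pick_threshold([0.9, 0.8, 0.1], [0, 1, 1])
```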
## 4 Results and Discussion
### 4.1 Results
Table 1: Comparison of GRU model performance on S1, S2, and dual-modality tasks. We report $F_\beta$, $F_1$-score, accuracy, previously reported baseline accuracy by\[[24](https://arxiv.org/html/2606.26204#bib.bib4)\], and best recall achieved at $90\%$ precision.
Table 2: Comparison of GRU models on S1, S2, and dual-modality tasks with TP, FP, TN and FN.
### 4.2 Discussion
The ResNet\-50 GRUs transfer\-learned from BigEarthNet encoders achieve benchmark performance gains of 8\.3%, 5\.8%, and 1\.7% for the S1, S2, and Dual tasks, respectively\. The largest improvement from exposure to the larger dataset for pretraining occurs with S1 SAR data, as unlike optical data, SAR is not conventionally used as input for CNNs\. Compared to ResNet50\-GRU models, which achieve 95\.8% accuracy on S1 and 98\.8% on S2, our Topo\-GRUs perform slightly worse as expected, but still attain solid results of 91\.4% and 95\.8%, respectively, while using only about one\-tenth of the feature size\. Our Fusion\-GRU Dual model achieves the best results, corroborating the statement that topological and convolutional features complement each other\.
These results mark the first application of TDA directly to images for flood detection and the first use of persistent homology on SAR imagery, showing promise for further exploration. Future directions include investigating more advanced feature-combination strategies such as attention mechanisms, integrating persistence diagrams or persistence images into CNNs\[[23](https://arxiv.org/html/2606.26204#bib.bib10)\], and allowing the topological feature embeddings to remain unfrozen and learnable. For example, PersLay\[[5](https://arxiv.org/html/2606.26204#bib.bib26)\] enables optimization of centers and standard deviations, and alternative radial basis functions for embeddings could also be explored\[[16](https://arxiv.org/html/2606.26204#bib.bib27)\].
###### Acknowledgements\.
This work was funded by the Office of Naval Research under the Naval Research Laboratory \(NRL\) 6\.1 Base Program and the NRL Science and Engineering Apprenticeship Program \(SEAP\)\. DISTRIBUTION STATEMENT A: Approved for public release\. Distribution is unlimited\.
## References
- \[1\]H\. Adams, T\. Emerson, M\. Kirby, R\. Neville, C\. Peterson, P\. Shipman, S\. Chepushtanova, E\. Hanson, F\. Motta, and L\. Ziegelmeier\(2017\)Persistence images: a stable vector representation of persistent homology\.Journal of Machine Learning Research18\(8\),pp\. 1–35\.Cited by:[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.Px1.p1.12),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[2\]D\. Amitrano, G\. Di Martino, A\. Di Simone, and P\. Imperatore\(2024\)Flood detection with SAR: a review of techniques and datasets\.Remote Sensing16\(4\),pp\. 656\.External Links:[Document](https://dx.doi.org/10.3390/rs16040656)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[3\]B\. Bischke, P\. Helber, S\. Brugman, E\. Basar, Z\. Zhao, M\. Larson, and K\. Pogorelov\(2019\)The multimedia satellite task at MediaEval 2019\.InMediaEval Benchmarking Initiative for Multimedia Evaluation,Cited by:[§2\.1](https://arxiv.org/html/2606.26204#S2.SS1.p1.2)\.
- \[4\]P\. Bubenik\(2015\)Statistical topological data analysis using persistence landscapes\.Journal of Machine Learning Research16\(3\),pp\. 77–102\.Cited by:[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[5\]M\. Carrière, F\. Chazal, Y\. Ike, T\. Lacombe, M\. Royer, and Y\. Umeda\(2020\)PersLay: a neural network layer for persistence diagrams and new graph topological signatures\.InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics,Proceedings of Machine Learning Research, Vol\.108,pp\. 2786–2796\.Cited by:[§4\.2](https://arxiv.org/html/2606.26204#S4.SS2.p2.1)\.
- \[6\]M\. Carrière, M\. Cuturi, and S\. Oudot\(2017\)Sliced Wasserstein kernel for persistence diagrams\.InProceedings of the 34th International Conference on Machine Learning,Proceedings of Machine Learning Research,pp\. 664–673\.Cited by:[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.Px1.p1.12),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[7\]I\. Chamatidis, D\. Istrati, and N\. D\. Lagaros\(2024\)Vision transformer for flood detection using satellite images from Sentinel\-1 and Sentinel\-2\.Water16\(12\),pp\. 1670\.External Links:[Document](https://dx.doi.org/10.3390/w16121670)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[8\]K\. Cho, B\. van Merrienboer, C\. Gulcehre, D\. Bahdanau, F\. Bougares, H\. Schwenk, and Y\. Bengio\(2014\)Learning phrase representations using RNN encoder–decoder for statistical machine translation\.External Links:1406\.1078,[Link](https://arxiv.org/abs/1406.1078)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[9\]K\. N\. Clasen, L\. Hackel, T\. Burgert, G\. Sumbul, B\. Demir, and V\. Markl\(2024\)reBEN: refined BigEarthNet dataset for remote sensing image analysis\.External Links:2407\.03653,[Link](https://arxiv.org/abs/2407.03653)Cited by:[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.SSSx2.Px1.p1.3)\.
- \[10\]D\. Cohen\-Steiner, H\. Edelsbrunner, and J\. Harer\(2007\)Stability of persistence diagrams\.Discrete & Computational Geometry37\(1\),pp\. 103–120\.External Links:[Document](https://dx.doi.org/10.1007/s00454-006-1276-5)Cited by:[§2\.2](https://arxiv.org/html/2606.26204#S2.SS2.SSSx4.Px1.p1.5),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.Px1.p1.12),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[11\]B\. Coskunuzer and C\. G\. Akcora\(2024\)Topological methods in machine learning: a tutorial for practitioners\.External Links:2409\.02901,[Link](https://arxiv.org/abs/2409.02901)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[12\]B\. Gao\(1996\)NDWI—a normalized difference water index for remote sensing of vegetation liquid water from space\.Remote Sensing of Environment58\(3\),pp\. 257–266\.External Links:[Document](https://dx.doi.org/10.1016/S0034-4257%2896%2900067-3)Cited by:[§3\.1](https://arxiv.org/html/2606.26204#S3.SS1.p1.5)\.
- \[13\]S\. Gillies et al\.\(2013–\)Rasterio: geospatial raster I/O for Python programmers\.External Links:[Link](https://rasterio.readthedocs.io/)Cited by:[§3\.1](https://arxiv.org/html/2606.26204#S3.SS1.p1.8)\.
- \[14\]L\. Hackel, K\. N\. Clasen, and B\. Demir\(2024\)ConfigILM: a general purpose configurable library for combining image and language models for visual question answering\.SoftwareX26,pp\. 101731\.External Links:[Document](https://dx.doi.org/10.1016/j.softx.2024.101731)Cited by:[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.SSSx2.Px1.p1.3)\.
- \[15\]K\. He, X\. Zhang, S\. Ren, and J\. Sun\(2015\)Deep residual learning for image recognition\.CoRRabs/1512\.03385\.External Links:1512\.03385,[Link](https://arxiv.org/abs/1512.03385)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[16\]C\. D\. Hofer, R\. Kwitt, and M\. Niethammer\(2019\)Learning representations of persistence barcodes\.Journal of Machine Learning Research20\(126\),pp\. 1–45\.Cited by:[§4\.2](https://arxiv.org/html/2606.26204#S4.SS2.p2.1)\.
- \[17\]S\. Kaji, T\. Sudo, and K\. Ahara\(2020\)Cubical ripser: software for computing persistent homology of image and volume data\.CoRRabs/2005\.12692\.External Links:2005\.12692,[Link](https://arxiv.org/abs/2005.12692)Cited by:[§2\.2](https://arxiv.org/html/2606.26204#S2.SS2.p1.1),[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.SSSx1.Px1.p1.6)\.
- \[18\]G\. Kusano, K\. Fukumizu, and Y\. Hiraoka\(2016\)Persistence weighted Gaussian kernel for topological data analysis\.InProceedings of the 33rd International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.48,pp\. 2004–2013\.Cited by:[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.Px1.p1.12),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[19\]M\. D\. P\. Lima, G\. A\. Giraldi, and G\. F\. Miranda Junior\(2023\)Image classification using combination of topological features and neural networks\.External Links:2311\.06375,[Link](https://arxiv.org/abs/2311.06375)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[20\]B\. Mittal, J\. Vanzara, and S\. A\. Sajidha\(2024\)Navigating spatial insights: comparative exploration of deep learning algorithms for flood detection using remote sensing data\.In2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation \(IATMSI\),pp\. 1–6\.External Links:[Document](https://dx.doi.org/10.1109/IATMSI60426.2024.10502981)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[21\]A\. Moreira, P\. Prats\-Iraola, M\. Younis, G\. Krieger, I\. Hajnsek, and K\. P\. Papathanassiou\(2013\)A tutorial on synthetic aperture radar\.IEEE Geoscience and Remote Sensing Magazine1\(1\),pp\. 6–43\.External Links:[Document](https://dx.doi.org/10.1109/MGRS.2013.2248301)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.26204#S2.SS1.p1.2)\.
- \[22\]A\. Paszke, S\. Gross, F\. Massa, A\. Lerer, J\. Bradbury, G\. Chanan, T\. Killeen, Z\. Lin, N\. Gimelshein, L\. Antiga, A\. Desmaison, A\. Köpf, E\. Yang, Z\. DeVito, M\. Raison, A\. Tejani, S\. Chilamkurthy, B\. Steiner, L\. Fang, J\. Bai, and S\. Chintala\(2019\)PyTorch: an imperative style, high\-performance deep learning library\.InAdvances in Neural Information Processing Systems 32,pp\. 8024–8035\.Cited by:[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.p1.1)\.
- \[23\]Y\. Peng, H\. Wang, M\. Sonka, and D\. Z\. Chen\(2024\)PHG\-Net: persistent homology guided medical image classification\.InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision \(WACV\),pp\. 7583–7592\.Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1),[§4\.2](https://arxiv.org/html/2606.26204#S4.SS2.p2.1)\.
- \[24\]C\. Rambour, N\. Audebert, E\. Koeniguer, B\. Le Saux, M\. Crucianu, and M\. Datcu\(2020\)Flood detection in time series of optical and SAR images\.InThe International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,Vol\.XLIII\-B2\-2020,pp\. 1343–1346\.External Links:[Document](https://dx.doi.org/10.5194/isprs-archives-XLIII-B2-2020-1343-2020)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1),[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.SSSx2.Px1.p1.3),[§3\.3](https://arxiv.org/html/2606.26204#S3.SS3.SSSx4.Px3.p1.3),[Table 1](https://arxiv.org/html/2606.26204#S4.T1)\.
- \[25\]J\. Reininghaus, S\. Huber, U\. Bauer, and R\. Kwitt\(2015\)A stable multi\-scale kernel for topological machine learning\.InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition \(CVPR\),pp\. 4741–4748\.Cited by:[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.Px1.p1.12),[§3\.2](https://arxiv.org/html/2606.26204#S3.SS2.SSSx1.p1.1)\.
- \[26\]A\. Sharma\(2025\)Improving remote sensing classification using topological data analysis and convolutional neural networks\.External Links:2507\.10381,[Link](https://arxiv.org/abs/2507.10381)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.
- \[27\]X\. X\. Zhu, D\. Tuia, L\. Mou, G\. Xia, L\. Zhang, F\. Xu, and F\. Fraundorfer\(2017\)Deep learning in remote sensing: a comprehensive review and list of resources\.IEEE Geoscience and Remote Sensing Magazine5\(4\),pp\. 8–36\.External Links:[Document](https://dx.doi.org/10.1109/MGRS.2017.2762307)Cited by:[§1](https://arxiv.org/html/2606.26204#S1.p1.1)\.