LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

arXiv cs.LG Papers

Summary

This white paper proposes using LSTM neural networks to detect structural breaks in property insurance loss reserving caused by climate-driven catastrophes, aiming to improve accuracy by 15–20% over traditional methods like Chain Ladder.

arXiv:2606.11463v1 Announce Type: new Abstract: Accurate loss reserving is foundational to insurer solvency, yet accelerating climate-driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short-Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than Chain Ladder, Bornhuetter-Ferguson, and Cape Cod methods. Using 15-plus years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15–20% in reserve accuracy for catastrophe-exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.


Source: [https://arxiv.org/html/2606.11463](https://arxiv.org/html/2606.11463)

WHITE PAPER · AUGUST 2025

LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

Thomas Mbrice, Shashwat Panigrahi

Stony Brook University, Department of Computer Science

Keywords: loss reserving · structural breaks · LSTM · catastrophe modeling · climate risk · actuarial science · machine learning

Executive Summary

Accurate loss reserving is foundational to insurer solvency, yet accelerating climate-driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short-Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than Chain Ladder, Bornhuetter-Ferguson, and Cape Cod methods. Using 15-plus years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15–20% in reserve accuracy for catastrophe-exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.

## 1. The Reserving Problem in a Climate-Volatile World

Loss reserving is one of the most consequential functions in property\-casualty insurance, directly determining an insurer’s financial stability and regulatory standing\. Traditional methods rely on the assumption that loss development patterns remain stable across accident years\. Climate change has fundamentally disrupted that stability\.

Between 2017 and 2023, multiple property insurers in Florida and Louisiana became insolvent following hurricane losses that overwhelmed both claims capacity and the actuarial models used to anticipate them\. Hurricane Ian \(2022\) generated over $50 billion in insured losses and triggered a litigation surge that extended development timelines far beyond historical norms\. Hurricane Ida \(2021\) caused $36 billion in losses and collapsed infrastructure in ways that delayed settlements for years\. Traditional methods, which rely on backward\-looking averages, were ill\-equipped to detect these pattern shifts until multiple development periods had already passed\.

### 1\.1 Four Structural Break Drivers

- Severity shocks: Reconstruction cost inflation driven by supply chain disruptions and labor shortages.
- Frequency changes: Building code improvements and population migration alter risk profiles.
- Development pattern shifts: Litigation and Assignment of Benefits abuse extend settlement periods.
- Claims handling disruptions: Overwhelming claim volumes slow processing and impair pattern recognition.


Why Traditional Methods Fall Short

Chain Ladder requires 3–5 development periods to recognize regime shifts (Shapland, [2016](https://arxiv.org/html/2606.11463#bib.bib16)). Reserve errors exceeding 30% have been documented following major catastrophes (Meyers, [2015](https://arxiv.org/html/2606.11463#bib.bib13)). Neither Bornhuetter-Ferguson nor Cape Cod incorporates exogenous climate signals. The detection lag creates systematic underfunding precisely when reserves matter most.

## 2\. The LSTM Opportunity

Long Short\-Term Memory \(LSTM\) networks are a specialized class of recurrent neural network engineered to maintain both short\- and long\-term memory through gated mechanisms that selectively retain or discard information\. This architecture maps directly onto the structural break problem: when historical patterns cease to predict future development, an LSTM can learn to downweight stale information and assimilate emerging signals more rapidly than any fixed averaging rule\.

### 2\.1 Key Architectural Advantages

- Forget gates selectively discard historical patterns when new data signals a regime shift.
- Input gates govern the rate at which new distributional information is incorporated.
- Bidirectional layers allow the model to learn from development context in both directions.
- Attention mechanisms produce interpretable weights over development periods, identifying which quarters drove a reserve estimate.

The attention layer is particularly significant for regulatory acceptance\. By exposing attention weights, this architecture makes visible which development periods most influenced any given reserve estimate, a form of interpretability that aligns with actuarial standards of justification\.
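To make the interpretability claim concrete, the following is a minimal sketch of how attention weights over development periods could be computed and inspected. It assumes a simplified, query-free additive scoring rule (score = v · tanh(W h_t)) rather than the paper's exact attention layer; the names `additive_attention`, `W_demo`, `v_demo`, and the hidden-state values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    hidden_states: List[List[float]],
    W: List[List[float]],
    v: List[float],
) -> Tuple[List[float], List[float]]:
    """Score each period as v . tanh(W h_t), then normalise with softmax.

    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = []
    for h in hidden_states:
        projected = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(vi * pi for vi, pi in zip(v, projected)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for three development quarters (illustrative values).
states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]]
W_demo = [[1.0, 0.0], [0.0, 1.0]]  # assumed projection matrix
v_demo = [1.0, 1.0]                # assumed scoring vector
weights, context = additive_attention(states, W_demo, v_demo)
most_influential_quarter = weights.index(max(weights))
```

Because the weights are non-negative and sum to one, they can be reported alongside a reserve estimate as the share of influence attributed to each development period.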

## 3\. Theoretical Framework: A Formal Basis for LSTM Structural Break Detection

A central challenge in this research is limited empirical data: the test period contains only four major catastrophe events\. To compensate for this constraint, we develop a probabilistic theoretical framework that formally characterizes when and why LSTM networks outperform static linear estimators in the presence of structural breaks\. The proofs below establish conditions under which LSTM\-based reserves converge more rapidly to the true ultimate loss after a break, independent of the size of the empirical sample\.

### 3\.1 Setup and Notation

Let $\{L_t\}_{t=1}^{T}$ denote the sequence of cumulative paid losses for a given accident year at development periods $t = 1, \ldots, T$, where $T$ is the ultimate development age. Let $\theta_A$ and $\theta_B$ denote the pre- and post-break true loss development parameters, and let $\theta_t \in \{\theta_A, \theta_B\}$ denote the parameter governing the process at period $t$.

###### Definition 3.1 (Structural Break).

A *structural break* at time $\tau \in \{1, \ldots, T\}$ is an event such that the data-generating process satisfies

$$L_t \mid \theta_t = \begin{cases} f(L_{t-1}, \ldots, L_1;\, \theta_A) & t < \tau \\ f(L_{t-1}, \ldots, L_1;\, \theta_B) & t \geq \tau \end{cases}$$

where $\theta_A \neq \theta_B$ and $f$ is a measurable development function. The magnitude of the break is $\Delta\theta = \|\theta_B - \theta_A\|_2$.

###### Definition 3.2 (Chain Ladder Estimator).

The Chain Ladder estimator of $L_T$ observed through period $t$ is

$$\widehat{L}_T^{\,\mathrm{CL}}(t) = L_t \cdot \prod_{s=t}^{T-1} \hat{f}_s, \qquad \hat{f}_s = \frac{\sum_{i=1}^{n} L_{i,s+1}}{\sum_{i=1}^{n} L_{i,s}},$$

where $n$ is the number of accident years and $L_{i,s}$ is cumulative losses for accident year $i$ at age $s$.
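As an illustration of Definition 3.2, the sketch below computes volume-weighted age-to-age factors and projects each accident year to ultimate on a toy triangle. The triangle values are invented for the example, and the sums run only over accident years observed at both ages, which is an assumption about how the definition is applied to an incomplete triangle.

```python
from typing import List, Optional

# Rows are accident years, columns are development ages; None marks unobserved cells.
Triangle = List[List[Optional[float]]]


def age_to_age_factors(tri: Triangle) -> List[float]:
    """Volume-weighted factors f_s = sum_i L[i][s+1] / sum_i L[i][s]."""
    n_ages = len(tri[0])
    factors = []
    for s in range(n_ages - 1):
        rows = [r for r in tri if r[s] is not None and r[s + 1] is not None]
        numerator = sum(r[s + 1] for r in rows)
        denominator = sum(r[s] for r in rows)
        factors.append(numerator / denominator)
    return factors


def chain_ladder_ultimate(row: List[Optional[float]], factors: List[float]) -> float:
    """Multiply the latest observed cumulative loss by all remaining factors."""
    t = max(j for j, value in enumerate(row) if value is not None)
    ultimate = row[t]
    for s in range(t, len(factors)):
        ultimate *= factors[s]
    return ultimate


# Toy cumulative paid triangle (hypothetical numbers, three accident years).
toy = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0, None],
    [120.0, None, None],
]
factors = age_to_age_factors(toy)
ultimates = [chain_ladder_ultimate(row, factors) for row in toy]
```

Here the first factor is 315/210 = 1.5 and the second is 165/150 = 1.1, so the youngest accident year projects to roughly 120 × 1.5 × 1.1 = 198.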

###### Definition 3.3 (LSTM Reserve Estimator).

An LSTM reserve estimator is a parametric map $\Phi_{\mathbf{W}}: \mathbb{R}^{t \times d} \to \mathbb{R}$ from the $d$-dimensional feature sequence $(x_1, \ldots, x_t)$ to a scalar ultimate loss estimate, where $\mathbf{W}$ are learned weights. The hidden state update at each step is governed by the standard LSTM gating equations:

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(1)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(2)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(3)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(4)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(5)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(6)}
\end{aligned}$$

where $\sigma$ denotes the sigmoid function and $\odot$ denotes element-wise multiplication.
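The gating equations (1)–(6) can be traced with a small, dependency-free sketch of a single LSTM step. The helper names (`affine`, `lstm_step`, `zero_params`) and the parameter values are hypothetical; real training would use a deep learning framework, and the zero weights below are chosen only so the arithmetic is easy to follow.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b one output row at a time."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(
    x: Vector, h_prev: Vector, c_prev: Vector, p: Dict[str, list]
) -> Tuple[Vector, Vector]:
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]          # (1)
    i = [sigmoid(z) for z in affine(p["Wi"], x, p["Ui"], h_prev, p["bi"])]          # (2)
    c_tilde = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]  # (3)
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]        # (4)
    o = [sigmoid(z) for z in affine(p["Wo"], x, p["Uo"], h_prev, p["bo"])]          # (5)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                                # (6)
    return h, c


def zero_params(d: int, H: int) -> Dict[str, list]:
    """All-zero weights for input size d and hidden size H (illustrative only)."""
    params: Dict[str, list] = {}
    for gate in "fico":
        params["W" + gate] = [[0.0] * d for _ in range(H)]
        params["U" + gate] = [[0.0] * H for _ in range(H)]
        params["b" + gate] = [0.0] * H
    return params


params = zero_params(d=2, H=1)
# With zero weights every gate equals 0.5, so half of the old cell state survives.
h1, c1 = lstm_step([1.0, -1.0], [0.0], [1.0], params)
# A strongly negative forget bias drives f_t toward 0 and discards the old state.
params["bf"] = [-20.0]
h2, c2 = lstm_step([1.0, -1.0], [0.0], [1.0], params)
```

The second call mimics the behaviour the paper attributes to forget gates after a regime shift: the prior cell state contributes almost nothing to the new one.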

### 3\.2 Core Theoretical Results

We now establish three results: (i) that Chain Ladder convergence is delayed after a structural break, (ii) that an LSTM with sufficient capacity can represent the post-break distribution arbitrarily well, and (iii) that under a regime-covering pre-training condition, LSTM detection speed dominates Chain Ladder detection speed by at least $k-1$ periods.

###### Assumption 3.4 (Post-break stationarity).

Following a break at $\tau$, the post-break process $\{L_t\}_{t \geq \tau}$ is stationary and ergodic under parameters $\theta_B$, with finite variance $\sigma_B^2 < \infty$.

###### Assumption 3.5 (Chain Ladder averaging window).

The Chain Ladder estimator uses a volume-weighted average over the most recent $k$ accident years, with $k \geq 2$.

###### Assumption 3.6 (Pre-break loss level dominance).

At the development age of interest $s$, the expected cumulative losses under the pre-break regime are at least as large as under the post-break regime: $\mu_s^A \geq \mu_s^B > 0$.

###### Lemma 3.8 (Bias of Chain Ladder after a break).

Under Assumptions [3.4](https://arxiv.org/html/2606.11463#S3.Thmtheorem4), [3.5](https://arxiv.org/html/2606.11463#S3.Thmtheorem5), and [3.6](https://arxiv.org/html/2606.11463#S3.Thmtheorem6), the bias of the Chain Ladder age-to-age factor estimate $\hat{f}_s$ for development period $s \geq \tau$ satisfies

$$\bigl|\mathbb{E}[\hat{f}_s] - f_s^B\bigr| \geq \frac{(k-m)}{k} \cdot |f_s^A - f_s^B|$$

for $m < k$ post-break accident years in the averaging window, where $f_s^A$ and $f_s^B$ are the pre- and post-break true development factors.

###### Proof.

Let there be $m$ post-break accident years and $k-m$ pre-break accident years contributing to the volume-weighted average at development age $s$. The expected value of the volume-weighted factor estimator is

$$\mathbb{E}[\hat{f}_s] = \frac{m\,\mu_s^B \cdot f_s^B + (k-m)\,\mu_s^A \cdot f_s^A}{m\,\mu_s^B + (k-m)\,\mu_s^A}.$$

Let $\rho = m\mu_s^B / [m\mu_s^B + (k-m)\mu_s^A] \in (0,1)$ denote the weight on post-break experience. Then

$$\mathbb{E}[\hat{f}_s] = \rho\, f_s^B + (1-\rho)\, f_s^A,$$

and the bias is

$$\bigl|\mathbb{E}[\hat{f}_s] - f_s^B\bigr| = (1-\rho)\,|f_s^A - f_s^B|.$$

We bound $1-\rho$ from below. By Assumption [3.6](https://arxiv.org/html/2606.11463#S3.Thmtheorem6), $\mu_s^B \leq \mu_s^A$, so $m\,\mu_s^B \leq m\,\mu_s^A$, giving

$$m\,\mu_s^B + (k-m)\,\mu_s^A \;\leq\; k\,\mu_s^A.$$

Therefore

$$1-\rho = \frac{(k-m)\,\mu_s^A}{m\,\mu_s^B + (k-m)\,\mu_s^A} \geq \frac{(k-m)\,\mu_s^A}{k\,\mu_s^A} = \frac{k-m}{k},$$

and combining gives the stated bound. ∎
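The bound in Lemma 3.8 can be checked numerically. The sketch below evaluates the exact bias $(1-\rho)\,|f_s^A - f_s^B|$ and the lower bound $\frac{k-m}{k}|f_s^A - f_s^B|$ for each $m < k$; the loss levels and factors passed in are hypothetical values chosen to satisfy Assumption 3.6, not estimates from the paper's data.

```python
from typing import Tuple


def chain_ladder_factor_bias(
    k: int, m: int, mu_a: float, mu_b: float, f_a: float, f_b: float
) -> Tuple[float, float]:
    """Return (exact bias, Lemma 3.8 lower bound) for m post-break years in a k-year window."""
    rho = m * mu_b / (m * mu_b + (k - m) * mu_a)  # weight on post-break experience
    expected_factor = rho * f_b + (1.0 - rho) * f_a
    bias = abs(expected_factor - f_b)
    bound = (k - m) / k * abs(f_a - f_b)
    return bias, bound


# Hypothetical inputs: pre-break level 100, post-break level 80,
# pre-break factor 1.5, post-break factor 2.0, five-year window.
results = {
    m: chain_ladder_factor_bias(k=5, m=m, mu_a=100.0, mu_b=80.0, f_a=1.5, f_b=2.0)
    for m in range(5)
}
for m, (bias, bound) in results.items():
    print(f"m={m}: bias={bias:.4f}, bound={bound:.4f}")
```

With these inputs and $m = 2$, the weight is $\rho = 160/460$, so the exact bias is about 0.326 against a bound of 0.3.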

###### Theorem 3.10 (LSTM Universal Approximation for Sequential Data).

Let $\mathcal{F}$ be the class of measurable, bounded functions on compact $\mathcal{X} \subset \mathbb{R}^{t \times d}$. For any $\epsilon > 0$ and any target function $g \in \mathcal{F}$ representing the post-break conditional expectation $g(x_{1:t}) = \mathbb{E}[L_T \mid x_{1:t}; \theta_B]$, there exists an LSTM with weights $\mathbf{W}$ and hidden dimension $H$ sufficiently large such that

$$\sup_{x_{1:t} \in \mathcal{X}} \bigl|\Phi_{\mathbf{W}}(x_{1:t}) - g(x_{1:t})\bigr| < \epsilon.$$

###### Proof.

The result follows from the universal approximation theorem for recurrent neural networks established by Schäfer and Zimmermann ([2006](https://arxiv.org/html/2606.11463#bib.bib15)). Specifically, because $\sigma$ (sigmoid) and $\tanh$ are continuous, non-polynomial activation functions, the LSTM hidden state transition in equations ([1](https://arxiv.org/html/2606.11463#S3.E1))–([6](https://arxiv.org/html/2606.11463#S3.E6)) defines a continuous map on $\mathcal{X}$. By the Stone-Weierstrass theorem, the class of functions representable by a sufficiently wide LSTM is dense in $C(\mathcal{X})$ under the sup-norm. Since $g \in \mathcal{F} \subset C(\mathcal{X})$ by assumption and $\mathcal{X}$ is compact, the approximation error can be made arbitrarily small by increasing $H$. ∎

###### Assumption 3.11 (Regime-covering pre-training distribution).

The LSTM is trained on a data distribution $\mathcal{D}_{\mathrm{train}}$ that covers regime-$B$-like dynamics. Formally, there exists a subset $\mathcal{D}_B \subseteq \mathcal{D}_{\mathrm{train}}$ such that the marginal distribution of $(x_{1:t}, L_T)$ in $\mathcal{D}_B$ is absolutely continuous with respect to the post-break data-generating process under $\theta_B$. This condition can be satisfied by (i) including climate features $\mathcal{C}$ that correlate with the post-break regime (see Corollary [3.15](https://arxiv.org/html/2606.11463#S3.Thmtheorem15)), or (ii) transfer learning from other catastrophe events whose development dynamics overlap with $\theta_B$.

###### Theorem 3.13 (Faster Post-Break Convergence of LSTM vs. Chain Ladder).

Under Assumptions [3.4](https://arxiv.org/html/2606.11463#S3.Thmtheorem4), [3.5](https://arxiv.org/html/2606.11463#S3.Thmtheorem5), [3.6](https://arxiv.org/html/2606.11463#S3.Thmtheorem6), and [3.11](https://arxiv.org/html/2606.11463#S3.Thmtheorem11), let $\tau$ be a structural break. Define the detection time of method $M$ as

$$T_{\det}^{M} = \min\Bigl\{t \geq \tau : \bigl|\widehat{L}_T^{M}(t) - \mathbb{E}[L_T; \theta_B]\bigr| < \delta\Bigr\}$$

for a tolerance $\delta > 0$. Then

$$\mathbb{E}\bigl[T_{\det}^{\mathrm{CL}}\bigr] - \mathbb{E}\bigl[T_{\det}^{\mathrm{LSTM}}\bigr] \;\geq\; k-1.$$

###### Proof.

*Lower bound for Chain Ladder.* By Lemma [3.8](https://arxiv.org/html/2606.11463#S3.Thmtheorem8) and Remark [3.9](https://arxiv.org/html/2606.11463#S3.Thmtheorem9), for any $m < k$ post-break accident years in the averaging window the factor estimate satisfies $|\mathbb{E}[\hat{f}_s] - f_s^B| > 0$. If $|f_s^A - f_s^B| \geq c > 0$ for at least one age $s$, the bias propagates multiplicatively across development periods and the reserve estimate $\widehat{L}_T^{\mathrm{CL}}$ cannot satisfy the detection criterion $|\widehat{L}_T^{\mathrm{CL}} - \mathbb{E}[L_T; \theta_B]| < \delta$ until $m = k$. This requires at least $k$ post-break accident years in the window, so

$$\mathbb{E}\bigl[T_{\det}^{\mathrm{CL}}\bigr] \geq \tau + k.$$

*Upper bound for LSTM.* By Assumption [3.11](https://arxiv.org/html/2606.11463#S3.Thmtheorem11), the trained LSTM has been exposed to regime-$B$-like dynamics during training, so its weights $\mathbf{W}$ constitute a finite-sample approximation to the universal approximator guaranteed by Theorem [3.10](https://arxiv.org/html/2606.11463#S3.Thmtheorem10). Let $\epsilon_n$ denote the approximation error of the trained LSTM, which converges to zero as the size $n$ of the regime-$B$-like component $\mathcal{D}_B$ grows; we assume $\epsilon_n < \delta/2$ for $n$ sufficiently large.

Upon observing the first post-break development period at $t = \tau$, the LSTM receives features $x_\tau$ drawn from the post-break distribution. By Assumption [3.11](https://arxiv.org/html/2606.11463#S3.Thmtheorem11), the gating weights have been tuned to recognize regime-$B$ inputs: the forget gate $f_\tau$ can assign low weight to the pre-break cell state $c_{\tau-1}$, while the input gate $i_\tau$ encodes the new post-break signal into the cell state. The resulting hidden state $h_\tau$ carries post-break information, and the output satisfies

$$\bigl|\Phi_{\mathbf{W}}(x_{1:\tau}) - \mathbb{E}[L_T \mid x_{1:\tau}; \theta_B]\bigr| \leq \epsilon_n < \delta/2.$$

By the triangle inequality and Assumption [3.4](https://arxiv.org/html/2606.11463#S3.Thmtheorem4), $|\mathbb{E}[L_T \mid x_{1:\tau}; \theta_B] - \mathbb{E}[L_T; \theta_B]| \to 0$ as the conditioning sequence length increases. For $\tau$ sufficiently large relative to the correlation length of the post-break process, this term is also less than $\delta/2$, giving $|\Phi_{\mathbf{W}}(x_{1:\tau}) - \mathbb{E}[L_T; \theta_B]| < \delta$. Therefore $T_{\det}^{\mathrm{LSTM}} \leq \tau + 1$ and

$$\mathbb{E}\bigl[T_{\det}^{\mathrm{LSTM}}\bigr] \leq \tau + 1.$$

*Combining.*

$$\mathbb{E}\bigl[T_{\det}^{\mathrm{CL}}\bigr] - \mathbb{E}\bigl[T_{\det}^{\mathrm{LSTM}}\bigr] \geq (\tau + k) - (\tau + 1) = k - 1.$$

For the standard $k = 5$ year averaging window, this yields an expected advantage of at least 4 development periods. ∎
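The Chain Ladder side of Theorem 3.13 can be illustrated by counting how many post-break accident years must enter a $k$-year window before the expected factor bias drops below a tolerance. This is a sketch under stated assumptions: it works at the level of a single age-to-age factor rather than the full reserve, the inputs are hypothetical, and the LSTM lag of one period is simply taken from the theorem's upper bound rather than measured.

```python
from typing import Optional


def expected_factor_bias(
    k: int, m: int, mu_a: float, mu_b: float, f_a: float, f_b: float
) -> float:
    """Exact bias (1 - rho) * |f_a - f_b| with m post-break years in the window."""
    rho = m * mu_b / (m * mu_b + (k - m) * mu_a)
    return (1.0 - rho) * abs(f_a - f_b)


def chain_ladder_lag(
    k: int, mu_a: float, mu_b: float, f_a: float, f_b: float, tol: float
) -> Optional[int]:
    """Smallest count m of post-break years for which the bias falls below tol."""
    for m in range(k + 1):
        if expected_factor_bias(k, m, mu_a, mu_b, f_a, f_b) < tol:
            return m
    return None


K = 5  # averaging window from the theorem's closing remark
lag_cl = chain_ladder_lag(K, mu_a=100.0, mu_b=80.0, f_a=1.5, f_b=2.0, tol=0.05)
lag_lstm = 1  # upper bound asserted by Theorem 3.13, taken as given here
advantage = lag_cl - lag_lstm
```

With a tight tolerance of 0.05 the window must be fully post-break, so the lag is 5 and the gap is $k - 1 = 4$. The count equals $k$ only while the tolerance is tight relative to the break size; in this sketch a looser tolerance of 0.2 returns 4.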

###### Corollary 3.15 (Climate Variable Monotonicity).

Let $\Phi_{\mathbf{W}}^{\varnothing}$ denote the LSTM without climate features and $\Phi_{\mathbf{W}}^{\mathcal{C}}$ the LSTM with climate features $\mathcal{C}$ (e.g., accumulated cyclone energy, sea surface temperatures). If the climate features contain information about $\theta_B$ that is not redundant with the loss triangle sequence, i.e., $I(L_T;\, \mathcal{C} \mid x_{1:t}) > 0$, then

$$\mathbb{E}\bigl[|\Phi_{\mathbf{W}}^{\mathcal{C}}(x_{1:t}, \mathcal{C}) - L_T|\bigr] \;\leq\; \mathbb{E}\bigl[|\Phi_{\mathbf{W}}^{\varnothing}(x_{1:t}) - L_T|\bigr].$$

###### Proof.

Let $\mathbf{X}_t = (x_{1:t})$ denote the triangle sequence and $\mathbf{C}$ the climate feature vector. We work through oracle predictors first and then account for approximation error.

*Step 1: Oracle MSE comparison.* The minimum MSE predictor of $L_T$ given $\mathbf{X}_t$ is $g^{\varnothing}(\mathbf{X}_t) = \mathbb{E}[L_T \mid \mathbf{X}_t]$, with oracle MSE $V^{\varnothing} = \mathbb{E}[\mathrm{Var}(L_T \mid \mathbf{X}_t)]$. The minimum MSE predictor given $(\mathbf{X}_t, \mathbf{C})$ is $g^{\mathcal{C}}(\mathbf{X}_t, \mathbf{C}) = \mathbb{E}[L_T \mid \mathbf{X}_t, \mathbf{C}]$, with oracle MSE $V^{\mathcal{C}} = \mathbb{E}[\mathrm{Var}(L_T \mid \mathbf{X}_t, \mathbf{C})]$.

The conditional variance decomposition gives

$$\mathrm{Var}(L_T \mid \mathbf{X}_t) = \mathbb{E}\bigl[\mathrm{Var}(L_T \mid \mathbf{X}_t, \mathbf{C}) \mid \mathbf{X}_t\bigr] + \mathrm{Var}\bigl(\mathbb{E}[L_T \mid \mathbf{X}_t, \mathbf{C}] \mid \mathbf{X}_t\bigr).$$

Taking expectations over $\mathbf{X}_t$ and using $I(L_T; \mathbf{C} \mid \mathbf{X}_t) > 0$, the second term is strictly positive, so $V^{\mathcal{C}} < V^{\varnothing}$. Let $\Delta V = V^{\varnothing} - V^{\mathcal{C}} > 0$.

*Step 2: From oracle MSE to oracle MAE.* By the Cauchy-Schwarz inequality, for any predictor $g$,

$$\mathbb{E}[|L_T - g|] \leq \sqrt{\mathbb{E}[(L_T - g)^2]} = \sqrt{\mathrm{MSE}(g)}.$$

More usefully, the minimum MAE predictor is the conditional median, and the minimum MSE predictor is the conditional mean. When $L_T \mid \mathbf{X}_t$ has a unimodal, symmetric distribution (plausible under Assumption [3.4](https://arxiv.org/html/2606.11463#S3.Thmtheorem4)), mean and median coincide and $\mathrm{MAE}(g^{\varnothing}) = \mathbb{E}[|L_T - \mathbb{E}[L_T \mid \mathbf{X}_t]|]$. Under this condition, $\mathrm{MAE}^{\mathcal{C}} \leq \sqrt{V^{\mathcal{C}}} < \sqrt{V^{\varnothing}} = \mathrm{MAE}^{\varnothing}$.

*Step 3: Accounting for LSTM approximation error.* Let $\epsilon^{\varnothing}$ and $\epsilon^{\mathcal{C}}$ denote the approximation errors of the respective trained LSTMs (bounded by $\epsilon_n$ from Assumption [3.11](https://arxiv.org/html/2606.11463#S3.Thmtheorem11)). By the triangle inequality,

$$\begin{aligned}
\mathbb{E}[|\Phi_{\mathbf{W}}^{\mathcal{C}} - L_T|] &\leq \mathrm{MAE}^{\mathcal{C}} + \epsilon^{\mathcal{C}}, \\
\mathbb{E}[|\Phi_{\mathbf{W}}^{\varnothing} - L_T|] &\geq \mathrm{MAE}^{\varnothing} - \epsilon^{\varnothing}.
\end{aligned}$$

For the climate-augmented model to achieve strictly lower MAE than the triangle-only model, it suffices that

$$\mathrm{MAE}^{\mathcal{C}} + \epsilon^{\mathcal{C}} < \mathrm{MAE}^{\varnothing} - \epsilon^{\varnothing},$$

i.e., the oracle MAE gap $\mathrm{MAE}^{\varnothing} - \mathrm{MAE}^{\mathcal{C}} > \epsilon^{\varnothing} + \epsilon^{\mathcal{C}}$. Since $\Delta V > 0$ and both approximation errors $\epsilon_n \to 0$ as the training set grows, this condition is satisfied for all sufficiently large $n$. We therefore conclude

$$\mathbb{E}\bigl[|\Phi_{\mathbf{W}}^{\mathcal{C}}(x_{1:t}, \mathcal{C}) - L_T|\bigr] \;<\; \mathbb{E}\bigl[|\Phi_{\mathbf{W}}^{\varnothing}(x_{1:t}) - L_T|\bigr]$$

for $n$ sufficiently large, which gives the stated (non-strict) inequality in the limit. ∎
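Step 1 of the proof rests on the conditional variance decomposition, which a small Monte Carlo experiment can illustrate. The data-generating process below is entirely hypothetical (a Gaussian triangle signal, a binary climate regime flag, Gaussian noise) and the oracle predictors are written in closed form for that toy model; it is a sanity check of the inequality, not a reproduction of the paper's setting.

```python
import random
from typing import Dict


def simulate_oracle_errors(n: int = 20000, seed: int = 0) -> Dict[str, float]:
    """Compare oracle predictors E[L | X] and E[L | X, C] on a toy model.

    Toy model (assumed): L = X + 2*C + noise, with X ~ N(0, 1),
    C ~ Bernoulli(0.5), noise ~ N(0, 0.5^2).
    """
    rng = random.Random(seed)
    se_without = se_with = ae_without = ae_with = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                 # triangle-derived signal
        c = 1.0 if rng.random() < 0.5 else 0.0  # climate regime flag
        loss = x + 2.0 * c + rng.gauss(0.0, 0.5)
        pred_without = x + 1.0                  # E[L | X], since E[2C] = 1
        pred_with = x + 2.0 * c                 # E[L | X, C]
        se_without += (loss - pred_without) ** 2
        se_with += (loss - pred_with) ** 2
        ae_without += abs(loss - pred_without)
        ae_with += abs(loss - pred_with)
    return {
        "mse_without": se_without / n,
        "mse_with": se_with / n,
        "mae_without": ae_without / n,
        "mae_with": ae_with / n,
    }


result = simulate_oracle_errors()
```

In this toy model the theoretical values are $V^{\varnothing} = \mathrm{Var}(2C) + 0.25 = 1.25$ and $V^{\mathcal{C}} = 0.25$, so the gap $\Delta V = 1$ is exactly the variance explained by the climate flag.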

## 4\. Research Design

### 4\.1 Central Research Question

Can LSTM\-based reserving models detect and adapt to climate\-driven structural breaks in property insurance claims development patterns faster and more accurately than traditional actuarial methods, and what mechanisms enable this capability?

### 4\.2 Four Confirmatory Hypotheses and One Exploratory Objective

Hypotheses H1 and H2 are further grounded in Theorem[3\.13](https://arxiv.org/html/2606.11463#S3.Thmtheorem13), which provides a formal lower bound on the detection speed advantage\. H3 is grounded in Corollary[3\.15](https://arxiv.org/html/2606.11463#S3.Thmtheorem15)\. H4 is treated as an exploratory objective throughout; see Section[6](https://arxiv.org/html/2606.11463#S6)for the corresponding analysis approach\.

### 4\.3 Data Sources

- Florida OIR Schedule P filings (2007–2023): Approximately 80 company-line development triangles across the top 20 property insurers by market share, covering Homeowners and Commercial Property lines at quarterly granularity.
- Louisiana DOI Annual Statements (2007–2023): Approximately 60 company-line triangles across the top 15 property insurers.
- NOAA HURDAT2 and ERSST v5: Accumulated Cyclone Energy (ACE) indices, maximum sustained wind speeds, and monthly sea surface temperature anomalies for Gulf and Atlantic hurricane development regions.

Climate features are engineered as 3\-, 6\-, and 12\-month rolling averages, lagged 1–4 quarters, with ENSO phase indicators and sea surface temperature / intensity interaction terms\.
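The rolling-average and lag construction described above might look like the following sketch. It assumes monthly input that begins in the first month of a quarter, samples each rolling mean at quarter-end, and then lags by 1–4 quarters; the feature names and the toy series are hypothetical, and the ENSO indicators and interaction terms are omitted.

```python
from typing import Dict, List, Optional

Series = List[Optional[float]]


def rolling_mean(values: List[float], window: int) -> Series:
    """Trailing mean; None until a full window of observations is available."""
    out: Series = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def lag(values: Series, periods: int) -> Series:
    """Shift a series so that position q holds the value from q - periods."""
    return [None] * periods + values[: len(values) - periods]


def quarterly_climate_features(monthly: List[float]) -> Dict[str, Series]:
    """Build 3/6/12-month rolling means, sampled quarterly and lagged 1-4 quarters."""
    features: Dict[str, Series] = {}
    for window in (3, 6, 12):
        rolled = rolling_mean(monthly, window)
        quarterly = rolled[2::3]  # keep the value at each quarter-end month
        for q in range(1, 5):
            features[f"roll{window}m_lag{q}q"] = lag(quarterly, q)
    return features


# Toy monthly anomaly series covering two years (not NOAA data).
monthly_sst_anomaly = [float(month) for month in range(24)]
features = quarterly_climate_features(monthly_sst_anomaly)
```

Lagging after the quarterly sampling keeps every feature strictly backward-looking, which matters for the leakage-free validation design.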

### 4\.4 Temporal Validation Split

A strict time\-based split prevents data leakage and simulates real\-world deployment conditions:

- Training (2007–2011): Five years of all development periods.
- Validation (2012–2016): Five non-catastrophe years for hyperparameter tuning.
- Test (2017–2023): Seven years spanning Hurricanes Michael, Ida, and Ian.

The design deliberately tunes models on non\-catastrophe data, preventing optimization specifically for extremes\.
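A minimal sketch of the time-based split follows. It assumes each record carries an `accident_year` field and that the year ranges refer to accident years; both the field name and that interpretation are assumptions, as is the placeholder insurer label.

```python
from typing import Dict, List, Tuple

# Year ranges as stated in the split above (inclusive on both ends).
SPLIT_YEARS: Dict[str, Tuple[int, int]] = {
    "train": (2007, 2011),
    "validation": (2012, 2016),
    "test": (2017, 2023),
}


def temporal_split(records: List[dict]) -> Dict[str, List[dict]]:
    """Assign each record to exactly one split based on its accident year."""
    splits: Dict[str, List[dict]] = {name: [] for name in SPLIT_YEARS}
    for record in records:
        for name, (first, last) in SPLIT_YEARS.items():
            if first <= record["accident_year"] <= last:
                splits[name].append(record)
                break
    return splits


# One hypothetical record per accident year.
records = [
    {"accident_year": year, "insurer": "hypothetical-co"}
    for year in range(2007, 2024)
]
splits = temporal_split(records)
```

Because the ranges are disjoint and ordered, no validation or test year can precede a training year, which is the leakage guarantee the design relies on.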

## 5\. Model Architecture

### 5\.1 LSTM with Attention \(Primary Model\)

The primary model stacks two bidirectional LSTM layers (128 and 64 units) followed by a Bahdanau attention layer, a dense layer with ReLU activation, and a single-unit output for the ultimate loss estimate. Dropout (0.2–0.3) is applied after each layer, with L2 weight regularization ($\lambda = 0.001$) and gradient clipping to prevent exploding gradients during catastrophe years.

The custom recency\-weighted loss function is:

$$\mathcal{L} = \sum_{t=1}^{T} w_t \cdot (y_t - \hat{y}_t)^2, \qquad w_t = \alpha^{(t_{\max} - t)}, \quad \alpha = 0.2$$

where $t_{\max}$ is the most recent development period. With $0 < \alpha < 1$, the most recent period receives weight 1 and each earlier period is discounted by a further factor of $\alpha$. This exponential decay assigns higher penalty to errors in recent periods and is evaluated as an ablation against standard MSE.
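The recency weighting can be sketched in a few lines. The sketch assumes weights of the form $\alpha^{(t_{\max} - t)}$ with $t_{\max} = T$, so that with $0 < \alpha < 1$ the most recent period gets weight 1 and older periods decay, matching the stated intent of penalising recent errors more heavily. The function names and sample values are hypothetical.

```python
from typing import List


def recency_weights(T: int, alpha: float = 0.2) -> List[float]:
    """Weights alpha ** (T - t) for t = 1..T; the latest period has weight 1."""
    return [alpha ** (T - t) for t in range(1, T + 1)]


def recency_weighted_loss(
    y_true: List[float], y_pred: List[float], alpha: float = 0.2
) -> float:
    """Sum of squared errors, each scaled by its recency weight."""
    weights = recency_weights(len(y_true), alpha)
    return sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))


# Hypothetical targets and predictions over three development periods.
weights = recency_weights(3)
loss = recency_weighted_loss([1.0, 2.0, 3.0], [0.0, 2.0, 5.0])
```

With three periods the weights are approximately 0.04, 0.2, and 1.0, so a unit error in the oldest period costs 25 times less than the same error in the latest one.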

### 5\.2 Ablation Models

Five ablations isolate the contribution of each component:

- LSTM-NoClimate: Removes all exogenous climate variables.
- LSTM-NoAttention: Removes the attention mechanism.
- LSTM-WeightedLoss: Substitutes the recency-weighted loss for standard MSE.
- LSTM-Unidirectional: Forward-only LSTM without bidirectional layers.
- SimpleRNN: Vanilla recurrent baseline.

### 5\.3 Traditional Method Baselines

Chain Ladder, Bornhuetter\-Ferguson, and Cape Cod are implemented to industry standard with bootstrapping \(10,000 simulations\) for confidence intervals\. These represent current actuarial practice for property loss reserving\.

## 6\. Evaluation Framework

### 6\.1 Primary Metrics

- Reserve Accuracy Ratio, $\mathrm{RAR} = \widehat{L}_T / L_T^{\mathrm{actual}}$: Regulatory standard; ideal value is 1.0.
- One-Year Development Test: $\mathrm{OYD\%} = \tfrac{\widehat{R}_{t+1} - \widehat{R}_t}{\widehat{R}_t} \times 100\%$; industry benchmark is $\pm 15\%$.
- MAPE: Reported separately for pre-break (2017–2019), during-break (2020–2021), and post-break (2022–2023) periods.
- RMSE: Penalizes large errors more heavily, critical for catastrophe reserving.
- Detection Speed: Quarters elapsed from structural break until model prediction adjusts more than 10% from pre-break baseline.
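The metrics listed above translate directly into code. The sketch below is one plausible reading of each definition; in particular, `detection_speed` interprets "adjusts more than 10% from pre-break baseline" as a relative change in the prediction, which is an assumption, and all sample inputs are invented.

```python
import math
from typing import List, Optional


def reserve_accuracy_ratio(estimated_ultimate: float, actual_ultimate: float) -> float:
    """RAR = estimated ultimate / actual ultimate; 1.0 is ideal."""
    return estimated_ultimate / actual_ultimate


def one_year_development_pct(reserve_t: float, reserve_next: float) -> float:
    """OYD% = (R_{t+1} - R_t) / R_t * 100."""
    return (reserve_next - reserve_t) / reserve_t * 100.0


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def detection_speed(
    predictions: List[float],
    baseline: float,
    break_index: int,
    threshold: float = 0.10,
) -> Optional[int]:
    """Quarters after the break until the prediction moves > threshold from baseline."""
    for quarters, prediction in enumerate(predictions[break_index:]):
        if abs(prediction - baseline) / abs(baseline) > threshold:
            return quarters
    return None  # the model never adjusted within the observed horizon


# Hypothetical quarterly ultimate estimates; the break occurs at index 1.
speed = detection_speed([100.0, 102.0, 105.0, 115.0, 130.0], baseline=100.0, break_index=1)
```

In the example the estimate first moves more than 10% from the baseline two quarters after the break, so `speed` is 2.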

### 6\.2 Statistical Testing

Diebold\-Mariano tests compare predictive accuracy between LSTM and each traditional method\. Wilcoxon signed\-rank tests assess median error differences on matched accident years\. Holm\-Bonferroni correction controls family\-wise error rate across H1, H2, H3, and H5\.

Power analysis for H2 (at least 15% MAPE improvement, Cohen's $d = 0.75$, $\alpha = 0.05$, power $= 0.80$) requires a minimum of 30 company-level observations; our approximately 140 company-line combinations provide power exceeding 0.99.

*H4 analysis approach.* Gate activation analysis is treated as an exploratory objective, not a confirmatory hypothesis test. With only 4–5 structural break events, a $p < 0.05$ threshold is not defensible: for a paired test across 5 events, the minimum achievable $p$-value under a two-sided sign test is $2 \times (1/2)^5 = 0.0625$, above the standard threshold. Instead, forget, input, and output gate activations will be extracted at each development period for each break event, pre- and post-break mean activations will be compared using effect size (Cohen's $d$), and results will be visualized as time series with bootstrapped confidence intervals. The objective is pattern description and hypothesis generation for future work with larger break samples, not statistical confirmation.
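The sign-test arithmetic and the effect-size comparison can be made explicit. The sketch below computes the smallest two-sided sign-test $p$-value attainable with $n$ paired events and a pooled-standard-deviation Cohen's $d$; the gate activation values are hypothetical placeholders, not measured results.

```python
from math import sqrt
from statistics import mean, variance
from typing import List


def min_two_sided_sign_test_p(n_events: int) -> float:
    """Smallest attainable p-value: all n differences share one sign."""
    return min(1.0, 2.0 * 0.5 ** n_events)


def cohens_d(pre: List[float], post: List[float]) -> float:
    """Standardised mean difference (post - pre) using the pooled sample SD."""
    n_pre, n_post = len(pre), len(post)
    pooled_var = (
        (n_pre - 1) * variance(pre) + (n_post - 1) * variance(post)
    ) / (n_pre + n_post - 2)
    return (mean(post) - mean(pre)) / sqrt(pooled_var)


# Minimum p-values for 4, 5, and 6 break events.
min_p = {n: min_two_sided_sign_test_p(n) for n in (4, 5, 6)}

# Hypothetical mean forget-gate activations before and after a break.
pre_break = [0.9, 0.7, 0.8]
post_break = [0.5, 0.3, 0.4]
effect = cohens_d(pre_break, post_break)
```

The dictionary shows that significance at 0.05 first becomes attainable with six events (0.03125), which is why the analysis reports effect sizes for the 4–5 events available.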

## 7\. Case Studies

### 7\.1 Hurricane Ida \(August 2021\)

Category 4 at Louisiana landfall with 150 mph sustained winds, Ida caused $36 billion in insured losses\. Workforce shortages and supply chain disruptions extended claims settlement timelines significantly beyond historical norms\. Models are trained through Q2 2021, evaluated at Q3 2021, and tracked through Q4 2023 against actual reserves crystallized in 2024 data\.

### 7\.2 Hurricane Ian \(September 2022\)

At $50–65 billion in insured losses, Ian is potentially the costliest hurricane in US history\. Its development pattern is uniquely challenging: in addition to physical damage, it triggered a wave of litigation and Assignment of Benefits disputes that materially extended settlement periods in ways no historical development pattern anticipated\. This makes Ian the most stringent test of the LSTM’s ability to detect pattern\-structure breaks, not merely magnitude shocks\.

### 7\.3 Social Inflation \(2019–2022\)

Unlike the sudden structural breaks of individual storms, Florida’s Assignment of Benefits abuse and litigation proliferation constitute a slow\-moving break, with development patterns degrading gradually over 3–4 years\. This case tests whether LSTM can detect gradual drift rather than step\-change events, and whether climate variables help the model distinguish weather\-driven from litigation\-driven development pattern changes\.

## 8\. Expected Contributions

### 8\.1 To Actuarial Science

- First rigorous comparison of LSTM vs\. traditional reserving methods specifically for structural break detection using real regulatory data, with formal theoretical grounding\.
- Methodological framework for incorporating exogenous climate variables into reserve estimates, justified by Corollary [3\.15](https://arxiv.org/html/2606.11463#S3.Thmtheorem15)\.
- Quantification of detection speed advantages with a direct lower bound established in Theorem [3\.13](https://arxiv.org/html/2606.11463#S3.Thmtheorem13)\.
- Attention and gate analysis tools adapted to actuarial interpretability standards\.

### 8\.2 To Machine Learning

- Novel application of recency\-weighted loss functions for concept drift in financial regression\.
- Demonstration of attention mechanisms for time series structural break detection\.
- Case study in preventing overfitting on rare extreme events using regularization strategies\.
- Real\-world validation of LSTM adaptation mechanisms beyond the synthetic datasets that dominate concept drift literature\.
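As a rough illustration of what a recency\-weighted loss looks like, the sketch below weights squared errors so that recent periods count more than older ones\. The exponential half\-life form, the function names, and the default `half_life` are assumptions made for this example; this section does not restate the weighting scheme actually used in the study\.

```python
def recency_weights(n: int, half_life: float) -> list[float]:
    """Normalized weights that halve every `half_life` periods back in time.

    Index n-1 is the most recent observation and receives the largest weight.
    """
    raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def recency_weighted_mse(y_true, y_pred, half_life: float = 8.0) -> float:
    """Mean squared error with exponentially decaying weights on older periods."""
    weights = recency_weights(len(y_true), half_life)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, y_true, y_pred))


# The same error costs more when it falls on the most recent period.
actual = [100.0, 100.0, 100.0, 100.0]
miss_oldest = [110.0, 100.0, 100.0, 100.0]
miss_newest = [100.0, 100.0, 100.0, 110.0]
print(recency_weighted_mse(actual, miss_oldest, half_life=2.0))
print(recency_weighted_mse(actual, miss_newest, half_life=2.0))
```

A shorter half\-life adapts faster after a break but discards more of the stable history, so the choice trades detection speed against variance in non\-break periods\.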

### 8\.3 Practical Industry Impact

Faster structural break detection directly reduces reserve deficiency risk, freeing capital and reducing the probability of insolvency cascades that harm policyholders\. Attention\-based interpretability creates a path to regulatory acceptance\. For state insurance regulators, LSTM\-derived detection speed metrics could serve as an early warning layer within existing supervisory frameworks\.

## 9\. Limitations and Future Directions

This research is designed with rigor, but several limitations bound the scope of its conclusions:

- Geographic scope: Florida and Louisiana are the most catastrophe\-exposed property insurance markets in the US, which maximizes signal but may limit generalizability to other regions or perils\.
- Structural break sample size: The test period contains only four major catastrophe events\. The theoretical framework in Section 3 compensates for this constraint by providing distribution\-free guarantees\. Gate activation analysis \(H4\) is explicitly treated as exploratory throughout, with results reported as effect sizes rather than significance tests\.
- Computational cost: LSTM training at quarterly retraining cadence requires cloud GPU infrastructure that smaller insurers and regulators may not readily access\.
- Causation vs\. correlation: Climate variables are included as predictive signals\. This study cannot establish causal pathways between sea surface temperature anomalies and loss severity; confounding through population growth in hurricane\-prone areas or building code changes is possible\.
- Regulatory adoption timeline: Insurance regulation evolves slowly\. Even if LSTM approaches prove superior, adoption timelines depend on regulatory capacity to evaluate model complexity\.

### 9\.1 Priority Future Directions

- Multi\-state expansion across Gulf Coast and Atlantic seaboard states to validate generalizability\.
- Additional perils: wildfire, earthquake, and severe convective storm all present structural break dynamics warranting analogous investigation\.
- Ensemble methods combining LSTM with traditional actuarial methods for robustness in non\-break periods\.
- Transformer architectures: attention\-only \(no recurrence\) models may offer superior interpretability for this application\.
- Causal inference: instrumental variable or regression discontinuity approaches to isolate specific climate variable effects on reserve accuracy\.
- Uncertainty quantification: prediction intervals for LSTM reserve estimates are a prerequisite for regulatory acceptance\.
