Kalman Prototypical Networks for Few-shot Fault Detection in Combined Cycle Gas Turbines

Summary

This paper introduces the Kalman Prototypical Network (KPN), a few-shot learning framework for fault detection in combined-cycle gas turbines. KPN models class prototypes as latent stochastic states to reduce variance and outperforms conventional methods on simulated leak detection tasks.

arXiv:2606.26710v1 Announce Type: new

Cached at: 06/26/26, 05:15 AM

# Kalman Prototypical Networks for Few-shot Fault Detection in Combined Cycle Gas Turbines
Source: [https://arxiv.org/html/2606.26710](https://arxiv.org/html/2606.26710)

![[Uncaptioned image]](https://arxiv.org/html/2606.26710v1/x1.png)

M. A. Belay (Graduate Student Member, IEEE), Lucas Ferreira Bernardino ([ORCID](https://orcid.org/0000-0002-0058-2739)), Adil Rasheed ([ORCID](https://orcid.org/0000-0003-2690-983X)), Rubén M. Montañés ([ORCID](https://orcid.org/0000-0002-6600-5512)), Pierluigi Salvo Rossi ([ORCID](https://orcid.org/0000-0001-6834-8482)) (Senior Member, IEEE)

This work was partially supported by the Research Council of Norway under the project DIGITAL TWIN within the PETROMAKS2 framework (project nr. 318899).

M. A. Belay is with the Dept. Electronic Systems, Norwegian University of Science and Technology, 7034 Trondheim, Norway (e-mail: mohammed.a.belay@ntnu.no).

L. Ferreira Bernardino and R. M. Montañés are with the Dept. Gas Technology, SINTEF Energy Research, 7491 Trondheim, Norway (e-mail: lucas.bernardino@sintef.no, ruben.mocholi.montanes@sintef.no).

A. Rasheed is with the Dept. Engineering Cybernetics, Norwegian University of Science and Technology, 7034 Trondheim, Norway (e-mail: adil.rasheed@ntnu.no).

P. Salvo Rossi is with the Dept. Electronic Systems, Norwegian University of Science and Technology, 7034 Trondheim, Norway, and with the Dept. Gas Technology, SINTEF Energy Research, 7491 Trondheim, Norway (e-mail: salvorossi@ieee.org).

Manuscript received Month 00, 2025; revised Month 00, 2025.

###### Abstract

Combined-cycle gas turbines (CCGTs) play a key role in modern power generation, offering both high efficiency and reduced environmental impact. However, their complex thermo-fluid and mechanical interactions complicate fault detection, particularly when labeled fault data are scarce. In this paper, we introduce the Kalman Prototypical Network (KPN), a metric-based few-shot learning (FSL) framework specifically tailored for CCGT fault diagnosis. We model the evolution of class prototypes as latent stochastic states in a dynamic system to reduce episodic variance and improve robustness in embedding representation. Synthetic data sets generated with a high-fidelity Modelica-based dynamic simulation of an offshore CCGT system were used, simulating both normal operation and progressive leak faults under transient conditions. Application of the proposed framework to simulated leak fault detection tasks demonstrates that KPN outperforms conventional FSL methods such as Matching Networks, Relation Networks, and MAML in both accuracy and stability under varying support and query configurations. The proposed framework significantly improves training convergence and generalization by stabilizing class representations, making it well suited for real-world CCGT fault detection where labeled data are limited.

Keywords: Anomaly detection, combined-cycle gas turbine, dynamic model, leak detection, few-shot learning, prototypical network, Kalman filter.

## 1 Introduction

Combined-cycle gas turbines (CCGTs) have emerged as crucial components in modern power generation, favored for their high efficiency, reliability, and environmental benefits. CCGT power plants couple a gas turbine with a heat-recovery steam generator (HRSG) and a downstream steam turbine, achieving thermal efficiencies often exceeding 60%\[[21](https://arxiv.org/html/2606.26710#bib.bib39)\]. This high efficiency has driven widespread onshore deployment and growing interest in offshore installations, where recovered excess heat can meet platform power demands while cutting CO2 emissions by up to 25%, making CCGTs favorable for modern low-emission power generation\[[25](https://arxiv.org/html/2606.26710#bib.bib3)\]. Despite their efficiency gains, CCGT systems are inherently complex, featuring tightly coupled thermo-fluid and mechanical subsystems. Even small faults, such as a single tube leak in the HRSG, can cause high-pressure steam or water to escape, reducing net power output, damaging downstream machinery, forcing unplanned shutdowns, and introducing safety and environmental hazards. Consequently, maintaining reliability and preventing costly downtime rely on early and accurate detection of such anomalous events.

Recently, several CCGT fault\-detection methods have been proposed, including physics\-based models and supervised machine\-learning models\. Physics\-based methods rely on high\-fidelity dynamic simulations to capture the full transient behavior of a CCGT under varying loads\. Supervised machine learning methods utilize real\-world operational labeled datasets covering each fault mode to train models\[[15](https://arxiv.org/html/2606.26710#bib.bib5),[5](https://arxiv.org/html/2606.26710#bib.bib46),[12](https://arxiv.org/html/2606.26710#bib.bib47)\]\. However, CCGT fault detection faces significant challenges, primarily due to the scarcity of accurately labeled datasets\. Real\-world fault occurrences in CCGTs are rare, data is frequently inaccessible due to confidentiality, and manual labeling of sensor streams is both time\-consuming and resource\-intensive\. As a result, labeled fault data remain scarce and costly to obtain, creating a critical bottleneck for deploying reliable, data\-driven diagnostic systems\[[6](https://arxiv.org/html/2606.26710#bib.bib34),[41](https://arxiv.org/html/2606.26710#bib.bib48),[4](https://arxiv.org/html/2606.26710#bib.bib44)\]\. To address such challenges, fault detection methods that can learn from limited or unlabeled fault examples, such as unsupervised anomaly detection and few\-shot learning \(FSL\), are employed in various safety\-critical industrial operations\.

Few-shot learning frameworks enable machine learning models to generalize from minimal labeled examples, making them well suited to CCGT applications, where labeling costs are prohibitive and fault data are scarce. Metric-based FSL methods, such as prototypical networks, have demonstrated efficient fault detection and classification by learning robust embeddings that generalize well from very few labeled samples. These networks classify data based on proximity to class-specific prototypes computed from support sets, effectively creating representative anchors in the embedding space. However, prototypical networks suffer from significant instability due to episodic variance during training: the prototypes derived from small support sets can vary substantially between episodes, adversely affecting model robustness and classification accuracy. This episodic instability arises because each training episode may contain different subsets of data, leading to variability in the computed prototypes. Consequently, despite the promise of these networks, this instability presents a critical barrier to effective deployment in real-world industrial anomaly detection scenarios. To address this challenge, we propose a robust Kalman prototypical network for few-shot fault detection in CCGT systems. Moreover, we use a high-fidelity Modelica/Dymola dynamic simulation model of a CCGT to generate synthetic fault data, which provide the basis for algorithm development, benchmarking, and ultimately robust deployment of advanced diagnostic capabilities in safety-critical CCGT operations\[[3](https://arxiv.org/html/2606.26710#bib.bib45)\]. Specifically, the primary contributions of this paper are summarized as follows:

- We propose a Kalman-based Prototypical Network (KPN) that models prototype evolution as a latent stochastic state and allows for stable few-shot representation learning.
- We utilize synthetic data sets from a high-fidelity dynamic model of an existing offshore CCGT system. The data sets are extensive and represent realistic performance under transient operating conditions, simulating both normal operational dynamics and leak-induced anomalies.
- We perform an extensive performance comparison against baseline few-shot learning algorithms.

The remainder of this paper is structured as follows\. Section II reviews related work and provides background on supervised and unsupervised anomaly detection methods and few\-shot learning\. Section III presents the proposed Kalman Prototypical Network method\. Section IV describes the experimental setup, including dataset details and implementation specifics\. Section V presents and discusses the experimental results, demonstrating the effectiveness of KPN in stabilizing prototypes and enhancing anomaly detection performance\. Finally, Section VI concludes the paper and outlines directions for future research\.

## 2 Related Works

### 2.1 Fault detection in CCGT systems

Fault detection and diagnosis \(FDD\) in CCGT systems are crucial for operational efficiency, safety, and reliability\[[24](https://arxiv.org/html/2606.26710#bib.bib28),[40](https://arxiv.org/html/2606.26710#bib.bib43)\]\. Several methods have been developed, including model\-based, data\-driven, and hybrid approaches\[[10](https://arxiv.org/html/2606.26710#bib.bib12),[9](https://arxiv.org/html/2606.26710#bib.bib11)\]\. In particular, tube leaks are common in HRSG and OTSG units, often due to thermo‑mechanical fatigue, corrosion, or water/steam quality issues\[[42](https://arxiv.org/html/2606.26710#bib.bib13),[44](https://arxiv.org/html/2606.26710#bib.bib14),[29](https://arxiv.org/html/2606.26710#bib.bib37)\]\. Undetected leaks cause production losses and costly repairs\[[14](https://arxiv.org/html/2606.26710#bib.bib8)\]\.

Model\-based fault detection methods use mathematical models to represent system behavior\[[30](https://arxiv.org/html/2606.26710#bib.bib40)\]\. Data\-driven methods, such as neural networks, support vector machines, and deep learning, utilize historical and real\-time data to detect faults without requiring detailed physical models\[[13](https://arxiv.org/html/2606.26710#bib.bib18),[28](https://arxiv.org/html/2606.26710#bib.bib16),[18](https://arxiv.org/html/2606.26710#bib.bib4),[2](https://arxiv.org/html/2606.26710#bib.bib15)\]\. Hybrid approaches combine physical models with data\-driven techniques\[[16](https://arxiv.org/html/2606.26710#bib.bib26)\]\. Pourbabaee et al\.\[[30](https://arxiv.org/html/2606.26710#bib.bib40)\]proposed a gas turbine sensor fault detection, isolation, and identification \(FDII\) method based on multiple hybrid Kalman filters \(MHKFs\)\. Camporeale et al\.\[[13](https://arxiv.org/html/2606.26710#bib.bib18)\]introduced a fault diagnosis system for CCGTs based on feed\-forward neural networks\. Nayeri et al\.\[[28](https://arxiv.org/html/2606.26710#bib.bib16)\]proposed a Fault Detection and Isolation \(FDI\) system based on an ensemble\-based hierarchical classifier\. Ajami et al\.\[[1](https://arxiv.org/html/2606.26710#bib.bib10)\]explore independent component analysis \(ICA\) for fault detection and identification in the turbine system of a thermal power plant\. Fahmi et al\.\[[18](https://arxiv.org/html/2606.26710#bib.bib4)\]proposed a temporal convolutional autoencoder for gas turbine fault diagnosis using vibration data\. Sarwar et al\.\[[37](https://arxiv.org/html/2606.26710#bib.bib27)\]presented a multi\-sensor data fusion framework for fault detection and diagnosis in an industrial gas turbine engine\. Barrera et al\.\[[2](https://arxiv.org/html/2606.26710#bib.bib15)\]introduce clustering and autoencoders to train predictive maintenance algorithms\. 
Sampath et al\.\[[35](https://arxiv.org/html/2606.26710#bib.bib20)\]propose a hybrid approach that combines real\-world sensor data and information from simulation models\. Fast et al\.\[[19](https://arxiv.org/html/2606.26710#bib.bib6)\]applied artificial neural networks to monitor the condition and diagnose faults in a combined heat and power plant\. Davallo et al\.\[[17](https://arxiv.org/html/2606.26710#bib.bib19)\]proposed an Extreme Learning Machine framework for the detection and identification of leaks in an onshore CCGT\. Chao et al\.\[[16](https://arxiv.org/html/2606.26710#bib.bib26)\]proposed a hybrid approach combining physical performance models with deep learning algorithms\.

### 2.2 Few-shot fault detection

In industrial settings, acquiring labeled fault data is often challenging due to the rarity of fault occurrences and the high costs associated with data annotation\[[46](https://arxiv.org/html/2606.26710#bib.bib29),[22](https://arxiv.org/html/2606.26710#bib.bib41)\]. Conventional supervised learning models, which rely heavily on extensive labeled datasets, often underperform in data-constrained conditions\[[11](https://arxiv.org/html/2606.26710#bib.bib42),[8](https://arxiv.org/html/2606.26710#bib.bib7),[7](https://arxiv.org/html/2606.26710#bib.bib35)\]. Few-shot learning (FSL) has emerged as a promising solution to this problem, enabling models to generalize from a limited number of labeled examples\[[33](https://arxiv.org/html/2606.26710#bib.bib2),[23](https://arxiv.org/html/2606.26710#bib.bib24)\]. Several specialized FSL frameworks have been developed, including metric-based, optimization-based, and memory-based approaches. Metric-based methods, such as prototypical networks, matching networks, and relation networks, focus on learning discriminative latent embedding spaces to perform classification based on similarity measures\[[38](https://arxiv.org/html/2606.26710#bib.bib38),[43](https://arxiv.org/html/2606.26710#bib.bib31),[39](https://arxiv.org/html/2606.26710#bib.bib30)\]. Optimization-based methods, such as Model-Agnostic Meta-Learning (MAML), learn optimization strategies to quickly adapt to new tasks using limited samples\[[32](https://arxiv.org/html/2606.26710#bib.bib36),[20](https://arxiv.org/html/2606.26710#bib.bib33)\]. Memory-based methods incorporate external memory structures or attention mechanisms to store and retrieve information efficiently\[[36](https://arxiv.org/html/2606.26710#bib.bib32)\].

Recent studies have explored various FSL approaches for fault detection and diagnosis. Zhang et al.\[[45](https://arxiv.org/html/2606.26710#bib.bib21)\] introduced a few-shot learning framework for bearing fault diagnosis based on MAML, demonstrating its superiority over traditional methods in scenarios with limited labeled data. Qiao et al.\[[31](https://arxiv.org/html/2606.26710#bib.bib17)\] presented a few-shot fault diagnosis model for wind turbine (WT) generators employing a Convolutional Normalization Transformer Encoder (CNTE) based on MAML. Zhang et al.\[[47](https://arxiv.org/html/2606.26710#bib.bib25)\] introduced a prototypical network few-shot learning approach for anomaly detection in nuclear power plants. Ren et al.\[[34](https://arxiv.org/html/2606.26710#bib.bib22)\] proposed a few-shot GAN, which uses a sample-rich class to provide a sample distribution paradigm for the sample-poor class. Zheng et al.\[[48](https://arxiv.org/html/2606.26710#bib.bib23)\] proposed fault diagnosis based on an improved meta-relation network. Although few-shot fault detection and diagnosis have been applied in various industrial domains, challenges remain in ensuring the stability and robustness of these models, especially under varying operational conditions. Moreover, FSL on CCGT systems remains unexplored. Addressing these issues is crucial for the reliable deployment of FSL models in real-world CCGT fault detection scenarios.

## 3 The Proposed Method

In this section, we present the proposed robust prototypical network called Kalman Prototypical Network \(KPN\) for few\-shot fault detection in CCGT systems\.

![Figure 1](https://arxiv.org/html/2606.26710v1/x2.png)

Figure 1: Kalman Prototypical Network framework (per class $k$, per episode $t$).

### 3.1 Prototypical Networks

Prototypical networks are a class of metric-based few-shot learning models that classify examples based on their proximity to class-specific *prototypes* in a learned embedding space\[[38](https://arxiv.org/html/2606.26710#bib.bib38)\]. Let $\mathcal{E}_\theta: \mathcal{X} \rightarrow \mathbb{R}^d$ be an embedding function parameterized by $\theta$, which maps an input $\mathbf{x} \in \mathcal{X}$ to a $d$-dimensional latent representation $\mathcal{E}_\theta(\mathbf{x}) \in \mathbb{R}^d$. In each few-shot training *episode*, the model is provided with data partitioned into a *support set* $\mathcal{S} = \bigcup_{k=1}^{K} \mathcal{S}_k$, where $\mathcal{S}_k = \{(\mathbf{x}_i, y_i) \in \mathcal{X} \times \mathcal{Y} \mid y_i = k\}$ contains $N_k$ labeled examples of class $k \in \{1, \dots, K\}$, and a disjoint *query set* $\mathcal{Q} \subset \mathcal{X}$ for classification. Prototypical networks perform few-shot classification by computing a representative *prototype* for each class using a learned embedding function. For each class $k$, the class prototype $\mathbf{p}_k \in \mathbb{R}^d$ is defined as the mean of embedded support examples from class $k$:

$$\mathbf{p}_k = \frac{1}{N_k} \sum_{(\mathbf{x}_i, y_i) \in \mathcal{S}_k} \mathcal{E}_\theta(\mathbf{x}_i) \tag{1}$$

This forms a set of class prototypes $\mathcal{P} = \{\mathbf{p}_1, \ldots, \mathbf{p}_K\} \subset \mathbb{R}^d$, which act as anchors for class-conditional distributions in the embedding space. Given a query example $\mathbf{x} \in \mathcal{Q}$, we embed it using $\mathcal{E}_\theta$ and compute the squared Euclidean distance to each class prototype. The distances are converted into class probabilities using a softmax over negative distances. This formulation is equivalent to modeling each class as a spherical Gaussian in latent space, centered at its prototype, with shared isotropic covariance. The model is trained by minimizing the negative log-likelihood over query labels. Figure [2](https://arxiv.org/html/2606.26710#S3.F2) depicts support, query, and prototype points in the embedding reduced by principal component analysis (PCA).

![Figure 2](https://arxiv.org/html/2606.26710v1/x3.png)

Figure 2: PCA-reduced embeddings (support, query, and prototypes) for a single episode on the gas turbine dataset.
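The prototype computation of Eq. (1) and the softmax over negative squared distances can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are taken as already computed, the 2-D toy values and the class names `normal` and `leak` are invented for the example, and the helper names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def class_prototypes(support):
    """Map each class label to the mean of its embedded support vectors (Eq. 1)."""
    return {k: mean_vector(vs) for k, vs in support.items()}

def class_probabilities(z, prototypes):
    """Softmax over negative squared distances from embedding z to each prototype."""
    logits = {k: -squared_distance(z, p) for k, p in prototypes.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Toy 2-D embeddings (illustrative values, not taken from the paper's dataset).
support = {
    "normal": [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]],
    "leak": [[2.0, 2.0], [2.2, 1.8], [1.8, 2.2]],
}
prototypes = class_prototypes(support)
probs = class_probabilities([0.3, 0.1], prototypes)
predicted = max(probs, key=probs.get)
```

In this toy setting the query lies much closer to the `normal` prototype, so almost all of the probability mass goes to that class.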
### 3.2 Prototype Trajectory Instability

While prototypical networks assume that class prototypes $\mathbf{p}_k$ are reliable and representative centroids of class embeddings within an episode, in practice these prototypes are *non-stationary* over the course of training. That is, the prototype for a given class $k$ at episode $t$, denoted $\mathbf{p}_k^{(t)} \in \mathbb{R}^d$, may vary significantly across episodes due to stochastic sampling of support sets, instability in the embedding function $\mathcal{E}_\theta$, and shifts in the latent structure of the data.

Let $\{\mathbf{p}_k^{(t)}\}_{t=1}^{T}$ denote the sequence of prototypes for class $k$ across $T$ training episodes. Our empirical observations (e.g., via PCA or t-SNE projections) show that:

- Prototypes often follow a smooth but nonlinear trajectory through the latent space.
- The distance $\|\mathbf{p}_k^{(t+1)} - \mathbf{p}_k^{(t)}\|$ is non-negligible, indicating episodic drift.
- Prototypes may oscillate or diverge, particularly in high-variance regimes or early training stages.

Such behavior suggests that the episodic average used in vanilla prototypical networks may be an inconsistent estimate of the true latent class centroid, particularly when support sets are small or non-representative. We reinterpret prototype evolution as a time series $\{\mathbf{p}_k^{(t)}\}_{t=1}^{T}$ generated by a latent dynamic process. The goal is to recover a *denoised* estimate $\hat{\mathbf{p}}_k^{(t)}$ of the prototype at each episode. To reduce the *prototype variance* $\sigma_k^2$, we use Kalman filters, which combine current observations with prior history to optimally estimate latent variables in dynamic systems. A higher $\sigma_k^2$ is often associated with degraded generalization due to inconsistent decision boundaries for query samples. By enforcing smoothing across episodes, we aim to reduce $\sigma_k^2$, improve *stability*, and ultimately enhance *robustness* during test-time classification.
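One of the sources of drift named above, stochastic sampling of small support sets, can be isolated in a short simulation. The sketch below is an illustration under simplifying assumptions that are not part of the paper: the embedding is held fixed, each class is an isotropic Gaussian cloud around a fixed centroid, and the dimensions, noise level, and episode count are arbitrary. Under these assumptions the expected squared step between consecutive episodic prototypes is $2d\sigma^2/N$ for $N$ support samples in $d$ dimensions, so drift shrinks as the support set grows.

```python
import random

def episodic_prototype(rng, center, n_support, noise_std=1.0):
    """Mean of n_support noisy embeddings drawn around a fixed class centroid."""
    dim = len(center)
    samples = [[c + rng.gauss(0.0, noise_std) for c in center]
               for _ in range(n_support)]
    return [sum(s[i] for s in samples) / n_support for i in range(dim)]

def mean_squared_drift(prototypes):
    """Average of ||p^(t+1) - p^(t)||^2 over consecutive episodes."""
    steps = [
        sum((b - a) ** 2 for a, b in zip(p0, p1))
        for p0, p1 in zip(prototypes, prototypes[1:])
    ]
    return sum(steps) / len(steps)

rng = random.Random(0)
center = [0.0] * 8  # fixed "true" centroid in an 8-D embedding space (assumed)
drift = {}
for n_support in (1, 5, 50):
    protos = [episodic_prototype(rng, center, n_support) for _ in range(500)]
    drift[n_support] = mean_squared_drift(protos)
```

The simulation deliberately ignores the other two sources (changes in the embedding function and shifts in latent structure), which would add to the drift measured here.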

### 3.3 Kalman-Based Prototype Estimation

To address the instability in the prototype sequence $\{\mathbf{p}_k^{(t)}\}_{t=1}^{T}$, we propose modeling prototype evolution as a latent linear dynamical system. Specifically, we treat the true class prototype $\hat{\mathbf{p}}_k^{(t)} \in \mathbb{R}^d$ at episode $t$ as a latent state, and the observed prototype $\mathbf{p}_k^{(t)}$ as a noisy measurement of that state.

#### Prototype State Space Model

We assume the following discrete-time linear Gaussian system for each class $k$:

- State transition equation (process model):

  $$\hat{\mathbf{p}}_k^{(t)} = \mathbf{F}\hat{\mathbf{p}}_k^{(t-1)} + \mathbf{w}_k^{(t)}, \quad \mathbf{w}_k^{(t)} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}) \tag{2}$$

- Observation equation (measurement model):

  $$\mathbf{p}_k^{(t)} = \mathbf{H}\hat{\mathbf{p}}_k^{(t)} + \mathbf{v}_k^{(t)}, \quad \mathbf{v}_k^{(t)} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) \tag{3}$$

Here, $\hat{\mathbf{p}}_k^{(t)}$ is the latent prototype, $\mathbf{p}_k^{(t)}$ is the observed prototype from the support set, and $\mathbf{F}, \mathbf{H} \in \mathbb{R}^{d \times d}$ are the transition and observation matrices, typically set to identity. The covariance matrices $\mathbf{Q}, \mathbf{R} \in \mathbb{R}^{d \times d}$ control the process and observation noise, respectively.

#### Prototype Recursive Kalman Update

Given the previous estimate $\hat{\mathbf{p}}_k^{(t-1)}$ with covariance $\mathbf{P}_k^{(t-1)}$, the Kalman filter proceeds in two steps:

##### Prediction Step

$$\hat{\mathbf{p}}_{k|t}^{\text{prior}} = \mathbf{F}\hat{\mathbf{p}}_k^{(t-1)} \tag{4}$$

$$\mathbf{P}_{k|t}^{\text{prior}} = \mathbf{F}\mathbf{P}_k^{(t-1)}\mathbf{F}^{\top} + \mathbf{Q} \tag{5}$$

##### Update Step

$$\mathbf{K}_k^{(t)} = \mathbf{P}_{k|t}^{\text{prior}}\mathbf{H}^{\top}\left(\mathbf{H}\mathbf{P}_{k|t}^{\text{prior}}\mathbf{H}^{\top} + \mathbf{R}\right)^{-1} \tag{6}$$

$$\hat{\mathbf{p}}_k^{(t)} = \hat{\mathbf{p}}_{k|t}^{\text{prior}} + \mathbf{K}_k^{(t)}\left(\mathbf{p}_k^{(t)} - \mathbf{H}\hat{\mathbf{p}}_{k|t}^{\text{prior}}\right) \tag{7}$$

$$\mathbf{P}_k^{(t)} = \left(\mathbf{I} - \mathbf{K}_k^{(t)}\mathbf{H}\right)\mathbf{P}_{k|t}^{\text{prior}} \tag{8}$$

For prototype tracking, we assume $\mathbf{F} = \mathbf{H} = \mathbf{I}$, $\mathbf{Q} = q \cdot \mathbf{I}$, and $\mathbf{R} = r \cdot \mathbf{I}$, where $q, r > 0$ are tunable scalars. This yields the simplified updates:

$$\mathbf{K}_k^{(t)} = \mathbf{P}_{k|t}^{\text{prior}}\left(\mathbf{P}_{k|t}^{\text{prior}} + r \cdot \mathbf{I}\right)^{-1}, \quad \hat{\mathbf{p}}_k^{(t)} = \hat{\mathbf{p}}_{k|t}^{\text{prior}} + \mathbf{K}_k^{(t)}\left(\mathbf{p}_k^{(t)} - \hat{\mathbf{p}}_{k|t}^{\text{prior}}\right) \tag{9}$$

During episodic training, we maintain a filtered prototype $\hat{\mathbf{p}}_k^{(t)}$ for each class, updated using the above equations. These filtered prototypes are then used in place of raw episodic means when classifying query samples. This approach leads to smoothed and denoised prototype trajectories, reduced variance in decision boundaries, and improved generalization in few-shot learning tasks.
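It may help to see what the simplified updates reduce to when the covariance is itself isotropic. The derivation below is a worked sketch, not part of the original text: it assumes $\mathbf{P}_k^{(t-1)} = \pi_{t-1}\mathbf{I}$ (which holds at every episode if the initial covariance is a multiple of the identity, as in the initialization $\alpha I$ of Algorithm 1), and the scalars $\pi_t$, $\kappa_t$, and $s$ are notation introduced here for illustration.

```latex
\begin{aligned}
\mathbf{P}_{k|t}^{\text{prior}} &= (\pi_{t-1} + q)\,\mathbf{I}, \\
\mathbf{K}_k^{(t)} &= \kappa_t\,\mathbf{I}, \qquad
  \kappa_t = \frac{\pi_{t-1} + q}{\pi_{t-1} + q + r}, \\
\hat{\mathbf{p}}_k^{(t)} &= (1 - \kappa_t)\,\hat{\mathbf{p}}_k^{(t-1)} + \kappa_t\,\mathbf{p}_k^{(t)}, \\
\pi_t &= (1 - \kappa_t)(\pi_{t-1} + q) = \frac{r\,(\pi_{t-1} + q)}{\pi_{t-1} + q + r}.
\end{aligned}
% Steady state: the prior variance s = \pi + q solves s^2 - q s - q r = 0, so
% s = \tfrac{1}{2}\bigl(q + \sqrt{q^2 + 4 q r}\bigr), \qquad \kappa_\infty = \frac{s}{s + r}.
```

Under this assumption the filtered prototype is a convex combination of the previous estimate and the new episodic mean, i.e. an exponential moving average with a time-varying weight $\kappa_t \in (0, 1)$. A large ratio $r/q$ gives a small gain and heavier smoothing, while a small ratio makes the filter track the raw episodic prototypes closely.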

### 3.4 Kalman Prototypical Network

We incorporate the Kalman filter into the prototypical network learning pipeline, resulting in the *Kalman Prototypical Network (KPN)*. In our framework, the Kalman filter operates over training episodes, i.e., the time index $t$ denotes the episode count rather than physical time. The observation $\mathbf{p}_k^{(t)}$ is the per-episode prototype computed from a stochastically sampled support set, and the filter provides temporal smoothing across episodes to reduce episodic variance in class centroids. This differs from real-time or online training, where $t$ would index sequential measurements and the filter would update as new data arrived in chronological order. The central idea is to treat each class prototype $\hat{\mathbf{p}}_k^{(t)} \in \mathbb{R}^d$ as a latent state variable, which evolves over training episodes $t = 1, \ldots, T$. The observed prototypes $\mathbf{p}_k^{(t)}$, computed from episodic support sets, are viewed as noisy measurements of these latent states.

For a given query example $\mathbf{x} \in \mathcal{Q}^{(t)}$, with embedding $\mathbf{z} = \mathcal{E}_\theta(\mathbf{x})$, classification is based on the squared Euclidean distance to the *Kalman-filtered prototypes*:

$$d(\mathbf{z}, \hat{\mathbf{p}}_k^{(t)}) = \|\mathcal{E}_\theta(\mathbf{x}) - \hat{\mathbf{p}}_k^{(t)}\|_2^2 = (\mathcal{E}_\theta(\mathbf{x}) - \hat{\mathbf{p}}_k^{(t)})^{\top}(\mathcal{E}_\theta(\mathbf{x}) - \hat{\mathbf{p}}_k^{(t)}) \tag{10}$$

The distances are converted into class probabilities using a softmax over negative distances:

$$P(y = k \mid \mathbf{x}) = \frac{\exp\left(-\|\mathcal{E}_\theta(\mathbf{x}) - \hat{\mathbf{p}}_k^{(t)}\|_2^2\right)}{\sum_{k'=1}^{K}\exp\left(-\|\mathcal{E}_\theta(\mathbf{x}) - \hat{\mathbf{p}}_{k'}^{(t)}\|_2^2\right)}. \tag{11}$$

The model is trained by minimizing the negative log-likelihood over query labels. The total loss for episode $t$ is given by:

$$\mathcal{L}_{KPN}^{(t)} = -\sum_{(\mathbf{x}_j, y_j) \in \mathcal{Q}^{(t)}} \log P(y_j \mid \mathbf{x}_j). \tag{12}$$
Gradient-based optimization is applied to update $\theta$, the parameters of the embedding function, while the Kalman parameters $\mathbf{Q}, \mathbf{R}$ are either fixed or tuned separately. The gradient of the loss pulls each query embedding $\mathcal{E}_\theta(\mathbf{x})$ toward the prototype of its true class, $\mathbf{p}_y$, while simultaneously repelling it from the other prototypes, weighted by their exponentially scaled distances. During inference, we evaluate the model on query examples $\mathcal{Q}^{\text{test}}$ using a fixed prototype $\hat{\mathbf{p}}_k^{(*)}$ for each class $k$. We use the last filtered prototype (i.e., $\hat{\mathbf{p}}_k^{(*)} = \hat{\mathbf{p}}_k^{(T)}$) for inference. The model predicts the label $\hat{y} \in \{1, \dots, K\}$ by selecting the class whose prototype is closest to the query embedding:

$$\hat{y} = \arg\min_{k}\; d\left(\mathcal{E}_\theta(\mathbf{x}), \hat{\mathbf{p}}_k^{(T)}\right) \tag{13}$$

This can be interpreted as a linear classifier in the embedding space when the prototypes are fixed and distances are expanded as inner products. The training algorithm for KPN is summarized in Algorithm [1](https://arxiv.org/html/2606.26710#alg1).
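The linear-classifier interpretation follows from expanding the squared distance. The short derivation below is a worked sketch using $\mathbf{z} = \mathcal{E}_\theta(\mathbf{x})$ and writing $\hat{\mathbf{p}}_k$ for the fixed prototype $\hat{\mathbf{p}}_k^{(T)}$; the symbols $\mathbf{w}_k$ and $b_k$ are introduced here for illustration.

```latex
\begin{aligned}
-\|\mathbf{z} - \hat{\mathbf{p}}_k\|_2^2
  &= -\mathbf{z}^{\top}\mathbf{z} + 2\,\hat{\mathbf{p}}_k^{\top}\mathbf{z}
     - \hat{\mathbf{p}}_k^{\top}\hat{\mathbf{p}}_k, \\
\hat{y} = \arg\min_k \|\mathbf{z} - \hat{\mathbf{p}}_k\|_2^2
  &= \arg\max_k \left(\mathbf{w}_k^{\top}\mathbf{z} + b_k\right),
  \qquad \mathbf{w}_k = 2\,\hat{\mathbf{p}}_k, \quad
  b_k = -\|\hat{\mathbf{p}}_k\|_2^2 .
\end{aligned}
```

The term $-\mathbf{z}^{\top}\mathbf{z}$ is the same for every class, so it drops out of both the arg max and the softmax. The decision rule is therefore linear in the embedding $\mathbf{z}$, although generally nonlinear in the raw input $\mathbf{x}$.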

Algorithm 1: Kalman Prototypical Network (Training)

1: Input: episodes $\{E_{t}\}_{t=1}^{T}$, where $E_{t}=(S^{(t)},Q^{(t)})$; classes $\mathcal{K}=\{1,\dots,K\}$; embedding $E_{\theta}(\cdot)$; Kalman parameters $F=I$, $H=I$, $Q=qI$, $R=rI$; initialization $\hat{p}^{(0)}_{k}\leftarrow 0$, $P^{(0)}_{k}\leftarrow\alpha I$ for all $k\in\mathcal{K}$.

2: Output: trained parameters $\theta$ and $\{\hat{p}^{(T)}_{k},P^{(T)}_{k}\}_{k=1}^{K}$.

3: for $t=1$ to $T$ do

4: for all $k\in\mathcal{K}$ do $\triangleright$ Episodic prototypes

5: $Z^{(t)}_{k}\leftarrow\{\,E_{\theta}(x)\mid(x,y{=}k)\in S^{(t)}\,\}$

6: $p^{(t)}_{k}\leftarrow\dfrac{1}{|Z^{(t)}_{k}|}\sum_{z\in Z^{(t)}_{k}}z$

7: end for

8: for all $k\in\mathcal{K}$ do $\triangleright$ Kalman prediction

9: $\hat{p}^{\mathrm{prior}}_{k|t}\leftarrow\hat{p}^{(t-1)}_{k}$

10: $P^{\mathrm{prior}}_{k|t}\leftarrow P^{(t-1)}_{k}+Q$

11: end for

12: for all $k\in\mathcal{K}$ do $\triangleright$ Kalman update

13: $K^{(t)}_{k}\leftarrow P^{\mathrm{prior}}_{k|t}\left(P^{\mathrm{prior}}_{k|t}+R\right)^{-1}$

14: $\hat{p}^{(t)}_{k}\leftarrow\hat{p}^{\mathrm{prior}}_{k|t}+K^{(t)}_{k}\left(p^{(t)}_{k}-\hat{p}^{\mathrm{prior}}_{k|t}\right)$

15: $P^{(t)}_{k}\leftarrow\left(I-K^{(t)}_{k}\right)P^{\mathrm{prior}}_{k|t}$

16: end for

17: $L^{(t)}_{\mathrm{KPN}}\leftarrow 0$ $\triangleright$ Query classification and loss

18: for all $(x_{j},y_{j})\in Q^{(t)}$ do

19: $z_{j}\leftarrow E_{\theta}(x_{j})$

20: for all $k\in\mathcal{K}$ do

21: $d_{k}\leftarrow\|z_{j}-\hat{p}^{(t)}_{k}\|_{2}^{2}$

22: end for

23: $P(y{=}k\mid x_{j})\leftarrow\mathrm{softmax}_{k}(-d_{k})$

24: $L^{(t)}_{\mathrm{KPN}}\leftarrow L^{(t)}_{\mathrm{KPN}}-\log P(y{=}y_{j}\mid x_{j})$

25: end for

26: $\theta\leftarrow\mathrm{OptimizerStep}\left(\theta,\nabla_{\theta}L^{(t)}_{\mathrm{KPN}}\right)$ $\triangleright$ Update embedding parameters

27: end for
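The Kalman steps of Algorithm 1 (lines 8 to 16) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the isotropic setting stated in the algorithm ($F=H=I$, $Q=qI$, $R=rI$, $P^{(0)}_{k}=\alpha I$), under which every covariance stays a scalar multiple of the identity, so the gain reduces to a scalar and no matrix inversion is needed. The class name `IsotropicKalmanPrototype`, the helper `class_mean`, the value of `alpha`, and the toy embeddings are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IsotropicKalmanPrototype:
    """Filtered prototype for one class, assuming F = H = I, Q = q*I, R = r*I.

    With P(0) = alpha*I every covariance remains a scalar multiple of the
    identity, so only the scalar `p_var` is tracked (an assumption of this
    sketch, not a general Kalman filter).
    """
    dim: int
    q: float = 1e-3      # process noise variance (value reported in the paper)
    r: float = 1e-2      # observation noise variance (value reported in the paper)
    alpha: float = 1.0   # initial covariance scale; hypothetical value
    mean: List[float] = field(default_factory=list)
    p_var: float = 0.0

    def __post_init__(self) -> None:
        self.mean = [0.0] * self.dim   # p_hat(0) <- 0
        self.p_var = self.alpha        # P(0) <- alpha * I

    def update(self, observed: List[float]) -> float:
        """Run one predict/update cycle on the episodic prototype p_k(t)."""
        prior_var = self.p_var + self.q            # P_prior = P(t-1) + Q
        gain = prior_var / (prior_var + self.r)    # K = P_prior (P_prior + R)^-1
        self.mean = [m + gain * (o - m)            # p_hat = prior + K (obs - prior)
                     for m, o in zip(self.mean, observed)]
        self.p_var = (1.0 - gain) * prior_var      # P = (I - K) P_prior
        return gain


def class_mean(embeddings: List[List[float]]) -> List[float]:
    """Episodic prototype: mean of the support embeddings of one class."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


# Toy usage: noisy 2-D support embeddings around (1, 1) over three episodes.
episodes = [
    [[1.2, 0.9], [0.8, 1.1]],
    [[1.4, 1.3], [1.0, 0.9]],
    [[0.7, 0.8], [0.9, 1.0]],
]
proto = IsotropicKalmanPrototype(dim=2)
gains = [proto.update(class_mean(support)) for support in episodes]
```

In this toy run the gain shrinks from episode to episode, so later episodic prototypes move the filtered estimate less than early ones. In a full training loop the filtered mean would feed the distance computation in line 21, which this sketch leaves out.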

### 3.5 Computational Complexity

The Kalman Prototypical Network improves the standard prototypical framework by providing statistically optimal estimates under Gaussian noise assumptions. The additional cost arises solely from the recursive update of the KPN prototypes, performed once per episode and per class. Let $d$ denote the embedding dimensionality, $K$ the number of classes per episode, and $T$ the total number of training episodes. Each update requires matrix addition and inversion in $\mathbb{R}^{d\times d}$, resulting in complexity $\mathcal{O}(d^{3})$ per class per episode. The total overhead for all $K$ classes is $\mathcal{O}(Kd^{3})$ per episode, which is negligible for moderate $d$. In comparison, standard Prototypical Networks involve computing class means from $N$ support samples with complexity $\mathcal{O}(Nd)$ and query-class distances for $M$ queries with complexity $\mathcal{O}(MKd)$. Thus, the dominant computational cost remains in embedding and distance calculations, and the added filtering step introduces minimal complexity. The episodic filtered prototype variance is given by

$$\tilde{\sigma}_{k}^{2}=\frac{1}{T}\sum_{t=1}^{T}\left\|\hat{\mathbf{p}}_{k}^{(t)}-\tilde{\mathbf{p}}_{k}\right\|_{2}^{2},\quad\tilde{\mathbf{p}}_{k}=\frac{1}{T}\sum_{t=1}^{T}\hat{\mathbf{p}}_{k}^{(t)} \tag{14}$$

In general, the Kalman filter reduces the variance: $\tilde{\sigma}_{k}^{2}\leq\sigma_{k}^{2},\;\forall k$. This reduction improves temporal smoothness and inter-episode consistency of class representations, which in turn enhances generalization performance. The Kalman filter yields the maximum a posteriori estimate:

$$\hat{\mathbf{p}}_{k}^{(t)}=\arg\min_{\hat{\mathbf{p}}}\left\|\hat{\mathbf{p}}-\mathbf{p}_{k}^{(t)}\right\|_{\mathbf{R}^{-1}}^{2}+\left\|\hat{\mathbf{p}}-\hat{\mathbf{p}}_{k}^{(t-1)}\right\|_{\mathbf{Q}^{-1}}^{2} \tag{15}$$

Thus, the Kalman filter in KPN acts as a Bayesian smoother, optimally denoising the prototype under Gaussian assumptions.
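Equation (14) is straightforward to evaluate on a recorded prototype sequence. The sketch below computes it for a raw and a filtered sequence of one class. The alternating toy observations, the helper names, and the isotropic scalar-gain filter (with $q=10^{-3}$, $r=10^{-2}$ and a hypothetical $\alpha=1$) are assumptions chosen so that the effect is easy to see; they are not data or code from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def episodic_variance(prototypes: Sequence[Vector]) -> float:
    """Eq. (14): mean squared Euclidean distance to the sequence mean."""
    t_total = len(prototypes)
    mean = [sum(col) / t_total for col in zip(*prototypes)]
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, mean)) for p in prototypes
    ) / t_total


def filter_sequence(observed: Sequence[Vector], q: float, r: float,
                    alpha: float = 1.0) -> List[Vector]:
    """Random-walk Kalman filter with Q = q*I, R = r*I, P(0) = alpha*I.

    Under these isotropic assumptions the gain is a scalar.
    """
    estimate = [0.0] * len(observed[0])   # p_hat(0) <- 0
    p_var = alpha
    filtered = []
    for obs in observed:
        prior_var = p_var + q
        gain = prior_var / (prior_var + r)
        estimate = [e + gain * (o - e) for e, o in zip(estimate, obs)]
        p_var = (1.0 - gain) * prior_var
        filtered.append(estimate)
    return filtered


# Toy sequence: episodic prototypes that flip between two extremes.
raw = [[1.0, 0.0] if t % 2 == 0 else [-1.0, 0.0] for t in range(8)]
smoothed = filter_sequence(raw, q=1e-3, r=1e-2)

raw_var = episodic_variance(raw)
filtered_var = episodic_variance(smoothed)
print(f"raw variance      = {raw_var:.4f}")
print(f"filtered variance = {filtered_var:.4f}")
```

Because every filtered estimate is a convex combination of the previous estimate and the new observation, the filtered sequence in this toy case stays strictly inside the range of the raw one, which is why its variance is smaller.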

## 4 Experimental Setup

![Refer to caption](https://arxiv.org/html/2606.26710v1/x4.png)

Figure 3: Schematic of the steam cycle as implemented in the dynamic model \[[3](https://arxiv.org/html/2606.26710#bib.bib45), [26](https://arxiv.org/html/2606.26710#bib.bib1)\].

### 4.1 Dataset

In a previous work by the authors\[[3](https://arxiv.org/html/2606.26710#bib.bib45)\], we generated five time series datasets corresponding to the development of a leak at the OTSG steam header, utilizing the dynamic simulation model\[[26](https://arxiv.org/html/2606.26710#bib.bib1)\]\. A schematic of the simulated CCGT system is shown in Figure[3](https://arxiv.org/html/2606.26710#S4.F3)\. The main input and output variables of the dynamic model relevant for this work are summarized in Tables[1](https://arxiv.org/html/2606.26710#S4.T1)and[2](https://arxiv.org/html/2606.26710#S4.T2)\. For each time series, the steam cycle and OTSGs are subject to operation under the normal variability of the connected GTs, with load oscillations around the nominal point and step changes corresponding to different operating nominal loads\. The normal variability is based on historical operational data for mechanical drive GTs during a year that were previously analyzed for a reference offshore platform in\[[27](https://arxiv.org/html/2606.26710#bib.bib9)\]\. Each of the five time series has a duration of 1 week, with a sampling resolution of 1 hour\. Three of the time series correspond to normal operation \(without OTSG leaks\), and the remaining two time series correspond to the gradual development of a leak in one of the OTSGs\. Faulty data was simulated by increasing the size of the leak over time \(orifice opening\), resulting in an increasing mass flow rate of water/steam through the leak over time\.

Table 1: Input parameters for dataset generation \[[3](https://arxiv.org/html/2606.26710#bib.bib45)\].

Table 2: Output parameters from dataset generation \[[3](https://arxiv.org/html/2606.26710#bib.bib45)\].

### 4.2 Baseline Methods

To evaluate the performance of the proposed method, we considered four established few-shot learning approaches as baselines.

- Prototypical Network \[[38](https://arxiv.org/html/2606.26710#bib.bib38)\] employs a neural network-based encoder and classifies queries by negative Euclidean distance to per-class prototypes.
- Matching Network \[[43](https://arxiv.org/html/2606.26710#bib.bib31)\] uses the same encoder but computes cosine-normalized embeddings and attends over support examples via softmax similarity.
- Relation Network \[[39](https://arxiv.org/html/2606.26710#bib.bib30)\] concatenates each query–support pair of embeddings and passes them through a relation module to predict similarity scores.
- MAML \[[20](https://arxiv.org/html/2606.26710#bib.bib33)\] meta-trains an MLP classifier by repeatedly sampling tasks, performing inner-loop SGD updates on support sets, and applying outer-loop meta-updates.

### 4.3 Implementation Details and Tools

The combined dataset is split into an 80% training set and a 20% test set, stratified by the binary label. All features are standardized to zero mean and unit variance using a scaler fitted on the training data. For the neural network-based encoders (Prototypical, Matching, Relation), we use one hidden layer of dimension 8 and an embedding layer of dimension 4. The Relation Network's relation module is a two-layer MLP with hidden size 16. For MAML, we perform 30 inner-loop gradient steps with an inner learning rate of 0.1 and a single update with an outer learning rate of 0.01. For the proposed method (KPN), we set the process noise to $q=10^{-3}$ and the observation noise to $r=10^{-2}$. All models are trained for 50 episodes using the Adam optimizer. Models are implemented using PyTorch and trained on NVIDIA RTX A5000 GPUs.
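Fitting the scaler on the training split matters because statistics computed on the full dataset would leak test information into training. A minimal sketch of that step is given below. The function names and the toy feature values are hypothetical, the population standard deviation is an assumption, and in practice a library scaler would typically be used instead.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def fit_scaler(train: Matrix) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population standard deviation of the training set."""
    n = len(train)
    means = [sum(col) / n for col in zip(*train)]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(zip(*train), means)
    ]
    # Guard against constant features to avoid division by zero.
    stds = [s if s > 0.0 else 1.0 for s in stds]
    return means, stds


def transform(data: Matrix, means: List[float], stds: List[float]) -> Matrix:
    """Standardize rows with statistics that were fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in data]


# Toy example with two features (values are illustrative only).
train_rows = [[10.0, 0.5], [12.0, 0.7], [14.0, 0.9], [16.0, 1.1]]
test_rows = [[13.0, 0.8]]

means, stds = fit_scaler(train_rows)
train_std = transform(train_rows, means, stds)
test_std = transform(test_rows, means, stds)  # reuses the training statistics
```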

## 5 Results and Discussions

### 5.1 Fault Detection Performance

#### 5.1.1 Few-shot Performance

We compare the proposed Kalman Prototypical Network (KPN) against four established few-shot learning baselines (Prototypical Network, Matching Network, Relation Network, and MAML via Reptile optimization) on the gas turbine fault detection task under a 2-way classification setting. As shown in Table [3](https://arxiv.org/html/2606.26710#S5.T3), models are evaluated across a range of support set sizes (4-shot to 8-shot) using 100 test episodes and 20 random experiments. Reported results are the mean accuracy and standard deviation over all runs. Across all shot counts, KPN consistently achieves the highest accuracy with the lowest variance, outperforming all baselines. In the 4-shot setting, KPN achieves 90.51% ± 2.01%, which is approximately 1.3% higher than MatchingNet and nearly 5% higher than ProtoNet. The performance gains are particularly evident in lower-shot regimes, where baselines exhibit high sensitivity to prototype variance and small-sample noise.

As shown in Figure [4](https://arxiv.org/html/2606.26710#S5.F4), the Prototypical Network suffers from relatively high variance (e.g., ±5.00% in 6-shot), attributed to unstable prototype estimation across episodes. Matching Network improves robustness via attention over support embeddings but remains susceptible to support-query mismatch. RelationNet performs significantly worse than the other methods due to the challenges of learning a parametric relation module from limited support data, exhibiting both low accuracy and high variance. MAML demonstrates moderate accuracy but lacks stability, likely due to the difficulty of optimizing task-specific weights with very few support examples. By contrast, KPN stabilizes class representations through temporal smoothing of prototypes, yielding semantically consistent latent centroids across training episodes. This leads to superior generalization and robustness in few-shot inference, particularly under challenging low-data conditions.
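For readers unfamiliar with episodic evaluation, the sketch below shows one way to draw a 2-way episode with a given number of shots and queries per class, and to aggregate per-run accuracies into a mean and standard deviation. It is an illustration under assumptions: the function names, the toy sample pool, the placeholder accuracies, and the choice of the sample standard deviation are not from the paper, which does not describe its sampling code.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, int]  # (sample_id, class_label); stands in for (x, y)


def sample_episode(pool: Dict[int, Sequence[int]], n_shot: int, n_query: int,
                   rng: random.Random) -> Tuple[List[Sample], List[Sample]]:
    """Draw disjoint support and query sets with fixed sizes per class."""
    support: List[Sample] = []
    query: List[Sample] = []
    for label, sample_ids in pool.items():
        chosen = rng.sample(list(sample_ids), n_shot + n_query)
        support += [(sid, label) for sid in chosen[:n_shot]]
        query += [(sid, label) for sid in chosen[n_shot:]]
    return support, query


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy pool: class 0 owns ids 0..49, class 1 owns ids 100..149 (illustrative).
pool = {0: range(0, 50), 1: range(100, 150)}
rng = random.Random(0)
support, query = sample_episode(pool, n_shot=4, n_query=15, rng=rng)

# Placeholder accuracies, not results from the paper.
mean_acc, std_acc = summarize([0.90, 0.92, 0.88])
```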

![Refer to caption](https://arxiv.org/html/2606.26710v1/x5.png)

Figure 4: Accuracy vs. query per class for fixed test episodes and number of shots.

Table 3: Accuracy vs. support set size for a fixed number of test episodes and queries per class.

#### 5.1.2 Query Size Performance

We further evaluate the models for a fixed number of shots while varying the number of query samples per class from 5 to 25. Table [4](https://arxiv.org/html/2606.26710#S5.T4) presents the classification performance of the four baselines and the proposed method. All models are evaluated under a fixed 5-shot configuration, using 100 test episodes and 20 random seeds. Across all query sizes, KPN consistently outperforms the baselines, achieving accuracies above 90% with significantly lower variance. Notably, KPN achieves its best performance at 15 queries per class, with 90.71% ± 2.75%, and maintains strong stability across other settings. The minimal degradation in performance as the query size increases demonstrates KPN's robustness and reliability under varying evaluation loads.

ProtoNet displays moderate improvement with increasing query size, but its accuracy remains 3–5% below that of KPN. This can be attributed to its reliance on per-episode prototypes, which introduces variability and reduces consistency in prediction. MatchingNet performs better than ProtoNet, especially at lower query counts, due to its attention-based architecture, yet still falls short of KPN in both accuracy and variance. RelationNet performs the worst across all query sizes, with high variability and accuracies ranging from 61.06% to 71.71%. Its learned similarity module likely struggles to generalize from few-shot support examples. MAML demonstrates moderate performance but suffers from high variance, a consequence of instability in meta-learned adaptation when limited data is available per task. The low standard deviation across all settings indicates that KPN achieves both high accuracy and robustness. Such stability is essential for industrial diagnostic systems, such as gas turbine fault detection, where batch size during deployment may depend on operational constraints or data streaming conditions.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x6.png)

Figure 5: Accuracy vs. query per class for fixed test episodes and number of shots.

Table 4: Accuracy vs. queries per class for a fixed number of test episodes and few-shot samples.

### 5.2 Parameter Sensitivity Analysis

#### 5.2.1 Process Noise and Measurement Noise

We analyze the classification accuracy of KPN under varying levels of process noise $q\in\{10^{-5},10^{-4},10^{-3},10^{-2},10^{-1}\}$, with a fixed measurement noise $r=10^{-3}$, for both 4-shot and 6-shot configurations. Each data point represents the mean accuracy across 100 evaluation episodes and 20 different random experiments. As shown in Figure [6](https://arxiv.org/html/2606.26710#S5.F6), KPN achieves peak performance at a moderate process noise level, specifically $q=10^{-3}$, attaining an accuracy of 91.09% (±1.90%) for the 4-shot case and 91.25% (±2.15%) for the 6-shot case. This indicates that an intermediate value of $q$ effectively balances the trade-off between adaptability and temporal smoothing of class prototypes. At very low process noise values (e.g., $q=10^{-5}$), the filter becomes overly conservative, causing the prototypes to remain rigid and less responsive to changes, leading to reduced accuracy: 89.08% (4-shot) and 88.58% (6-shot). Conversely, excessively high process noise (e.g., $q=10^{-1}$) leads to an overly reactive filter, which diminishes its smoothing capability and results in noisier prototypes and degraded performance.

Figure [7](https://arxiv.org/html/2606.26710#S5.F7) illustrates the effect of varying measurement noise $r$ on classification accuracy under two support set sizes, with process noise fixed at $q=10^{-3}$. For the 4-shot setting, accuracy initially increases as $r$ rises from $10^{-5}$ to $10^{-3}$, peaking at 91.09% with $r=10^{-3}$. This trend suggests that mild measurement noise helps regularize the updates in the Kalman filter, smoothing over spurious variations in per-episode prototypes. However, further increasing $r$ to $10^{-2}$ and $10^{-1}$ leads to a degradation in performance, falling back to 90.23% and 88.97%, respectively. This decline implies that excessive measurement uncertainty downweights the contribution of new prototype observations, resulting in under-adaptive behavior. In the 6-shot case, a similar pattern is observed. Accuracy increases from 90.09% at $r=10^{-5}$ to a peak of 91.25% at $r=10^{-3}$, before declining as noise increases further. Notably, the 6-shot setting consistently outperforms 4-shot across all noise levels, reaffirming the value of additional support examples in stabilizing prototype estimation. The empirically optimal range for this dataset is centered around $r=10^{-3}$. In general, KPN demonstrates robustness to the choice of process noise within a reasonable range. However, these results highlight the importance of tuning the noise parameters for optimal performance of temporally regularized few-shot models.
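The trade-off between a rigid and a reactive filter can be made tangible through the steady-state Kalman gain. For the isotropic random-walk model of Algorithm 1 ($F=H=I$, $Q=qI$, $R=rI$), the per-dimension prior variance $P$ converges to the positive root of $P^{2}-qP-qr=0$, so the gain depends only on the ratio $q/r$. The sketch below evaluates it over the swept values of $q$ with $r=10^{-3}$. The function name is hypothetical, and the calculation is a standard property of scalar Kalman filters rather than an analysis taken from the paper; with only a finite number of training episodes the actual gain may not have reached this limit.

```python
import math


def steady_state_gain(q: float, r: float) -> float:
    """Steady-state scalar Kalman gain for a random walk observed in noise.

    The prior variance P solves P**2 - q*P - q*r = 0 (positive root), and the
    gain is K = P / (P + r). Assumes F = H = 1 per embedding dimension.
    """
    prior_var = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r))
    return prior_var / (prior_var + r)


r = 1e-3  # measurement noise used in the process-noise sweep
for q in (1e-5, 1e-4, 1e-3, 1e-2, 1e-1):
    print(f"q = {q:.0e}  ->  steady-state gain K = {steady_state_gain(q, r):.3f}")
```

A gain close to 0 means new episodic prototypes barely move the filtered estimate (the rigid regime described for small $q$ or large $r$), while a gain close to 1 means the filter nearly copies each episodic prototype and provides little smoothing.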

![Refer to caption](https://arxiv.org/html/2606.26710v1/x7.png)

Figure 6: Accuracy vs. process noise for fixed test episodes and query per class.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x8.png)

Figure 7: Accuracy vs. measurement noise for fixed test episodes and query per class.

To further evaluate the robustness of KPN, we performed a parameter sensitivity analysis with respect to the number of support examples (shots) and query examples per class. Specifically, we varied the number of shots from 4 to 10 and the query size per class from 5 to 20. The corresponding average classification accuracy across different configurations is visualized in Figure [8](https://arxiv.org/html/2606.26710#S5.F8). The heatmap reveals several notable patterns in the effect of shot and query size. Increasing the number of support examples generally leads to improved classification accuracy. This trend aligns with common few-shot learning intuition, where larger support sets provide better prototype estimation. Interestingly, the highest performance is observed even with as few as 4 support examples, underscoring the effectiveness of the smoothed prototype mechanism in KPN. The model also demonstrates consistent performance across a wide range of query sizes, with only slight variations in accuracy. Although a moderate increase in query size improves performance (due to more stable training gradients), the benefit diminishes beyond 10 queries per class, indicating that KPN is robust to changes in this parameter. In general, these results validate the stability and data efficiency of KPN in industrial fault detection tasks. The model remains performant across varying few-shot configurations, making it highly suitable for real-world diagnostic systems where labeled data may be limited.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x9.png)

Figure 8: Average accuracy as a function of support size (shots) and query size per class.

### 5.3 Test Episode Performance

We further evaluate the performance of KPN under varying numbers of test episodes (50, 100, 200, and 500). Figure [9](https://arxiv.org/html/2606.26710#S5.F9) shows the few-shot classification accuracy as a function of the number of support samples per class (k-shot). For all test episode counts, KPN consistently outperforms the standard Prototypical Network (PN) across the entire range of shot values. These results further reinforce that Kalman smoothing of prototypes not only improves convergence during training but also leads to more reliable and stable generalization under various test-time conditions. This property is especially important for real-world gas turbine fault detection, where operational constraints may limit the number of available labeled or evaluation samples.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x10.png)

Figure 9: Accuracy vs. shots for different test episodes.

### 5.4 Kalman Prototype Trajectory

We start our analysis by comparing the Kalman-filtered prototype trajectory with the noisy, scattered trajectory of the standard prototype during episodic training. Figure [10](https://arxiv.org/html/2606.26710#S5.F10) presents the trajectories of class prototypes projected onto the first two principal components of the embedding space over 1000 training episodes. The observed prototype trajectories exhibit considerable variability and non-smoothness, indicative of episodic noise and high intra-class variance induced by few-shot support sampling. To mitigate the instability, a Kalman filter was applied independently to each class-specific prototype sequence (blue for Class 0 and red for Class 1). The filtering process used a process noise covariance $q=10^{-3}$ and an observation noise covariance $r=10^{-2}$, corresponding to moderate model confidence in temporal smoothness relative to observation noise. The application of Kalman filtering yields two key improvements: the prototype trajectories become smoother, and the filtered trajectories maintain distinct separation between classes along the principal component axes, suggesting that discriminative structure in the latent space is enhanced. These results empirically validate the hypothesis that filtering reduces episodic variance and stabilizes class representations, enhancing robustness against the stochasticity of few-shot training and improving fault classification boundaries.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x11.png)

Figure 10: Prototype trajectories during training. C0 and C1 denote classes with leak fault and without leak fault, respectively.

### 5.5 Training Convergence

We train the proposed model using a Kalman-filtered prototype for each episode. Figure [11](https://arxiv.org/html/2606.26710#S5.F11) presents the training loss convergence of the standard Prototypical Network (PN) and the Kalman Prototypical Network over 1000 episodes. Both models are trained with identical few-shot settings. The PN curve exhibits significant oscillations throughout training, reflecting prototype instability due to episodic sampling noise. In contrast, KPN achieves smoother and more stable loss dynamics. Early in training (episodes 0–300), KPN converges faster and maintains lower loss, indicating improved robustness against embedding drift. Overall, the training loss analysis confirms that integrating Kalman filtering into the prototypical network framework accelerates convergence in early training and reduces episodic variance, properties that are crucial for reliable few-shot learning in fault detection applications.

![Refer to caption](https://arxiv.org/html/2606.26710v1/x12.png)

Figure 11: Training loss vs. iteration on the dataset.

## 6 Conclusion and Future Works

In this work, we proposed the Kalman Prototypical Network \(KPN\) for few\-shot fault detection in gas turbines\. KPN is a robust prototypical network that integrates Kalman filtering to stabilize prototype estimation during episodic few\-shot learning\. Motivated by the observation that class prototypes evolve dynamically and noisily during training, we model prototype evolution as a latent stochastic process and apply temporal filtering to obtain denoised and temporally consistent prototypes\. Through extensive experiments on a gas turbine fault detection dataset, we demonstrated that KPN consistently outperforms the standard Prototypical Network across a range of evaluation settings\. Visualization of prototype trajectories revealed that KPN produces smoother and more stable class representations over training episodes\. Training loss analysis showed that KPN reduces convergence noise and achieves faster and more stable optimization\. Few\-shot accuracy evaluations confirmed that KPN improves generalization across varying numbers of support shots, query sizes, and testing conditions, with particularly significant gains under low\-shot and low\-query regimes\. Future work will explore adaptive filtering strategies, extend the approach to multiclass imbalanced settings, and investigate joint optimization of filter parameters alongside the embedding network\.

## References

- \[1\]A\. Ajami and M\. Daneshvar\(2012\-12\)Data driven approach for fault detection and diagnosis of turbine in thermal power plant using Independent Component Analysis \(ICA\)\.International Journal of Electrical Power & Energy Systems43\(1\),pp\. 728–735\.External Links:[Document](https://dx.doi.org/10.1016/J.IJEPES.2012.06.022),ISSN 0142\-0615Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[2\]J\. M\. Barrera, A\. Reina, A\. Mate, and J\. C\. Trujillo\(2022\-10\)Fault detection and diagnosis for industrial processes based on clustering and autoencoders: a case of gas turbines\.International Journal of Machine Learning and Cybernetics13\(10\),pp\. 3113–3129\.External Links:[Link](https://link.springer.com/article/10.1007/s13042-022-01583-x),[Document](https://dx.doi.org/10.1007/S13042-022-01583-X/TABLES/6),ISSN 1868808XCited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[3\]M\. A\. Belay, L\. F\. Bernardino, A\. Rasheed, R\. M\. Montañés, and P\. Salvo Rossi\(2026\)Unsupervised leak detection for heat recovery steam generators in combined\-cycle gas and steam turbine power plants\.IEEE Sensors Journal26\(1\),pp\. 652–664\.Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p3.1),[Figure 3](https://arxiv.org/html/2606.26710#S4.F3),[Figure 3](https://arxiv.org/html/2606.26710#S4.F3.3.2),[§4\.1](https://arxiv.org/html/2606.26710#S4.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.26710#S4.T1),[Table 1](https://arxiv.org/html/2606.26710#S4.T1.9.2),[Table 2](https://arxiv.org/html/2606.26710#S4.T2),[Table 2](https://arxiv.org/html/2606.26710#S4.T2.19.2)\.
- \[4\]M\. A\. Belay, S\. S\. Blakseth, A\. Rasheed, and P\. Salvo Rossi\(2023\-06\)Unsupervised Anomaly Detection for IoT\-Based Multivariate Time Series: Existing Solutions, Performance Analysis and Future Directions\.Sensors23\(5\),pp\. 2844\.External Links:[Document](https://dx.doi.org/10.3390/S23052844),ISSN 1424\-8220Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[5\]M\. A\. Belay, A\. Haghipour, A\. Rasheed, and P\. Salvo Rossi\(2026\)Agentic and llm\-based multimodal anomaly detection: architectures, challenges, and prospects\.Sensors26\(8\)\.External Links:[Link](https://www.mdpi.com/1424-8220/26/8/2330),ISSN 1424\-8220Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[6\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2024\-06\)MTAD: Multiobjective Transformer Network for Unsupervised Multisensor Anomaly Detection\.IEEE Sensors Journal24\(12\),pp\. 20254–20265\.External Links:[Document](https://dx.doi.org/10.1109/JSEN.2024.3396690),ISSN 15581748Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[7\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2024\)Multivariate Time Series Anomaly Detection via Low\-Rank and Sparse Decomposition\.IEEE Sensors Journal24\(21\),pp\. 34942–34952\.External Links:[Document](https://dx.doi.org/10.1109/JSEN.2024.3452318)Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[8\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2025\)Autoregressive Density Estimation Transformers for Multivariate Time Series Anomaly Detection\.ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing \- Proceedings\.External Links:ISBN 9798350368741,[Document](https://dx.doi.org/10.1109/ICASSP49660.2025.10888728),ISSN 15206149Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[9\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2025\-03\)Digital Twin Knowledge Distillation for Federated Semi\-Supervised Industrial IoT DDoS Detection\.2025 IEEE Symposium on Computational Intelligence in Security, Defence and Biometrics Companion \(CISDB Companion\),pp\. 1–5\.External Links:[Link](https://ieeexplore.ieee.org/document/11010678/),ISBN 979\-8\-3315\-0847\-0,[Document](https://dx.doi.org/10.1109/CISDBCOMPANION65092.2025.11010678)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[10\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2025\)Digital Twin\-Based Federated Transfer Learning for Anomaly Detection in Industrial IoT\.2025 IEEE Symposium on Computational Intelligence on Engineering/Cyber Physical Systems, CIES 2025\.External Links:ISBN 9798331508272,[Document](https://dx.doi.org/10.1109/CIES64955.2025.11007631)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[11\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2025\)Sparse Non\-Linear Vector Autoregressive Networks for Multivariate Time Series Anomaly Detection\.IEEE Signal Processing Letters32\(\),pp\. 331–335\.External Links:[Document](https://dx.doi.org/10.1109/LSP.2024.3520019)Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[12\]M\. A\. Belay, A\. Rasheed, and P\. Salvo Rossi\(2026\)Digital twin\-driven communication\-efficient federated anomaly detection for industrial iot\.External Links:2601\.01701,[Link](https://arxiv.org/abs/2601.01701)Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[13\]S\. Camporeale, L\. Dambrosio, A\. Milella, M\. Mastrovito, and B\. Fortunato\(2009\-02\)Fault Diagnosis of Combined Cycle Gas Turbine Components Using Feed Forward Neural Networks\.American Society of Mechanical Engineers, International Gas Turbine Institute, Turbo Expo \(Publication\) IGTI1,pp\. 549–561\.External Links:[Link](https://dx.doi.org/10.1115/GT2003-38742),[Document](https://dx.doi.org/10.1115/GT2003-38742)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[14\]Case studies on HP economizer tube leaks at Bouchain CCGT – Combined Cycle Journal\.External Links:[Link](https://www.ccj-online.com/case-studies-on-hp-economizer-tube-leaks-at-bouchain-ccgt/)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[15\]V\. Chandola, A\. Banerjee, and V\. Kumar\(2009\)Anomaly detection: A survey\.ACM Computing Surveys41\(3\)\.External Links:[Link](http://doi.acm.org/10.1145/1541880.1541882),[Document](https://dx.doi.org/10.1145/1541880.1541882),ISSN 03600300Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[16\]M\. A\. Chao, C\. Kulkarni, K\. Goebel, and O\. Fink\(2019\-12\)Hybrid deep fault detection and isolation: Combining deep neural networks and system performance models\.International Journal of Prognostics and Health Management10\(4\)\.External Links:[Link](https://papers.phmsociety.org/index.php/ijphm/article/view/2621),[Document](https://dx.doi.org/10.36001/IJPHM.2019.V10I4.2621),ISSN 2153\-2648Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[17\]H\. E\. Davallo, R\. Bahrevar, and A\. Chaibakhsh\(2019\-11\)Fault diagnosis of Combined Cycle Power Plant Using ELM\.ICRoM 2019 \- 7th International Conference on Robotics and Mechatronics,pp\. 40–45\.External Links:ISBN 9781728166049,[Document](https://dx.doi.org/10.1109/ICROM48714.2019.9071851)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[18\]A\. T\. W\. K\. Fahmi, K\. Reza Kashyzadeh, and S\. Ghorbani\(2024\-05\)Advancements in Gas Turbine Fault Detection: A Machine Learning Approach Based on the Temporal Convolutional Network–Autoencoder Model\.Applied Sciences 2024, Vol\. 14, Page 455114\(11\),pp\. 4551\.External Links:[Link](https://www.mdpi.com/2076-3417/14/11/4551/htm%20https://www.mdpi.com/2076-3417/14/11/4551),[Document](https://dx.doi.org/10.3390/APP14114551),ISSN 2076\-3417Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[19\]M\. Fast and T\. Palmé\(2010\-02\)Application of artificial neural networks to the condition monitoring and diagnosis of a combined heat and power plant\.Energy35\(2\),pp\. 1114–1120\.External Links:[Document](https://dx.doi.org/10.1016/J.ENERGY.2009.06.005),ISSN 0360\-5442Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[20\]C\. Finn, P\. Abbeel, and S\. Levine\(2017\)Model\-agnostic meta\-learning for fast adaptation of deep networks\.InProceedings of the 34th International Conference on Machine Learning,pp\. 1126–1135\.Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1),[4th item](https://arxiv.org/html/2606.26710#S4.I1.i4.p1.1)\.
- \[21\]M\. A\. Gonzalez\-Salazar, T\. Kirsten, and L\. Prchlik\(2018\-02\)Review of the operational flexibility and emissions of gas\- and coal\-fired power plants in a future with growing renewables\.Renewable and Sustainable Energy Reviews82,pp\. 1497–1513\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S1364032117309206),[Document](https://dx.doi.org/10.1016/J.RSER.2017.05.278),ISSN 1364\-0321Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p1.1)\.
- \[22\]C\. Li, S\. Li, Y\. Feng, K\. Gryllias, F\. Gu, and M\. Pecht\(2024\-07\)Small data challenges for intelligent prognostics and health management: a review\.Artificial Intelligence Review 2024 57:857\(8\),pp\. 1–52\.External Links:[Link](https://link.springer.com/article/10.1007/s10462-024-10820-4),ISBN 0123456789,[Document](https://dx.doi.org/10.1007/S10462-024-10820-4),ISSN 1573\-7462Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[23\]X\. Liang, M\. Zhang, G\. Feng, D\. Wang, Y\. Xu, and F\. Gu\(2023\-10\)Few\-Shot Learning Approaches for Fault Diagnosis Using Vibration Data: A Comprehensive Review\.Sustainability 2023, Vol\. 15, Page 1497515\(20\),pp\. 14975\.External Links:[Link](https://www.mdpi.com/2071-1050/15/20/14975/htm%20https://www.mdpi.com/2071-1050/15/20/14975),[Document](https://dx.doi.org/10.3390/SU152014975),ISSN 2071\-1050Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[24\]X\. Liu, Y\. Chen, L\. Xiong, J\. Wang, C\. Luo, L\. Zhang, and K\. Wang\(2024\-04\)Intelligent fault diagnosis methods toward gas turbine: A review\.Chinese Journal of Aeronautics37\(4\),pp\. 93–120\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S1000936123003357),[Document](https://dx.doi.org/10.1016/J.CJA.2023.09.024),ISSN 1000\-9361Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[25\]M\. J\. Mazzetti, B\. A\. L\. Hagen, G\. Skaugen, K\. Lindqvist, S\. Lundberg, and O\. A\. Kristensen\(2021\)Achieving 50% weight reduction of offshore steam bottoming cycles\.Energy230,pp\. 120634\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S0360544221008835),[Document](https://dx.doi.org/10.1016/j.energy.2021.120634),ISSN 0360\-5442Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p1.1)\.
- \[26\]R\. M\. Montañés, M\. Windfeldt, L\. E\. Andersson, and G\. Skaugen\(2023\-03\)A framework for physics based off\-design and dynamic modelling and simulation of combined cycle power plants in weight and volume constraint environments\.InHeat Powered Cycles Conference 2023,External Links:[Link](https://zenodo.org/records/10245219)Cited by:[Figure 3](https://arxiv.org/html/2606.26710#S4.F3),[Figure 3](https://arxiv.org/html/2606.26710#S4.F3.3.2),[§4\.1](https://arxiv.org/html/2606.26710#S4.SS1.p1.1)\.
- \[27\]R\. M\. Montañés, G\. Skaugen, B\. Hagen, and D\. Rohde\(2021\-06\)Compact Steam Bottoming Cycles: Minimum Weight Design Optimization and Transient Response of Once\-Through Steam Generators\.Frontiers in Energy Research9,pp\. 687248\.External Links:[Link](https://www.frontiersin.org),[Document](https://dx.doi.org/10.3389/FENRG.2021.687248),ISSN 2296598XCited by:[§4\.1](https://arxiv.org/html/2606.26710#S4.SS1.p1.1)\.
- \[28\]M\. R\. Nayeri, B\. Nadjar Araabi, and B\. Moshiri\(2022\-11\)Fault detection and isolation of gas turbine: Hierarchical classification and confidence rate computation\.Journal of the Franklin Institute359\(17\),pp\. 10120–10144\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S0016003222007001),[Document](https://dx.doi.org/10.1016/J.JFRANKLIN.2022.09.056),ISSN 0016\-0032Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[29\]P\. Patil, B\. Srinivasan, and R\. Srinivasan\(2018\-01\)Process Fault Detection in Heat Recovery Steam Generator using an Artificial Neural Network Simplification of a Dynamic First Principles Model\.Computer Aided Chemical Engineering44,pp\. 2065–2070\.External Links:[Document](https://dx.doi.org/10.1016/B978-0-444-64241-7.50339-6),ISSN 1570\-7946Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[30\]B\. Pourbabaee, N\. Meskin, and K\. Khorasani\(2016\-07\)Sensor Fault Detection, Isolation, and Identification Using Multiple\-Model\-Based Hybrid Kalman Filter for Gas Turbine Engines\.IEEE Transactions on Control Systems Technology24\(4\),pp\. 1184–1200\.External Links:[Document](https://dx.doi.org/10.1109/TCST.2015.2480003),ISSN 10636536Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[31\]L\. Qiao, Y\. Zhang, Q\. Wang, D\. Li, and S\. Peng\(2025\-04\)Fault diagnosis for wind turbine generators based on Model\-Agnostic Meta\-Learning: A few\-shot learning method\.Expert Systems with Applications267,pp\. 126171\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S0957417424030380),[Document](https://dx.doi.org/10.1016/J.ESWA.2024.126171),ISSN 0957\-4174Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p2.1)\.
- \[32\]S\. Ravi and H\. Larochelle\(2017\)Optimization as a model for few\-shot learning\.InProceedings of the 5th International Conference on Learning Representations,Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[33\]Z\. Ren, T\. Lin, K\. Feng, Y\. Zhu, Z\. Liu, and K\. Yan\(2023\)A Systematic Review on Imbalanced Learning Methods in Intelligent Fault Diagnosis\.IEEE Transactions on Instrumentation and Measurement72\.External Links:[Document](https://dx.doi.org/10.1109/TIM.2023.3246470),ISSN 15579662Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[34\]Z\. Ren, Y\. Zhu, Z\. Liu, and K\. Feng\(2023\)Few\-Shot GAN: Improving the Performance of Intelligent Fault Diagnosis in Severe Data Imbalance\.IEEE Transactions on Instrumentation and Measurement72\.External Links:[Document](https://dx.doi.org/10.1109/TIM.2023.3271746),ISSN 15579662Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p2.1)\.
- \[35\]S\. Sampath, A\. Gulati, and R\. Singh\(2002\-02\)Fault Diagnostics Using Genetic Algorithm for Advanced Cycle Gas Turbine\.American Society of Mechanical Engineers, International Gas Turbine Institute, Turbo Expo \(Publication\) IGTI 2 A,pp\. 19–27\.External Links:[Link](https://dx.doi.org/10.1115/GT2002-30021),[Document](https://dx.doi.org/10.1115/GT2002-30021)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[36\]A\. Santoro, S\. Bartunov, M\. Botvinick, D\. Wierstra, and T\. Lillicrap\(2016\)Meta\-learning with memory\-augmented neural networks\.InProceedings of the 33rd International Conference on Machine Learning \- Volume 48,ICML’16,pp\. 1842–1850\.Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[37\]U\. Sarwar, M\. Muhammad, A\. A\. Mokhtar, R\. Khan, P\. Behrani, and S\. Kaka\(2024\-03\)Hybrid intelligence for enhanced fault detection and diagnosis for industrial gas turbine engine\.Results in Engineering21,pp\. 101841\.External Links:[Document](https://dx.doi.org/10.1016/J.RINENG.2024.101841),ISSN 2590\-1230Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p2.1)\.
- \[38\]J\. Snell, K\. Swersky, and R\. S\. Zemel\(2017\)Prototypical Networks for Few\-shot Learning\.InAdvances in Neural Information Processing Systems,Vol\.30\.Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2606.26710#S3.SS1.p1.13),[1st item](https://arxiv.org/html/2606.26710#S4.I1.i1.p1.1)\.
- \[39\]F\. Sung, Y\. Yang, L\. Zhang, T\. Xiang, P\. H\. Torr, and T\. M\. Hospedales\(2018\)Learning to compare: Relation network for few\-shot learning\.InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition,pp\. 1199–1208\.Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1),[3rd item](https://arxiv.org/html/2606.26710#S4.I1.i3.p1.1)\.
- \[40\]R\. S\. Surase, R\. Konijeti, and R\. P\. Chopade\(2024\-12\)Thermal performance analysis of gas turbine power plant using soft computing techniques: a review\.Engineering Applications of Computational Fluid Mechanics18\(1\),pp\. 2374317\.External Links:[Link](https://www.tandfonline.com/doi/pdf/10.1080/19942060.2024.2374317),[Document](https://dx.doi.org/10.1080/19942060.2024.2374317),ISSN 1997003XCited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[41\]G\. Tabella, M\. A\. Belay, I\. Viejo, M\. Herrando, and P\. Salvo Rossi\(2026\)Failure prediction in manufacturing processes via kullback–leibler divergence\.IEEE Sensors Letters10\(1\),pp\. 1–4\.External Links:[Document](https://dx.doi.org/10.1109/LSENS.2025.3641051)Cited by:[§1](https://arxiv.org/html/2606.26710#S1.p2.1)\.
- \[42\]W\. Tina, M\. Asme, E\. Donaldson, T\. E\. Dickinson, and W\. H\. Schmidt\(2023\-01\)Failure Analysis of Once\-Through Steam Generator \(OTSG\) Tube\.ASME Open Journal of Engineering2\.External Links:[Link](https://dx.doi.org/10.1115/1.4062769),[Document](https://dx.doi.org/10.1115/1.4062769)Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[43\]O\. Vinyals, C\. Blundell, T\. Lillicrap, K\. Kavukcuoglu, and D\. Wierstra\(2016\)Matching networks for one shot learning\.InAdvances in Neural Information Processing Systems,pp\. 3630–3638\.Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1),[2nd item](https://arxiv.org/html/2606.26710#S4.I1.i2.p1.1)\.
- \[44\]I\. D\. Wijayanti, H\. C\.K\. Agustin, and A\. Hariyadi\(2021\-12\)Failure analysis of the leakage at boiler bottom wall tube of steam power plant\.AIP Conference Proceedings2384\(1\)\.External Links:[Link](https://pubs.aip.org/aip/acp/article/2384/1/070006/673854/Failure-analysis-of-the-leakage-at-boiler-bottom),ISBN 9780735441743,[Document](https://dx.doi.org/10.1063/5.0071495),ISSN 15517616Cited by:[§2\.1](https://arxiv.org/html/2606.26710#S2.SS1.p1.1)\.
- \[45\]S\. Zhang, F\. Ye, B\. Wang, and T\. Habetler\(2021\-09\)Few\-Shot Bearing Fault Diagnosis Based on Model\-Agnostic Meta\-Learning\.IEEE Transactions on Industry Applications57\(5\),pp\. 4754–4764\.External Links:[Document](https://dx.doi.org/10.1109/TIA.2021.3091958),ISSN 19399367Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p2.1)\.
- \[46\]T\. Zhang, J\. Chen, F\. Li, K\. Zhang, H\. Lv, S\. He, and E\. Xu\(2022\-01\)Intelligent fault diagnosis of machines with small & imbalanced data: A state\-of\-the\-art review and possible extensions\.ISA Transactions119,pp\. 152–171\.External Links:[Link](https://www.sciencedirect.com/science/article/pii/S0019057821001257?via%3Dihub),[Document](https://dx.doi.org/10.1016/J.ISATRA.2021.02.042),ISSN 0019\-0578Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p1.1)\.
- \[47\]T\. Zhang, C\. Guo, Q\. Jia, and X\. Huang\(2024\-11\)Few\-Shot Learning for Abnormal Event Detection in Nuclear Power Plants\.Proceedings of 2024 31st International Conference on Nuclear Engineering, ICONE 2024,Vol\. 5\.External Links:[Link](https://dx.doi.org/10.1115/ICONE31-134965),ISBN 9780791888254,[Document](https://dx.doi.org/10.1115/ICONE31-134965)Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p2.1)\.
- \[48\]X\. Zheng, C\. Yue, J\. Wei, A\. Xue, M\. Ge, and Y\. Kong\(2023\-12\)Few\-shot intelligent fault diagnosis based on an improved meta\-relation network\.Applied Intelligence53\(24\),pp\. 30080–30096\.External Links:[Link](https://link.springer.com/article/10.1007/s10489-023-05128-9),[Document](https://dx.doi.org/10.1007/S10489-023-05128-9),ISSN 15737497Cited by:[§2\.2](https://arxiv.org/html/2606.26710#S2.SS2.p2.1)\.
