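@vintcessun: Time series anomaly detection has long had an annoying gap: the algorithm hands you a score but never tells you why something is anomalous. Without an explanation, users can only stare at the result, and trust and diagnosis are out of reach. ProtoX-AD finally breaks through this barrier: it embeds an interpretable prototype vector layer in a self-supervised classification framework, with each prototype corresponding to a…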

X AI KOLs Timeline 2026/06/15 01:10 Paper

Summary

ProtoX-AD is a prototype-based self-explainable framework for self-supervised time series anomaly detection that provides interpretable explanations for detected anomalies by learning transformation-aware prototypes, achieving performance comparable to black-box methods while offering semantic anomaly characterization.

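Time series anomaly detection has long had an annoying gap: the algorithm hands you a score but never tells you why something is anomalous. Without an explanation, users can only stare at the result, and trust and diagnosis are out of reach. ProtoX-AD finally breaks through this barrier: it embeds an interpretable prototype vector layer in a self-supervised classification framework, with each prototype corresponding to a transformation pattern or an anomalous characteristic. During training, normal samples are pulled toward their corresponding prototypes while anomalous samples stay far from all prototypes; at inference, the classification error is used for detection, and prototype similarity tells you directly whether the anomaly is a "local spike" or a "trend shift". Detection accuracy is not sacrificed, and the explanations carry semantic meaning. The limitation is that the prototypes must be preset, but this is the first work to fill the key gap of explainability.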

Cached: 2026/06/15 13:05

ProtoX-AD: Self-Explainable Time Series Anomaly Detection and Characterization

Source: https://arxiv.org/html/2606.13277

Elisabeth Wetzer, Kristoffer Wickstrøm, Michael Kampffmeyer, Robert Jenssen (these authors contributed equally to this work)

Abstract

Recent advances in time series anomaly detection (TSAD) have highlighted the effectiveness of self-supervised classification-based approaches. These methods apply transformations to normal training samples, training a classifier to recognize transformation-specific patterns that help identify anomalies through increased classification errors. Despite their strong performance, a significant challenge is their lack of explainability, as they provide limited insight into the characteristics of flagged anomalies. To address this limitation, we propose ProtoX-AD, a prototype-based self-explainable framework for self-supervised TSAD. ProtoX-AD learns transformation-aware latent representations alongside interpretable prototypes, enabling both accurate anomaly detection and the identification of distinct anomalous profiles through prototype-based explanations. Additionally, it allows for systematic analysis of how transformation design impacts detection performance and explainability. Experimental results on synthetic and real-world datasets demonstrate that ProtoX-AD achieves detection performance comparable to its black-box counterparts while offering more consistent and semantically meaningful explanations than existing explainable baselines. Our code is publicly available at https://github.com/Aitorzan3/ProtoX-AD

Keywords: time series anomaly detection, anomaly characterization, self-supervised learning, explainable AI, prototype learning.

Affiliations:

[1] Department of Computer Science and Artificial Intelligence, University of the Basque Country UPV/EHU, Donostia, Spain

[2] Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway

1 Introduction

Time series anomalies are abnormal events in which the behavior of a system departs from its regular operating patterns[5]. Time series anomaly detection (TSAD) is a fundamental problem in many critical application domains, such as finance[15], Internet of Things (IoT) systems[8], and healthcare[31]. Due to the high cost of labeling anomalies, TSAD is typically addressed using unsupervised approaches[2,34]. Within this setting, models learn the patterns of normal data to detect deviations from normality at inference time, typically quantified as anomaly scores[7].

Recent advances in the field have focused on self-supervised learning[23], with classification-based approaches emerging as a prominent paradigm[27]. In this setting, a set of transformations is applied to normal samples to generate one augmented view per transformation, and a classifier is trained to predict the transformation associated with each view[17]. As a result, models capture the expected response of normal samples when subjected to each transformation. At inference time, anomalies tend to violate these learned patterns, leading to higher classification errors that are used as anomaly scores. The effectiveness of self-supervised classification-based time series anomaly detection (SSC-TSAD) critically depends on the choice of transformations[32].

Despite their strong performance, these approaches lack explainability[19], limiting the understanding of both how transformation design influences detection performance and how different anomalous patterns are characterized. In particular, existing methods typically provide anomaly scores without offering insight into the underlying structure of anomalous behavior. Consequently, they do not support the identification and interpretation of distinct anomalous profiles, which is critical in many real-world applications[10].

To address this limitation, in this paper we propose a self-explainable framework for SSC-TSAD. (In this work, we focus on detecting entire anomalous time series, as opposed to detecting anomalous points or subsequences.) We draw inspiration from prototypical self-explainable models (SEMs) in the field of eXplainable AI (XAI)[1], which learn representative concepts in the latent space that can be visualized in the input space to explain model decisions[6,13]. By structuring the latent space around transformation-induced concepts and reconstructing prototypes in the input domain, our method enables both accurate anomaly detection and the characterization of distinct anomalous profiles. Moreover, it enables a systematic analysis of the impact of transformation design on SSC-TSAD. We call our method ProtoX-AD.

In summary, the main contributions of this work are as follows:

1. We propose ProtoX-AD, a prototype-based self-explainable framework for self-supervised classification-based time series anomaly detection.
2. Our method achieves anomaly detection performance comparable to its black-box self-supervised counterpart, while providing interpretable prototype-based explanations that enable anomaly characterization.
3. We provide an analysis of the impact of transformation design on SSC-TSAD in terms of detection performance, explainability, and anomaly characterization.

2 Related Work on SSC-TSAD

Self-supervised classification-based TSAD methods rely on the application of a set of transformations to normal samples, generating augmented views that define a surrogate classification task[14]. The effectiveness of these approaches depends strongly on the choice of transformations, as they determine the structure of the learned representations and the types of deviations that can be detected. Specifically, recent work states that they should generate corrupted views of normal samples that reflect the properties of the targeted anomalies[32].

Following this perspective, most existing approaches rely on manually designed transformations tailored to specific anomaly detection problems based on domain knowledge[32]. For instance, transformations such as upscaling have been used in water leak detection to mimic increased consumption patterns caused by leaks[3]. Similarly, various works propose transformations that alter the amplitude and frequency of sequences to mirror the properties of seizures for epilepsy detection[35,30]. Although effective, relying on hand-crafted transformations limits the general applicability and transferability of these methods across TSAD problems[27].

To alleviate this limitation, recent works have explored learnable neural transformations[26,28], implemented as neural networks that are learned directly from normal training data. By leveraging contrastive learning[18], these transformations are enforced to generate diverse and non-redundant augmented views while preserving the semantic content of the original samples and disrupting data normality in a controlled manner, prioritizing general applicability over domain-specific assumptions[28].

The choice between manual and neural transformations often depends on the availability of domain knowledge: manually designed transformations can target specific anomalous patterns, whereas neural transformations aim to provide greater generality[27]. Existing work typically compares these approaches in terms of anomaly detection performance, highlighting the trade-off between domain specificity and generality. However, this perspective does not capture how different transformation choices influence the characterization of anomalous behavior. In particular, understanding how transformed views relate to distinct anomalous patterns requires insight beyond a single anomaly score. Existing methods lack explicit mechanisms to support such analysis[19].

3 Method

ProtoX-AD learns a latent space in which augmented views are well suited for SSC-TSAD alongside prototypes that allow the distinction and characterization of different anomalous profiles through prototypical explainability.

3.1 Model Architecture

Our model’s pipeline consists of five key components: (i) a transformation module, (ii) a feature extraction module, (iii) a dual reconstruction module, (iv) a prototype module, and (v) a classification module. Figure 1 illustrates the overall architecture of ProtoX-AD.

Figure 1: Pipeline of ProtoX-AD. The input time series $\mathbf{x}_i$ is transformed into augmented views $\mathbf{x}_i^k$, encoded into latent representations $\mathbf{z}_i^k$, and compared with class-specific prototypes $\boldsymbol{\phi}_m^k$ to compute similarity matrices $\mathbf{S}_i^k$. The classification module performs self-supervised classification based on these similarities. In addition, the dual reconstruction module reconstructs both transformed views and the original sample from the latent representations.

Transformation module

This comprises a set of $K$ transformations $\mathcal{T}=\{T_1,\dots,T_K\}$, which can be manual or learnable neural transformations representing parameterized functions. When applied to an input sample $\mathbf{x}_i$, these transformations generate a set of augmented views associated with the original sample, $\mathcal{T}(\mathbf{x}_i)=\{T_1(\mathbf{x}_i),\dots,T_K(\mathbf{x}_i)\}=\{\mathbf{x}_i^1,\dots,\mathbf{x}_i^K\}$, where $\mathbf{x}_i^k$ denotes the augmented view generated by applying the $k$-th transformation $T_k(\cdot)$ to that sample. Following the existing literature on self-supervised classification for anomaly detection[27], the first transformation $T_1(\cdot)$ in the transformation module is set to the identity function, i.e., $T_1(\mathbf{x}_i)=\mathbf{x}_i$.
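As a rough illustration of the transformation module, the following Python sketch builds $K=3$ views with the identity as the first transformation. The `upscale` and `add_bump` functions, their parameters, and the sine-wave input are hypothetical stand-ins chosen for the example, not the transformations used in the paper. Labels are 0-based here, so the identity class is index 0 rather than $k=1$.

```python
import numpy as np

def identity(x):
    # T_1: the identity transformation, always placed first.
    return x.copy()

def upscale(x, factor=1.5):
    # Hypothetical manual transformation: amplify the whole series.
    return x * factor

def add_bump(x, height=1.0):
    # Hypothetical manual transformation: add a bell-shaped bump mid-series.
    t = np.linspace(-3.0, 3.0, x.shape[0])
    return x + height * np.exp(-0.5 * t**2)

TRANSFORMS = [identity, upscale, add_bump]  # K = 3, identity first

def make_views(x):
    """Return the K augmented views of x and their self-supervised labels."""
    views = np.stack([T(x) for T in TRANSFORMS])  # shape (K, length)
    labels = np.arange(len(TRANSFORMS))           # view k carries label k
    return views, labels

x = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
views, labels = make_views(x)
```

Each row of `views` becomes one training example for the surrogate classification task, with the row index as its target class.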

Feature extraction module

This consists of an encoder $f(\cdot)$ that maps the augmented views into a new representation that is suitable for the self-supervised anomaly detection task. Following the variational autoencoder paradigm and inspired by ProtoVAE[11], ProtoX-AD models the latent space probabilistically to encourage smooth and structured latent representations. Specifically, each augmented view $\mathbf{x}_i^k$ is transformed by $f(\cdot)$ into a tuple $(\boldsymbol{\mu}_i^k,\boldsymbol{\sigma}_i^k)$, which are the parameters of the posterior normal distribution $\mathcal{N}(\boldsymbol{\mu}_i^k,\boldsymbol{\sigma}_i^k)$, from which its corresponding latent representation $\mathbf{z}_i^k$ is sampled.
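The text states only that $\mathbf{z}_i^k$ is sampled from the posterior. A common way to do this in variational autoencoders is the reparameterization trick, sketched below under two assumptions that the text does not confirm: the covariance is diagonal, and $\boldsymbol{\sigma}$ holds per-dimension standard deviations. The numeric values are made up for the example.

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Draw z ~ N(mu, diag(sigma^2)) as z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0])      # assumed encoder output (mean)
sigma = np.array([0.1, 0.2, 0.3])    # assumed encoder output (std dev)

z = sample_latent(mu, sigma, rng)
# Averaging many draws recovers the mean, as expected for this sampler.
z_mean = np.mean([sample_latent(mu, sigma, rng) for _ in range(20000)], axis=0)
```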

Dual reconstruction module

It consists of an explainer $e(\cdot)$ and a semantic preservation decoder $g(\cdot)$, which are designed to be symmetric to the encoder $f(\cdot)$. The explainer $e(\cdot)$ reconstructs the augmented views of each sample $\mathbf{x}_i^k$ from their corresponding latent representations $\mathbf{z}_i^k$, while the semantic preservation decoder $g(\cdot)$ reconstructs the original sample $\mathbf{x}_i$ from each latent view $\mathbf{z}_i^k$. That is, ideally $e(\mathbf{z}_i^k)=\mathbf{x}_i^k$ and $g(\mathbf{z}_i^k)=\mathbf{x}_i$, for $k\in\{1,2,\dots,K\}$.

At a high level, the explainer $e(\cdot)$ enables the visualization and interpretation of latent prototypes in the input space, while the semantic preservation decoder $g(\cdot)$ acts as a regularizer that preserves semantic consistency across transformations and prevents representation collapse.

Prototype module

Our network incorporates a set of prototypes that are learned to capture representative concepts associated with each class and to enable the self-supervised classification task. Specifically, we consider $M$ prototypes for each class, resulting in a set of prototypes $\boldsymbol{\Phi}=\{\boldsymbol{\phi}_m^k\}_{m=1..M}^{k=1..K}$, where $\boldsymbol{\phi}_m^k$ is the $m$-th prototype associated with class $k$. For each augmented view in the latent space $\mathbf{z}_i^k$, we define a matrix of distances $\mathbf{D}_i^k\in\mathbb{R}^{K\times M}$, whose entry $d_i^k(c,m)$ corresponds to the squared Euclidean distance between $\mathbf{z}_i^k$ and the $m$-th prototype of class $c$:

$$d_i^k(c,m)=\lVert\mathbf{z}_i^k-\boldsymbol{\phi}_m^c\rVert_2^2. \tag{1}$$

The prototype vectors are initialized using K-means clustering in the latent space. Specifically, prior to training, an initial forward pass of the training data through the transformation module and encoder is performed. Then, K-means with $M$ clusters is applied independently per transformation class to the resulting latent representations to obtain an initial set of latent prototypes.
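The distance matrix of Eq. (1) can be computed with a single broadcast. The sketch below assumes the prototypes are stored as an array of shape $(K, M, d)$; this layout, the sizes, and the random values are illustrative choices, not details from the paper.

```python
import numpy as np

def distance_matrix(z, prototypes):
    """Squared Euclidean distances between one latent vector and all prototypes.

    z          : shape (d,)
    prototypes : shape (K, M, d); prototypes[c, m] is the m-th prototype of class c
    returns    : shape (K, M); entry [c, m] = ||z - prototypes[c, m]||_2^2
    """
    diff = prototypes - z              # z broadcasts over the (K, M) grid
    return np.sum(diff**2, axis=-1)

K, M, d = 3, 2, 4
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((K, M, d))
z = prototypes[1, 0].copy()            # place z exactly on a class-1 prototype
D = distance_matrix(z, prototypes)
```

Here `D[1, 0]` is zero because `z` coincides with that prototype, and every other entry is a positive squared distance.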

Classification module

To ensure transparency, the self-supervised classification task is performed by directly leveraging the concepts learned in the model’s latent space. For each augmented view with latent representation $\mathbf{z}_i^k$, we convert its associated distance matrix $\mathbf{D}_i^k$ into a similarity matrix $\mathbf{S}_i^k$ by taking its negative, i.e., $\mathbf{S}_i^k=-\mathbf{D}_i^k$. Accordingly, the entries of the similarity matrix are defined as

$$s_i^k(c,m)=-d_i^k(c,m), \tag{2}$$

where $s_i^k(c,m)$ quantifies the similarity between the latent representation $\mathbf{z}_i^k$ and the $m$-th prototype of class $c$.

Following[6], the resulting similarities are then fed into a linear classifier $h(\cdot)$ to compute the self-supervised classification predictions for each augmented view, yielding an output $\hat{\mathbf{y}}_i^k=h(\mathbf{S}_i^k)$ that ideally assigns the corresponding transformation label $k$, for $k\in\{1,2,\dots,K\}$.
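A minimal sketch of the classification step follows. It assumes that the similarity matrix is flattened before entering the linear layer and that a softmax turns the logits into class probabilities; the text does not spell out either detail. The weight matrix `W` is a hand-built, hypothetical example in which each class reads only the similarities to its own prototypes, whereas a trained model would learn these weights.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))          # shift for numerical stability
    return e / e.sum()

def classify(D, W, b):
    """Prototype-based prediction: similarities (Eq. 2), then a linear layer."""
    S = -D                             # similarity is the negative distance
    logits = W @ S.reshape(-1) + b     # linear classifier on flattened S
    return softmax(logits)

K, M = 3, 2
W = np.kron(np.eye(K), np.ones((1, M)))   # hypothetical weights, shape (K, K*M)
b = np.zeros(K)

# Distances of one view to all prototypes; it lies close to class 1.
D = np.array([[9.0, 8.0],
              [0.1, 4.0],
              [7.0, 6.0]])
y_hat = classify(D, W, b)
```

Because every logit is a weighted sum of prototype similarities, the prediction can be traced back to specific prototypes, which is what makes the classifier transparent.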

3.2 Training ProtoX-AD

ProtoX-AD is trained using a dataset of unlabeled time series samples $\mathcal{X}=\{\mathbf{x}_i\}_{i=1}^{N}$, where (ideally) all the samples belong to the class designated as normal. The overall loss function of the model is defined as:

$$\mathcal{L}_{\textnormal{ProtoX-AD}}=\mathcal{L}_{\textnormal{class}}+\mathcal{L}_{\textnormal{recon}}+\mathcal{L}_{\textnormal{proto}}. \tag{3}$$

We now detail each term.

Classification through prototypes

This learning objective enables the model to distinguish the transformations considered in the transformation module, thereby learning the self-supervised classification task. We employ the standard cross-entropy loss defined as

$$\mathcal{L}_{\textnormal{class}}=-\dfrac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\mathbf{y}_i^k\log(\hat{\mathbf{y}}_i^k), \tag{4}$$

where $\mathbf{y}_i^k\in\{0,1\}^K$ is the corresponding self-supervised one-hot label with $y_i^k(k)=1$ and $0$ otherwise.

The latent representations associated with different transformations are separated in the latent space, promoting transformation diversity. Since the transformation associated with the first class is defined as the identity function, this objective further promotes the generation of augmented views that disrupt data normality, as transformed views are encouraged to deviate from the latent structure of normal samples.

Dual reconstruction training objective

The dual reconstruction training objective is designed to support SSC-TSAD by jointly structuring the latent space around transformation-specific concepts and preserving the semantic identity of the original samples. This learning objective is composed of three complementary terms.

The first term enables the explainer $e(\cdot)$ to reconstruct each augmented view from its latent representation by minimizing:

$$\mathcal{L}_{\textnormal{exp}}=\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\left\lVert e(\mathbf{z}_i^k)-\mathbf{x}_i^k\right\rVert_1. \tag{5}$$

The second term enforces the latent representations of augmented views to preserve the semantic information of their associated original samples through the semantic preservation decoder $g(\cdot)$:

$$\mathcal{L}_{\textnormal{sem}}=\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\left\lVert g(\mathbf{z}_i^k)-\mathbf{x}_i\right\rVert_1. \tag{6}$$

In addition to reconstruction-based objectives, we further structure the latent space around transformation-specific concepts by introducing a prototype-centered latent regularization inspired by the mixture-based formulation of[11]:

$$\mathcal{L}_{\textnormal{mix}}=\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\sum_{m=1}^{M}w_i^k(m)\,D_{KL}\!\left(\mathcal{N}(\boldsymbol{\mu}_i^k,\boldsymbol{\sigma}_i^k)\,\|\,\mathcal{N}(\boldsymbol{\phi}_m^k,\mathbf{I}_d)\right), \tag{7}$$

where $d$ is the dimensionality of the latent space, $\mathbf{I}_d$ denotes the $d\times d$ identity matrix, $D_{KL}$ is the Kullback–Leibler divergence, $\mathcal{N}$ denotes a multivariate normal distribution, and the mixture weights $w_i^k(m)$ are defined as

$$w_i^k(m)=\frac{\exp\!\left(s_i^k(k,m)\right)}{\sum_{m'=1}^{M}\exp\!\left(s_i^k(k,m')\right)}.$$

The overall training objective of the dual reconstruction module is given by

$$\mathcal{L}_{\textnormal{recon}}=\mathcal{L}_{\textnormal{exp}}+\mathcal{L}_{\textnormal{sem}}+\mathcal{L}_{\textnormal{mix}}. \tag{8}$$

This combined objective grounds the latent space in the input domain by jointly enforcing complementary constraints. The explainer and semantic reconstruction terms preserve input-level fidelity and latent intra-class diversity across augmented views, respectively. In turn, the mixture-based regularization aligns latent representations with class-specific prototypes, stabilizing prototype learning and discouraging degenerate solutions. Together, these components ensure that the learned latent space remains both structured and semantically interpretable.
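To make the mixture regularizer of Eq. (7) concrete, the sketch below evaluates its inner sum for a single view. It assumes a diagonal posterior with $\boldsymbol{\sigma}$ as per-dimension standard deviations, in which case the divergence to a unit-covariance Gaussian centered at a prototype has the closed form $\tfrac{1}{2}\sum_j\left(\sigma_j^2+(\mu_j-\phi_j)^2-1-\log\sigma_j^2\right)$. The function names and the toy values are hypothetical.

```python
import numpy as np

def kl_to_prototype(mu, sigma, phi):
    """KL( N(mu, diag(sigma^2)) || N(phi, I) ) in closed form."""
    var = sigma**2
    return 0.5 * np.sum(var + (mu - phi)**2 - 1.0 - np.log(var))

def mixture_term(z, mu, sigma, protos_k):
    """Inner sum over m of Eq. (7) for one view of class k.

    protos_k has shape (M, d): the prototypes of the view's own class.
    """
    s = -np.sum((z - protos_k)**2, axis=1)   # similarities s(k, m), Eq. (2)
    w = np.exp(s - s.max())                  # softmax over m, shifted for stability
    w /= w.sum()
    kls = np.array([kl_to_prototype(mu, sigma, phi) for phi in protos_k])
    return float(np.sum(w * kls)), w

mu = np.zeros(2)
sigma = np.ones(2)
protos_k = np.array([[0.0, 0.0], [3.0, 0.0]])
# Use z = mu so the example is deterministic (a sampled z would be used in training).
term, w = mixture_term(mu, mu, sigma, protos_k)
```

The softmax weights concentrate on the closest prototype, so a view is pulled mainly toward the prototype it already resembles rather than toward all prototypes of its class equally.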

Meaningful prototype learning

To encourage meaningful and well-supported prototypes, we adopt two complementary objectives: a clustering loss and a prototype coverage loss.

Inspired by[6], the clustering loss encourages the latent representations of each augmented view to be close to at least one prototype related to its class, and it is defined as:

$$\mathcal{L}_{\textnormal{clst}}=\frac{1}{NK}\sum_{i=1}^{N}\sum_{k=1}^{K}\min_{m\in\{1,\dots,M\}}d_i^k(k,m). \tag{9}$$

Complementarily, the prototype coverage loss follows the intuition of[21] and ensures that each prototype is close to at least one augmented view associated with its class in the latent space. It is defined as:

$$\mathcal{L}_{\textnormal{cov}}=\frac{1}{KM}\sum_{k=1}^{K}\sum_{m=1}^{M}\min_{i\in\{1,\dots,N\}}d_i^k(k,m). \tag{10}$$

The overall objective for meaningful prototype learning is then given by

$$\mathcal{L}_{\textnormal{proto}}=\mathcal{L}_{\textnormal{clst}}+\mathcal{L}_{\textnormal{cov}}. \tag{11}$$

This combined objective enforces a meaningful prototypical structure in the latent space. The clustering loss promotes compactness of latent representations around class-specific prototypes, thereby aligning augmented views with representative concepts. In addition, the coverage loss prevents the emergence of unused or degenerate prototypes, a phenomenon often referred to as ghosting[12].
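The two prototype losses differ only in the axis over which the minimum is taken, which the following sketch makes explicit. The array layouts for the latent views and prototypes are assumptions made for the example, and the data are synthetic.

```python
import numpy as np

def prototype_losses(Z, P):
    """Clustering (Eq. 9) and coverage (Eq. 10) losses.

    Z : (N, K, d) latent views; Z[i, k] is view k of sample i
    P : (K, M, d) prototypes;   P[k, m] is the m-th prototype of class k
    """
    # own[i, k, m] = ||Z[i, k] - P[k, m]||^2, i.e. d_i^k(k, m)
    own = np.sum((Z[:, :, None, :] - P[None, :, :, :])**2, axis=-1)
    l_clst = own.min(axis=2).mean()   # nearest own-class prototype per view
    l_cov = own.min(axis=0).mean()    # nearest own-class view per prototype
    return float(l_clst), float(l_cov)

N, K, M, d = 5, 3, 2, 4
rng = np.random.default_rng(0)
P = rng.standard_normal((K, M, d))
# Put every view exactly on prototype m = 0 of its class.
Z = np.broadcast_to(P[:, 0, :], (N, K, d)).copy()
l_clst, l_cov = prototype_losses(Z, P)
```

In this toy setup the clustering loss is zero, since every view sits on a prototype, while the coverage loss stays positive because the second prototype of each class has no nearby view. That unused prototype is the kind of case the coverage term is meant to penalize.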

3.3 Anomaly Detection with ProtoX-AD

After training, when evaluating a new sample $\mathbf{x}_{\textnormal{new}}$, ProtoX-AD encodes the identity augmented view of the sample into the latent space using the feature extraction module. The anomaly score of the new sample $\textnormal{AS}(\mathbf{x}_{\textnormal{new}})$ is defined as the cross-entropy loss associated with its self-supervised classification:

$$\textnormal{AS}(\mathbf{x}_{\textnormal{new}})=-\mathbf{y}_{\textnormal{new}}^{1}\log(\hat{\mathbf{y}}_{\textnormal{new}}^{1}), \tag{12}$$

where $\hat{\mathbf{y}}_{\textnormal{new}}^{1}$ denotes the model prediction for the identity view of $\mathbf{x}_{\textnormal{new}}$, and $\mathbf{y}_{\textnormal{new}}^{1}$ is the one-hot pseudo-label corresponding to the identity transformation (i.e., class $k=1$).

The underlying rationale is that the model is trained to associate each transformation with a specific class and to capture the corresponding transformation-induced patterns in the latent space. When evaluating a new sample, if its identity view is classified as a different transformation, this indicates that the sample exhibits characteristics that resemble those induced by that transformation. Consequently, a high cross-entropy value for the identity view reflects a semantic mismatch between the sample and the learned normal transformation-specific patterns, and therefore a higher likelihood that the sample is anomalous.
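Because the pseudo-label is one-hot on the identity class, the score in Eq. (12) reduces to the negative log-probability that the model assigns to that class. The sketch below illustrates this with two made-up prediction vectors; the small `eps` guard against $\log 0$ is an added assumption, and the identity class is index 0 because Python is 0-based.

```python
import numpy as np

def anomaly_score(y_hat, identity_class=0, eps=1e-12):
    """Cross-entropy of the identity view against the identity label."""
    return float(-np.log(y_hat[identity_class] + eps))

# Hypothetical classifier outputs for the identity views of two samples.
y_normal = np.array([0.90, 0.05, 0.05])   # confidently "identity"
y_anom = np.array([0.10, 0.80, 0.10])     # looks like transformation 1

score_normal = anomaly_score(y_normal)
score_anom = anomaly_score(y_anom)
```

The second sample scores higher, and the class that absorbs the probability mass (index 1 here) hints at which transformation-induced pattern the sample resembles.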

3.4 Explanation through Prototypes

After training, the set of learned prototypes $\boldsymbol{\Phi}$ can be visualized in the input space by decoding them through the explainer network $e(\cdot)$. These decoded prototypes represent global concepts associated with the different classes defined by the transformations considered in the transformation module of ProtoX-AD.

To explain the anomaly score computation of a new sample $\mathbf{x}_{\textnormal{new}}$, we anchor the explanation to its identity view, which provides a canonical and human-interpretable representation of the sample. The explanation is obtained by identifying the nearest prototype to the identity view in the latent space and visualizing it in the input space as a representative concept capturing the normal or anomalous characteristics of the sample. In addition, the diversity induced by the transformation design enables the characterization of different anomalous profiles, as anomalous samples can be associated with prototypes corresponding to distinct transformation-specific concepts.
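The explanation step can be sketched as a nearest-prototype lookup followed by decoding. In the example below the trained explainer $e(\cdot)$ is replaced by a fixed random linear map, which is purely a hypothetical stand-in so the code runs on its own; the array shapes and values are likewise illustrative.

```python
import numpy as np

def nearest_prototype(z, prototypes):
    """Return (class index, prototype index) of the prototype closest to z."""
    D = np.sum((prototypes - z)**2, axis=-1)          # shape (K, M)
    c, m = np.unravel_index(np.argmin(D), D.shape)
    return int(c), int(m)

def explain(z, prototypes, explainer):
    """Decode the nearest prototype into the input space."""
    c, m = nearest_prototype(z, prototypes)
    return c, m, explainer(prototypes[c, m])

K, M, d, length = 3, 2, 4, 16
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((K, M, d))
A = rng.standard_normal((length, d))

def explainer(v):
    # Hypothetical stand-in for the trained explainer network e.
    return A @ v

z_new = prototypes[2, 1] + 0.01        # latent identity view of a new sample
c, m, decoded = explain(z_new, prototypes, explainer)
```

With the identity class at index 0, a returned class `c` other than 0 indicates that the sample resembles the pattern induced by transformation `c`, and `decoded` is the time series shown to the user as the explanation.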

3.5 ProtoX-AD is a SEM

A model can be considered self-explainable if it fulfills three key properties: (i) transparency, where concepts are directly used to perform the downstream task and are visualizable in the input space; (ii) diversity, where concepts represent non-overlapping information in the latent space; and (iii) trustworthiness, where performance matches that of a comparable black-box model and explanations are consistent, i.e., similar inputs yield similar explanations[11].

ProtoX-AD satisfies these properties by explicitly embedding interpretable concepts into its learning objective. It is transparent, as anomaly detection is performed through prototype-based classification, and prototypes are reconstructed in the input space using the explainer network. It promotes diversity by learning multiple prototypes per transformation-induced class that capture distinct variations of the underlying signal. Finally, ProtoX-AD is trustworthy: it achieves competitive performance with respect to its black-box counterpart (see Section 5), and the structured latent space, enforced by the VAE formulation, ensures that similar samples are mapped to similar prototypes, yielding consistent explanations.

4 Experimental Setup

In this section, we describe the experimental setup used to evaluate both the anomaly detection performance and the explainability capabilities of ProtoX-AD across different datasets.

4.1 Datasets and problem definition

We consider three TSAD problems based on synthetic and real-world time series data.

UMD Dataset

This dataset consists of synthetic time series sharing a common baseline structure with a central plateau. It comprises three classes: one class without bell-shaped patterns and two classes containing upward and downward bell-shaped events, respectively. Sequences with bell-shaped patterns are treated as anomalous samples; the bell may appear at either the beginning or the end of the sequence, and the central plateau may appear inverted (see Figure 2). Predefined training and evaluation splits follow[9].

Following the unsupervised TSAD setting, the training set is restricted to normal samples only, while the evaluation set contains both normal and anomalous samples.

Figure 2: Representative normal (blue) and anomalous (red) time series from the UMD dataset.

Global Temperature Anomalies Dataset (GTA)

The dataset consists of monthly-sampled time series derived from global mean surface temperature anomaly data obtained from the GISTEMP[20] and gcag[24] sources. Each time series corresponds to a single year of the deviation of the monthly global temperature with respect to a reference climatological baseline, where positive values indicate warmer-than-average conditions and negative values indicate colder-than-average conditions. Each year is associated with an overall annual temperature anomaly, ranging approximately between $-1.5$ and $+1.5$ degrees.

Years with an annual temperature anomaly within the range $[-0.25,0.25]$ are treated as normal, while the remaining years are considered anomalous. We use $80\%$ of the normal samples for training, and the remaining $20\%$ of normal samples and anomalous samples for evaluation. In our experiments, the GISTEMP and gcag sources are treated as two independent TSAD problems and evaluated separately. Figure 3 illustrates representative normal and anomalous yearly temperature anomaly time series from the GISTEMP source.

Figure 3: Representative normal (blue) and anomalous (red) yearly temperature anomaly time series from the GISTEMP source.
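The GTA labeling and split rule can be sketched as follows. Several details are assumptions not stated in the text: the bounds are treated as inclusive, the 80/20 split of normal years is drawn at random, and the annual anomaly values are invented for the example.

```python
import numpy as np

def split_gta(annual_anomaly, rng, low=-0.25, high=0.25, train_frac=0.8):
    """Label years by annual anomaly and build train / evaluation index sets."""
    annual_anomaly = np.asarray(annual_anomaly, dtype=float)
    is_normal = (annual_anomaly >= low) & (annual_anomaly <= high)
    normal_idx = rng.permutation(np.flatnonzero(is_normal))
    n_train = int(round(train_frac * normal_idx.size))
    train_idx = normal_idx[:n_train]                   # normal years only
    eval_idx = np.concatenate([normal_idx[n_train:],   # held-out normal years
                               np.flatnonzero(~is_normal)])  # all anomalous years
    return train_idx, eval_idx, is_normal

# Hypothetical annual anomalies: 10 normal years followed by 5 anomalous ones.
annual = [0.0, 0.1, -0.2, 0.25, -0.25, 0.05, 0.2, -0.1, 0.15, -0.05,
          0.6, -0.8, 1.2, 0.3, -0.4]
rng = np.random.default_rng(0)
train_idx, eval_idx, is_normal = split_gta(annual, rng)
```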

Yorkshire Water Leak Detection Dataset

This is a real-world dataset containing water flow measurements across more than 2000 Distribution Management Areas (DMAs) in Yorkshire. We construct one time series per DMA and day, capturing the minimum night flow (MNF) behavior, and restrict our analysis to five DMAs that do not contain invalid or missing MNF values and exhibit consistent measurement quality, following the experimental protocol of[3]. We consider DMAs 549, 913, 1164, 1406, and 1259.

Anomalies are defined based on percentile thresholds computed over the MNF values, following[3]. Percentile values of 80, 85, 90, and 95 are considered, leading to four experimental settings. Each combination of DMA, weekday, and percentile threshold defines an independent TSAD problem, resulting in a total of 140 TSAD problems. For each, $80\%$ of the normal samples are used for training, and the remaining $20\%$ of normal samples and all anomalous samples constitute the evaluation set. Figure 4 shows representative examples.

Figure 4: Representative normal (blue) and anomalous (red) water flow sequences from the Yorkshire Water Leak Detection dataset.
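The count of 140 Yorkshire problems follows from 5 DMAs, 7 weekdays, and 4 percentile thresholds. The sketch below enumerates them and illustrates percentile-based labeling. It assumes that days whose MNF value exceeds the percentile threshold are the anomalous ones; the text says only that anomalies are defined from percentile thresholds. The MNF values are synthetic.

```python
from itertools import product

import numpy as np

DMAS = [549, 913, 1164, 1406, 1259]
WEEKDAYS = range(7)
PERCENTILES = [80, 85, 90, 95]

# One independent TSAD problem per (DMA, weekday, percentile) combination.
problems = list(product(DMAS, WEEKDAYS, PERCENTILES))

def label_by_percentile(mnf, p):
    """Mark days whose MNF exceeds the p-th percentile as anomalous (assumed rule)."""
    threshold = np.percentile(mnf, p)
    return mnf > threshold

mnf = np.arange(1.0, 101.0)               # synthetic MNF values 1..100
labels = label_by_percentile(mnf, 90)
```

Raising the percentile makes the anomaly class rarer, which is why the four thresholds are treated as separate experimental settings.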

4.2 Baselines

We compare ProtoX-AD against representative baselines from three methodological families: (i) shallow anomaly detection methods, (ii) a black-box self-supervised deep learning method, and (iii) an explainable self-supervised method.

Shallow methods

We use three popular anomaly detection methods: Isolation Forest (IF)[22], One-Class SVM (OCSVM)[29], and Local Outlier Factor (LOF)[4]. They operate directly on the input data without learned feature extraction, serving as non-deep baselines.

Black-box self-supervised method

We use ProtoX-AD but without the explainability mechanism. The model is trained to jointly perform self-supervised classification using a linear classifier operating on the learned representations, and to reconstruct the original samples from their augmented views to preserve semantic information. This baseline serves as the non-explainable counterpart of ProtoX-AD and provides an upper bound on anomaly detection performance.

Explainable self-supervised method

KMEx[12]serves as an explainable self-supervised baseline, a prototype-based method that builds upon a self-supervised classification black-box model. KMEx induces prototypes by K-means clustering in the latent space of a pre-trained encoder, and explains predictions by associating each input with its nearest prototype. We apply KMEx on top of the self-supervised classification-based black-box model described above, making it directly comparable to ProtoX-AD in terms of both anomaly detection performance and explainability.

Anomaly scores in KMEx are computed as the cross-entropy of the classification prediction for the identity view, following ProtoX-AD. Explanations are obtained by identifying the nearest prototype in the latent space and visualizing the corresponding representative sample in the input space, selected from the augmented views of the normal training samples.

Further implementation details of the baselines and ProtoX-AD are presented in Appendix B.

4.3 Transformation design for self-supervised anomaly detection

Transformation design is a key component of self-supervised anomaly detection, as it determines the semantic meaning of the augmented views and the resulting latent representations. We consider two complementary approaches.

Manually defined transformations

We design dataset-specific transformation sets that reflect the underlying anomaly structure, leveraging domain knowledge and prior work when available. This ensures alignment between transformation-induced classes and target anomalies. Specifically, in UMD the transformations generate bell-shaped patterns, in GTA they simulate warmer and colder temperature anomalies, and in the Yorkshire dataset they mimic leak-related patterns. Full details of all transformations for each dataset are provided in Appendix A.

Learnable neural transformations

We consider learnable neural transformations modeled as convolutional neural networks, following the approach proposed in NeuTraL AD[26]. These transformations preserve the dimensionality of the input data, allowing augmented views to be visualized in the original input space. They are learned in an anomaly-agnostic manner, without incorporating prior knowledge about anomaly structure.

4.4 Evaluation protocol

The area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPR)[28]are used for anomaly detection performance evaluation. AUROC assesses the ability to discriminate between normal and anomalous samples across decision thresholds, while AUPR is particularly informative in highly imbalanced settings, as it measures the trade-off between precision and recall. Higher values are better.

Explanation quality is assessed both qualitatively, through visual inspection of the explanations, and quantitatively by measuring the similarity between each test sample and its assigned explanation in the input space using Mean Absolute Error (MAE) and Mean Squared Error (MSE).
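As a small illustration of these metrics, the sketch below computes AUROC from its pairwise definition (the fraction of anomalous/normal pairs in which the anomalous sample receives the higher score, with ties counted as one half) together with MAE and MSE between a sample and its explanation. AUPR is omitted, and the scores, labels, and series are made up for the example.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the probability that an anomaly outscores a normal sample."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def explanation_errors(x, explanation):
    """MAE and MSE between a test sample and its explanation in input space."""
    diff = np.asarray(x, dtype=float) - np.asarray(explanation, dtype=float)
    return float(np.mean(np.abs(diff))), float(np.mean(diff**2))

auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
mae, mse = explanation_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Lower MAE and MSE mean the decoded explanation lies closer to the sample it is meant to explain.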

All experiments are repeated five times using different random seeds, and results are reported as mean and standard deviation.

5 Experimental Results

This section presents a comprehensive evaluation of ProtoX-AD in terms of anomaly detection performance and explainability. For ProtoX-AD, we consider both manually designed transformations and learnable neural transformations to analyze the impact of transformation design on both detection performance and explanation quality.

5.1 Evaluation of anomaly detection performance

Tables 1, 2, and 3 summarize the anomaly detection performance in terms of AUROC and AUPR for the UMD, GTA, and Yorkshire datasets, respectively. Results are averaged over random seeds for UMD, reported separately for the GISTEMP and gcag sources for GTA, and aggregated over seeds, days, and DMAs across four percentile thresholds for Yorkshire.

Manual Transformations

ProtoX-AD with manually designed transformations achieves strong overall performance across datasets and evaluation metrics. In most cases it outperforms the shallow methods (Isolation Forest, LOF, and One-Class SVM). Compared to the explainable self-supervised baseline KMEx, which also relies on manually designed transformations, ProtoX-AD attains comparable performance on UMD and slightly improves results on the GISTEMP source of the GTA dataset and across all settings of the Yorkshire dataset, while showing lower performance on the gcag source of GTA. Furthermore, it matches or slightly improves the performance of black-box self-supervised methods, remaining competitive across all percentile thresholds of the Yorkshire dataset.

Neural Transformations

The use of learnable neural transformations leads to a consistent drop in performance across datasets and evaluation metrics compared to ProtoX-AD with manually designed transformations. While this variant remains competitive with shallow methods on the UMD dataset, it generally underperforms them, particularly on the GTA and Yorkshire datasets. This degradation is consistent across all datasets and percentile thresholds, and is accompanied by increased variability across runs.

Table 1: UMD results (average ± std over seeds). MT: Manual Transformations; NT: Neural Transformations.

Table 2: Global Temperature Anomalies results for sources GISTEMP and gcag (average ± std over seeds). MT: Manual Transformations; NT: Neural Transformations.

Table 3: Yorkshire Water Leak Detection results for different percentile values (mean ± std over seeds, days, and DMAs). MT: Manual Transformations; NT: Neural Transformations.

5.2 Evaluation of explainability

We evaluate the quality of the explanations provided by ProtoX-AD compared with KMEx from both qualitative and quantitative perspectives, focusing on prototype-based explanations.

5.2.1 Qualitative evaluation

We analyze the learned prototypes and the resulting explanations for representative samples. Figure 5 illustrates the prototypes learned by each method for the transformation-induced classes considered in the UMD dataset. In addition, Figure 6 presents representative explanations, showing test samples alongside the corresponding explanations assigned by each method. For visualization purposes, we consider a representative subset of four out of the nine classes in this dataset and display one prototype per transformation-induced class.

Figure 5: Learned prototypes for the UMD dataset. Columns represent transformation-induced classes. Rows correspond to ProtoX-AD with manually designed transformations (MT), KMEx (also based on manually defined transformations), and ProtoX-AD with neural transformations (NT), respectively. Colors denote different classes, with blue indicating the identity (normal) class.

Figure 6: Prototype-based explanations for representative test samples from the UMD dataset. Columns correspond to transformation-induced classes. Rows represent ProtoX-AD with manually designed transformations (MT), KMEx (manual transformations), and ProtoX-AD with neural transformations (NT), respectively. Blue curves denote test samples, while orange curves denote the corresponding prototype-based explanations.

Manual Transformations

The learned prototypes provide meaningful representations of the transformation-induced classes for both ProtoX-AD and KMEx in the UMD dataset. Each prototype corresponds to a transformation-induced pattern defined by the transformation module: normal plateau behavior (identity), bell-shaped upward deviations at the beginning and end of the sequence without inversion, and bell-shaped downward deviations with inversion of the central plateau. These prototypes therefore capture not only the presence of a bell-shaped perturbation, but also its position and orientation, which define distinct anomalous profiles in this dataset. Generally, ProtoX-AD prototypes appear smoother and reflect the overall structure of each class, whereas KMEx prototypes are sharper and exhibit more localized variations (e.g., more abrupt and pronounced peaks in the bell-shaped patterns).

Both ProtoX-AD and KMEx provide informative prototype-based explanations for representative normal and anomalous samples. The first sample is associated with an identity prototype, correctly reflecting normal plateau behavior. The remaining samples, which exhibit bell-shaped deviations, are matched with prototypes corresponding to the relevant transformation-induced classes, including upward deviations at the beginning or end of the sequence and downward deviations with inversion of the central plateau. These examples show that the explanations are consistent with the learned prototypes and capture the underlying transformation-induced concepts.

Neural Transformations

While the prototype associated with the normal plateau exhibits a clear and stable structure, the remaining prototypes do not resemble meaningful anomaly patterns. Instead of capturing coherent bell-shaped variations or structured deviations of the signal, they exhibit noisy and unstructured variations with no clear correspondence to the transformation-induced classes.

Regarding the explanations, the lack of structure in the learned prototypes is reflected in the resulting explanations. The nearest prototypes assigned to anomalous samples do not consistently correspond to meaningful transformation-induced concepts. In particular, various anomalous samples are associated with the prototype of the identity (normal) class, indicating that the model fails to distinguish between normal and anomalous patterns in terms of the learned prototypes.

Additional qualitative examples for the GISTEMP problem of the GTA dataset are provided in Appendix C to further analyze the intra-class diversity of the learned prototypes.

5.2.2 Quantitative evaluation

To complement the qualitative analysis, Tables 4, 5, and 6 report MAE and MSE between each sample and its corresponding explanation for ProtoX-AD (with both manually defined and neural transformations) and KMEx across the UMD, GTA, and Yorkshire datasets. The evaluation considers the use of different numbers of prototypes, namely 1, 3, 5, and 7.

Table 4: Explanation errors (×100) in UMD with 1, 3, 5 and 7 prototypes (mean ± std over seeds). MT: Manual Transformations; NT: Neural Transformations.

Table 5: Explanation errors (×100) in Global Temperature Anomalies with 1, 3, 5 and 7 prototypes (mean ± std over seeds). MT: Manual Transformations; NT: Neural Transformations.

Table 6: Explanation errors (×100) in Yorkshire Water Leak Detection with 1, 3, 5 and 7 prototypes (mean ± std over seeds, days and DMAs). MT: Manual Transformations; NT: Neural Transformations.

Manual Transformations

Across all datasets and experimental settings, ProtoX-AD achieves lower explanation errors than KMEx, reflecting a closer match between the learned prototypes and the explained samples in the input space. For both methods, increasing the number of prototypes leads to lower errors, as a richer prototype set enables more fine-grained explanations. For any number of prototypes, ProtoX-AD consistently outperforms KMEx.

Neural Transformations

ProtoX-AD with neural transformations exhibits substantially higher explanation errors across all datasets and prototype configurations. This degradation is consistent across all prototype settings, with the neural variant underperforming both ProtoX-AD and KMEx when using manually defined transformations. Although increasing the number of prototypes leads to a slight reduction in error, the gap with manually defined transformations remains large, indicating that a larger prototype set does not compensate for the lack of meaningful structure in the learned transformations.

6 Discussion

We discuss the main findings of our experimental evaluation, focusing on three aspects: the detection performance of explainable models, the comparison between ProtoX-AD and KMEx through their prototype-based explanations, and the impact of transformation design on SSC-TSAD, considering both manually defined and learnable neural transformations.

6.1 Detection Performance of Explainable Models

We analyze how incorporating explainability affects anomaly detection performance in the setting with manually defined transformations, comparing explainable methods with their black-box counterpart and evaluating the relative performance of ProtoX-AD and KMEx.

Explainability does not inherently degrade anomaly detection performance

Introducing explicit explainability mechanisms is often assumed to compromise detection performance when compared to black-box models. However, our results show that this effect is not systematic. On the synthetic UMD dataset (Table 1), both self-explainable methods achieve perfect detection performance, indicating no degradation. On the GTA dataset (Table 2), the impact varies across sources: explainable methods remain competitive with the black-box baseline on GISTEMP, while showing a performance drop on gcag, particularly for ProtoX-AD. In contrast, on the Yorkshire dataset (Table 3), ProtoX-AD consistently matches or outperforms the black-box baseline across all settings, whereas KMEx exhibits a more noticeable gap. Overall, these results indicate that incorporating explainability does not inherently degrade detection performance, although its impact depends on the dataset and the specific modeling approach.

ProtoX-AD exhibits more consistent detection performance than KMEx

A direct comparison between the two explainable methods reveals a non-uniform but interpretable pattern. On UMD (Table 1), both approaches perform identically due to the simplicity of the anomaly structure. On GISTEMP, ProtoX-AD achieves slightly stronger detection performance, whereas on gcag KMEx shows higher AUROC and AUPR values (Table 2). However, on the Yorkshire dataset (Table 3), which comprises multiple percentile thresholds and operational scenarios, ProtoX-AD consistently outperforms KMEx across all settings. Taken together, these results indicate that while neither method universally dominates, ProtoX-AD exhibits more consistent performance across datasets and evaluation settings, suggesting that integrating prototypes directly into the learning objective provides a more stable detection mechanism.

6.2 Comparison of Explainable Methods through Prototype-Based Explanations

We compare ProtoX-AD and KMEx in terms of the structure and consistency of their prototype-based explanations when using manually defined transformations.

ProtoX-AD learns smoother and more representative class-level prototypes

A qualitative difference can be observed between the prototypes produced by ProtoX-AD and KMEx. ProtoX-AD yields smoother prototypes that capture class-level structure, whereas KMEx prototypes tend to reflect instance-specific variations. This difference stems from how prototypes are constructed: ProtoX-AD learns latent concepts optimized during training and reconstructs them in the input space, while KMEx explanations are obtained by matching individual training instances. Consequently, ProtoX-AD captures more general transformation-induced patterns in its prototypes, whereas KMEx exhibits instance-specific noise. This leads to consistently lower reconstruction errors between samples and their explanations (Tables 4, 5, and 6) in the case of ProtoX-AD. Note that increasing the number of prototypes further allows ProtoX-AD to model finer intra-class variability through more specialized concepts.

ProtoX-AD enforces consistent latent–input alignment

ProtoX-AD exhibits a consistent alignment between latent-space assignments and their corresponding representations in the input space, whereas such alignment is not guaranteed in KMEx. As illustrated in Figure 7, the prototype assigned to a sample by ProtoX-AD in the latent space typically coincides with the nearest prototype in the input space. In contrast, KMEx assigns prototypes that do not correspond to the closest input-space concept, leading to discrepancies between latent proximity and semantic similarity.

This behavior is consistent with prior observations [16] that latent similarity does not necessarily imply semantic similarity in prototype-based models. In ProtoX-AD, this alignment is promoted by learning prototype concepts jointly with the model and by the KL-divergence objective, which encourages a coherent organization of the latent space, resulting in more consistent and reliable explanations.

Figure 7: Alignment between latent assignment and input-space similarity in the UMD dataset. For ProtoX-AD (top row), the assigned prototype coincides with the input-space nearest prototype. In contrast, for KMEx (bottom row), the assigned prototype does not necessarily correspond to the nearest reconstructed concept in the input space.
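One way to quantify the alignment discussed above is to count how often the prototype nearest to a sample in the latent space is also the nearest one in the input space. The sketch below does this with Euclidean distance on toy vectors. The distance choice, the function names, and the data are all assumptions made for illustration; this is not the paper's evaluation code.

```python
import math


def nearest(query, candidates):
    """Index of the candidate closest to `query` under Euclidean distance."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def alignment_rate(latents, inputs, latent_protos, input_protos):
    """Fraction of samples whose latent-space prototype is also nearest in input space."""
    agree = sum(
        nearest(z, latent_protos) == nearest(x, input_protos)
        for z, x in zip(latents, inputs)
    )
    return agree / len(latents)


# Toy data (hypothetical): two prototypes, each with a latent vector and an
# input-space reconstruction, plus three samples in both spaces.
latent_protos = [[0.0, 0.0], [1.0, 1.0]]
input_protos = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
latents = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
inputs = [[0.0, 0.1, 0.0], [1.0, 0.9, 1.0], [0.9, 1.0, 1.0]]

rate = alignment_rate(latents, inputs, latent_protos, input_protos)
print(rate)  # the third sample disagrees, so two of three assignments align
```

A rate close to 1 corresponds to the behavior described for ProtoX-AD, while lower values correspond to the mismatch described for KMEx.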

6.3 On the importance of Transformation Design: Manual vs. Neural Transformations

We now analyze the role of transformation design in self-supervised anomaly detection by comparing manually defined and learnable neural transformations, focusing on their impact on both detection performance and the quality of the resulting explanations.

The effectiveness of transformation design depends on its alignment with the anomaly structure

As observed in Section 5 (see Tables 1, 2, and 3), manually defined transformations that align with the underlying anomaly structure lead to strong detection performance. In contrast, replacing them with neural transformations consistently degrades performance across datasets, particularly in more complex scenarios where informative augmented views are harder to learn. This highlights the central role of transformation design in self-supervised anomaly detection.

Domain knowledge in transformation design is essential for anomaly characterization

Beyond detection performance, transformation design also plays a critical role in the interpretability of anomaly explanations. When manually defined transformations are aligned with the underlying anomaly structure, transformation-induced classes correspond to meaningful variations of the signal, enabling the model to associate anomalous samples with interpretable concepts and support anomaly characterization.

In contrast, neural transformations, which are learned in an anomaly-agnostic manner, fail to capture such semantic structure. As a result, the corresponding prototypes do not align with the target anomalous patterns and fail to specialize in distinct anomaly types. This behavior is illustrated in Figure 6, where many samples, both normal and anomalous, are matched with prototypes associated with the identity transformation. Consequently, explanations derived from neural transformations do not provide meaningful information for anomaly characterization.

7 Conclusion

This work introduces ProtoX-AD, a prototype-based self-supervised framework for explainable time series anomaly detection. By leveraging transformation-induced surrogate classes, the model learns structured representations aligned with interpretable prototypes, enabling anomaly detection together with intrinsic explainability and anomaly characterization.

Experimental results show that incorporating domain knowledge through manually designed transformations is key to achieving strong performance in self-supervised anomaly detection. In this setting, ProtoX-AD achieves detection performance comparable to its black-box counterpart, while outperforming the explainable baseline KMEx and providing more structured and precise explanations. In particular, the learned prototypes capture class-level patterns and yield explanations that are both quantitatively and qualitatively more consistent.

Regarding transformation design, our analysis of manual and neural transformations in SSC-TSAD highlights a trade-off between domain-specific transformation design and learnable augmentation mechanisms. While manual transformations that encode prior knowledge about anomaly structure lead to stronger detection performance and more meaningful explanations, relying on such knowledge may be unrealistic in many real-world scenarios. Conversely, learning neural transformations solely under generic structural constraints is insufficient to produce augmented views that correspond to meaningful anomaly-related variations of the signal, which are necessary both for competitive anomaly detection performance and for the characterization of different anomalous profiles. Future work should therefore explore adaptive approaches that progressively incorporate knowledge about detected anomalies to learn transformations in a more flexible way, enabling both robust detection and meaningful anomaly characterization.

Finally, although this work focuses on time series anomaly detection, the proposed framework is not limited to this setting. By relying on transformation-induced representations and prototype-based learning, ProtoX-AD can be extended to other data modalities. Investigating its applicability across domains constitutes a promising direction for future work.

Acknowledgments

This work was supported by the Research Council of Norway (NFR), through its Centre for Research-based Innovation (grant no. 309439) and FRIPRO (grant nos. 303514 and 360068); and by the Basque Government (grant no. IT1504-22). This publication is part of project PID2022-137442NB-I00 funded by MICIU/AEI/10.13039/501100011033. A. S. F. acknowledges financial support from the Department of Education of the Basque Government (grant no. PRE_2022_1_0103).

References

[1] X. Bai, X. Wang, X. Liu, Q. Liu, J. Song, N. Sebe, and B. Kim (2021) Explainable deep learning for efficient and robust pattern recognition: a survey of recent developments. Pattern Recognition 120, pp. 108102.
[2] A. Blázquez-García, A. Conde, U. Mori, and J. A. Lozano (2021) A review on outlier/anomaly detection in time series data. ACM Computing Surveys 54(3), pp. 1–33.
[3] A. Blázquez-García, A. Conde, U. Mori, and J. A. Lozano (2021) Water leak detection using self-supervised time series classification. Information Sciences 574, pp. 528–541.
[4] M. M. Breunig, H. Kriegel, R. T. Ng, and J. Sander (2000) LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 93–104.
[5] A. Carreño, I. Inza, and J. A. Lozano (2020) Analyzing rare event, anomaly, novelty and outlier detection terms under the supervised classification framework. Artificial Intelligence Review 53, pp. 3575–3594.
[6] C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su (2019) This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems 32.
[7] K. Choi, J. Yi, C. Park, and S. Yoon (2021) Deep learning for anomaly detection in time-series data: review, analysis, and guidelines. IEEE Access 9, pp. 120043–120065.
[8] A. A. Cook, G. Misirli, and Z. Fan (2019) Anomaly detection for IoT time-series data: a survey. IEEE Internet of Things Journal 7(7), pp. 6481–6494.
[9] H. A. Dau, A. Bagnall, K. Kamgar, C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh (2019) The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6(6), pp. 1293–1305.
[10] R. Foorthuis (2021) On the nature and types of anomalies: a review of deviations in data. International Journal of Data Science and Analytics 12(4), pp. 297–331.
[11] S. Gautam, A. Boubekki, S. Hansen, S. Salahuddin, R. Jenssen, M. Höhne, and M. Kampffmeyer (2022) ProtoVAE: a trustworthy self-explainable prototypical variational model. Advances in Neural Information Processing Systems 35, pp. 17940–17952.
[12] S. Gautam, A. Boubekki, M. M. Höhne, and M. Kampffmeyer (2024) Prototypical self-explainable models without re-training. Transactions on Machine Learning Research. ISSN 2835-8856.
[13] S. Gautam, M. M. Höhne, S. Hansen, R. Jenssen, and M. Kampffmeyer (2023) This looks more like that: enhancing self-explaining models by prototypical relevance propagation. Pattern Recognition 136, pp. 109172.
[14] J. Gui, T. Chen, J. Zhang, Q. Cao, Z. Sun, H. Luo, and D. Tao (2024) A survey on self-supervised learning: algorithms, applications, and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), pp. 9052–9071.
[15] W. Hilal, S. A. Gadsden, and J. Yawney (2022) Financial fraud: a review of anomaly detection techniques and recent advances. Expert Systems with Applications 193, pp. 116429.
[16] A. Hoffmann, C. Fanconi, R. Rade, and J. Kohler (2021) This looks like that… does it? shortcomings of latent space prototype interpretability in deep networks. arXiv preprint arXiv:2105.02968.
[17] H. Hojjati, T. K. K. Ho, and N. Armanfard (2024) Self-supervised anomaly detection in computer vision and beyond: a survey and outlook. Neural Networks 172, pp. 106106.
[18] H. Hu, X. Wang, Y. Zhang, Q. Chen, and Q. Guan (2024) A comprehensive survey on contrastive learning. Neurocomputing 610, pp. 128645.
[19] D. Lee, S. Malacarne, and E. Aune (2024) Explainable time series anomaly detection using masked latent generative modeling. Pattern Recognition 156, pp. 110826.
[20] N. Lenssen, G. A. Schmidt, M. Hendrickson, P. Jacobs, M. J. Menne, and R. Ruedy (2024) A NASA GISTEMPv4 observational uncertainty ensemble. Journal of Geophysical Research: Atmospheres 129(17), pp. e2023JD040179.
[21] B. Li, C. Jentsch, and E. Müller (2023) Prototypes as explanation for time series anomaly detection. arXiv preprint arXiv:2307.01601.
[22] F. T. Liu, K. M. Ting, and Z. Zhou (2008) Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422.
[23] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, and J. Tang (2021) Self-supervised learning: generative or contrastive. IEEE Transactions on Knowledge and Data Engineering 35(1), pp. 857–876.
[24] NOAA National Centers for Environmental Information (2026) Climate at a glance: global time series. https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series. Published May 2026, retrieved on May 18 2026.
[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
[26] C. Qiu, T. Pfrommer, M. Kloft, S. Mandt, and M. Rudolph (2021) Neural transformation learning for deep anomaly detection beyond images. In International Conference on Machine Learning, pp. 8703–8714.
[27] A. Sánchez-Ferrera, B. Calvo, and J. A. Lozano (2025) A review on self-supervised learning in time series anomaly detection: recent advances and open challenges. ACM Computing Surveys 58(5), pp. 1–35.
[28] A. Sánchez-Ferrera, U. Mori, B. Calvo, and J. A. Lozano (2025) NeuCoReClass AD: redefining self-supervised time series anomaly detection. arXiv preprint arXiv:2508.00909.
[29] B. Schölkopf, R. C. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt (1999) Support vector method for novelty detection. Advances in Neural Information Processing Systems 12.
[30] J. Xu, Y. Zheng, Y. Mao, R. Wang, and W. Zheng (2020) Anomaly detection on electroencephalography with self-supervised learning. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 363–368.
[31] X. Yang, X. Qi, and X. Zhou (2023) Deep learning technologies for time series anomaly detection in healthcare: a review. IEEE Access 11, pp. 117788–117799.
[32] J. Yoo, T. Zhao, and L. Akoglu (2023) Data augmentation is a hyperparameter: cherry-picked self-supervision for unsupervised anomaly detection is creating the illusion of success. Transactions on Machine Learning Research. ISSN 2835-8856.
[33] Z. Yue, Y. Wang, J. Duan, T. Yang, C. Huang, Y. Tong, and B. Xu (2022) TS2Vec: towards universal representation of time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, pp. 8980–8987.
[34] Z. Zamanzadeh Darban, G. I. Webb, S. Pan, C. Aggarwal, and M. Salehi (2024) Deep learning for time series anomaly detection: a survey. ACM Computing Surveys 57(1), pp. 1–42.
[35] Y. Zheng, Z. Liu, R. Mo, Z. Chen, W. Zheng, and R. Wang (2022) Task-oriented self-supervised learning for anomaly detection in electroencephalography. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 193–203.

Appendix A Manual Transformations per dataset

Following prior work [32], we design dataset-specific transformations that generate augmented views that mimic the anomalous behaviors of interest in each problem, together with the identity transformation.

Transformations are stochastically parameterized to introduce variability across augmented views (see Table 7).

Table 7: Stochastic parameters and sampling ranges used by the manual transformation modules for each dataset. All parameters are resampled at each forward pass of the transformation module.

| Dataset | Transformation | Stochastic parameter | Range |
| --- | --- | --- | --- |
| UMD | Local bump (all variants) | Amplitude | [0.75, 1.25] |
| UMD | Local bump (all variants) | Inward temporal shift | [15, 20] |
| UMD | Local bump (all variants) | Edge jitter | [0, 5] |
| Temperature | Cold-heavy shift | Target annual mean | [-1.225, -0.80] |
| Temperature | Cold-light shift | Target annual mean | [-0.70, -0.30] |
| Temperature | Warm-light shift | Target annual mean | [0.30, 0.70] |
| Temperature | Warm-heavy shift | Target annual mean | [0.80, 1.225] |
| Yorkshire | Low magnitude scaling | Multiplicative factor | [1.4, 1.7] |
| Yorkshire | Medium magnitude scaling | Multiplicative factor | [2.0, 2.3] |
| Yorkshire | High magnitude scaling | Multiplicative factor | [2.6, 2.9] |

Transformations for UMD

For the UMD dataset, transformations are designed to generate augmented views reflecting the two anomalous classes present in the data: upward and downward bell-shaped patterns. In addition to the identity transformation, four transformations per anomaly type are defined by combining the position of the bell (at the beginning or end of the sequence) and the orientation of the central plateau (preserved or inverted), yielding nine transformations in total. The amplitude of the bell, a small edge jitter, and an inward temporal shift are randomly sampled within predefined ranges to introduce variability while preserving the anomaly structure.
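The count of nine transformations follows from the combinatorics described above: two bell directions, two positions, and two plateau orientations give eight classes, plus the identity. The sketch below enumerates them and samples the stochastic parameters from the ranges in Table 7. The class labels, the function names, and the choice to treat the temporal shift and edge jitter as integer time steps are assumptions; the actual bump construction is not reproduced here.

```python
import itertools
import random

DIRECTIONS = ("up", "down")           # orientation of the bell-shaped pattern
POSITIONS = ("begin", "end")          # where the bell is placed in the sequence
PLATEAUS = ("preserved", "inverted")  # orientation of the central plateau


def umd_transformation_classes():
    """Identity plus every (direction, position, plateau) combination."""
    classes = [("identity",)]
    classes += list(itertools.product(DIRECTIONS, POSITIONS, PLATEAUS))
    return classes


def sample_bump_params(rng):
    """Draw one set of stochastic parameters from the Table 7 ranges."""
    return {
        "amplitude": rng.uniform(0.75, 1.25),
        "inward_shift": rng.randint(15, 20),  # assumed to be integer time steps
        "edge_jitter": rng.randint(0, 5),     # assumed to be integer time steps
    }


classes = umd_transformation_classes()
params = sample_bump_params(random.Random(0))
print(len(classes))  # 9
```

Resampling `params` on every call mirrors the statement that parameters are redrawn at each forward pass of the transformation module.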

Transformations for Global Temperature Anomalies

For the GTA dataset, transformations consist of global upward or downward shifts applied to normal samples, modifying the overall level of the series to generate colder-than-average and warmer-than-average augmented views consistent with the anomaly definition of the dataset. In addition to the identity transformation, four transformations are defined corresponding to cold-heavy, cold-light, warm-light, and warm-heavy regimes, distinguished by the magnitude of the enforced shift. For each transformation, a fixed target interval for the annual anomaly value is defined, and the applied displacement is randomly sampled from the corresponding interval.
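A minimal sketch of such a level shift, under one reading of the description: a target mean is sampled from the regime's interval in Table 7, and the whole series is offset so that its mean equals that target. The function name, the regime keys, and the toy series are hypothetical, and the paper's exact procedure may differ.

```python
import random

# Target annual mean intervals per regime, taken from Table 7.
TARGET_RANGES = {
    "cold-heavy": (-1.225, -0.80),
    "cold-light": (-0.70, -0.30),
    "warm-light": (0.30, 0.70),
    "warm-heavy": (0.80, 1.225),
}


def shift_to_regime(series, regime, rng):
    """Offset a series so its mean lands on a target sampled for `regime`."""
    if regime == "identity":
        return list(series)
    low, high = TARGET_RANGES[regime]
    target = rng.uniform(low, high)
    offset = target - sum(series) / len(series)
    # A constant offset changes the level but keeps the shape of the series.
    return [value + offset for value in series]


rng = random.Random(0)
normal_year = [0.05, -0.02, 0.10, 0.03]  # toy anomaly values (hypothetical)
warm = shift_to_regime(normal_year, "warm-heavy", rng)
warm_mean = sum(warm) / len(warm)
```

Because the offset is constant, the within-series variation is untouched and only the overall level moves into the colder or warmer regime.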

Transformations for Yorkshire Water Leak Detection

For the Yorkshire Water Leak Detection dataset, transformations follow the approach proposed by [3], where anomaly-relevant views are generated by applying multiplicative scaling factors to the original time series. This design is motivated by the physical nature of leak events, which typically manifest as sustained increases in water flow. In addition to the identity transformation, three transformations with increasing scaling magnitudes are defined to simulate leak scenarios of progressively increasing severity. For each transformation, the scaling factor is randomly sampled from a predefined interval corresponding to low, medium, or high magnitude.

Appendix B Implementation and experimental details
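The scaling transformations can be sketched in a few lines, using the multiplicative factor intervals listed in Table 7. The function name, the severity keys, and the toy flow values are illustrative assumptions.

```python
import random

# Multiplicative factor intervals per severity level, taken from Table 7.
SCALING_RANGES = {
    "low": (1.4, 1.7),
    "medium": (2.0, 2.3),
    "high": (2.6, 2.9),
}


def simulate_leak(flow, severity, rng):
    """Scale a flow series by a factor sampled for the given severity."""
    if severity == "identity":
        return list(flow)
    factor = rng.uniform(*SCALING_RANGES[severity])
    return [factor * value for value in flow]


rng = random.Random(0)
flow = [10.0, 12.0, 11.0, 9.0]  # toy water-flow readings (hypothetical)
leak = simulate_leak(flow, "high", rng)
ratio = leak[0] / flow[0]
```

The three intervals do not overlap, so each augmented view belongs unambiguously to one severity class.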

Here we describe the implementation choices and training protocols used for ProtoX-AD and all baseline methods.

Implementation of Methods

All shallow baseline methods are implemented using the scikit-learn library [25] with default hyperparameters.

We adopt the TS2Vec encoder [33] as backbone architecture for all deep learning methods. For all methods, decoder architectures are symmetric to the encoder. In ProtoX-AD, the encoder outputs the mean and variance parameters required by the variational formulation.

Training details and hyperparameters

Models are trained for up to 1,000 epochs using early stopping, with a ReduceLROnPlateau scheduler that reduces the initial learning rate of $10^{-3}$ when the training loss plateaus (patience of 10 epochs). Training is stopped when the learning rate falls below $10^{-6}$.
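The interaction between the plateau scheduler and the stopping rule can be illustrated with a small pure-Python simulation. This is a simplified stand-in and not the scheduler implementation used for training. The decay factor of 0.1 is an assumption, since only the initial rate, the patience, and the stopping threshold are given, and `simulate_schedule` is a hypothetical helper.

```python
import itertools


def simulate_schedule(losses, lr=1e-3, min_lr=1e-6, patience=10,
                      factor=0.1, max_epochs=1000):
    """Return (epochs_run, final_lr) for a plateau-based learning-rate schedule.

    factor=0.1 is an assumed decay factor; it is not stated in the text.
    """
    best = float("inf")
    bad_epochs = 0
    epochs_run = 0
    for loss in itertools.islice(losses, max_epochs):
        epochs_run += 1
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # no improvement for more than `patience` epochs
            lr *= factor
            bad_epochs = 0
        if lr < min_lr:            # stopping rule: learning rate below the threshold
            break
    return epochs_run, lr


# A training loss that never improves triggers repeated reductions until the stop.
epochs_run, final_lr = simulate_schedule(itertools.repeat(1.0))
print(epochs_run, final_lr)
```

With a flat loss the run ends after a few dozen epochs, far short of the 1,000-epoch cap, which shows that the learning-rate threshold is what usually terminates training.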

We use a batch size of 4 across all datasets. Additionally, for all self-supervised methods, each mini-batch is augmented using 5 stochastic repetitions of the transformation module during training, which can be interpreted as data augmentation in the transformation space and increases intra-class diversity.
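To make the resulting batch size concrete, the sketch below applies every transformation to every series in a mini-batch, repeated five times, and labels each view with the index of its transformation. Applying all transformations to all samples is an assumption, as are the function name and the two toy transformations.

```python
import random


def augment_batch(batch, transformations, repetitions=5, rng=None):
    """Build transformed views and their surrogate class labels.

    Each repetition re-applies every stochastic transformation to every series,
    so one mini-batch yields len(batch) * len(transformations) * repetitions views.
    """
    rng = rng or random.Random()
    views, labels = [], []
    for _ in range(repetitions):
        for class_id, transform in enumerate(transformations):
            for series in batch:
                views.append(transform(series, rng))
                labels.append(class_id)
    return views, labels


def identity(series, rng):
    return list(series)


def random_scaling(series, rng):
    factor = rng.uniform(1.4, 1.7)  # resampled on every call
    return [factor * value for value in series]


batch = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]  # batch size of 4
views, labels = augment_batch(batch, [identity, random_scaling], rng=random.Random(0))
print(len(views))  # 4 series x 2 transformations x 5 repetitions = 40
```

Because the stochastic parameters are redrawn on each call, the five repetitions of the same class produce different views, which is the source of the added intra-class diversity.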

Finally, ProtoX-AD and KMEx introduce a hyperparameter controlling the number of prototypes per class, which is fixed to 3.

Appendix C Analysis of Intra-Class Prototype Diversity

In this appendix, we provide additional qualitative results on the GISTEMP problem of the GTA dataset. While the main paper focuses on representative prototypes and explanations, these figures illustrate how multiple prototypes within the same transformation-induced class capture different variants of the corresponding concept.

Figure 8 presents multiple learned prototypes associated with the identity, cold-light, warm-light, and warm-heavy transformation-induced classes. These visualizations illustrate how the learned prototypes capture different variations within the same transformation-induced concept across methods and transformation designs.

Figure 8: Learned prototypes for the GISTEMP dataset. Columns represent transformation-induced classes. Rows correspond to ProtoX-AD with manually designed transformations (MT), KMEx (also based on manually defined transformations), and ProtoX-AD with neural transformations (NT), respectively. Colors denote different classes, with blue indicating the identity (normal) class.

Manual Transformations

ProtoX-AD learns multiple prototypes that remain semantically coherent within each transformation-induced class while still capturing meaningful intra-class variability. In particular, the variability among ProtoX-AD prototypes within the same class is mainly reflected through changes in the amplitude of the series, directly corresponding to different degrees of temperature deviation within the same regime. Therefore, the learned prototypes vary primarily along the semantically relevant dimension of the problem while preserving the overall structure of the corresponding transformation-induced concept. In contrast, KMEx prototypes exhibit stronger variability in the shape of the series itself, reflecting a greater dependence on instance-specific variations and noise inherited from individual training samples rather than on coherent class-level concepts.

Neural Transformations

In contrast to manually designed transformations, the diversity observed with neural transformations is minimal, as the learned augmented views tend to converge to more similar patterns, resulting in nearly redundant prototypes within each class. Consequently, the learned prototypes do not exhibit a clear correspondence with semantically meaningful temperature regimes, and any apparent similarity with specific anomaly patterns is incidental rather than systematically induced by the learned transformations.

This behavior can be explained by the lack of explicit variability mechanisms in the learned transformations. While the reconstruction objective prevents complete latent collapse and the clustering objective encourages representative latent concepts, neural transformations do not incorporate the stochastic transformation parameters used in manually designed transformations (Table 7), which naturally generate diverse augmented views. As a result, the variability among learned prototypes remains limited. These observations suggest that introducing explicit variability mechanisms into neural transformations could be an interesting direction for future work.
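The role of stochastic transformation parameters can be illustrated with a toy comparison. The sketch below is an assumption-laden stand-in, not the transformations of Table 7: `stochastic_shift` is a hypothetical manual transformation that adds an offset sampled from a range, `fixed_shift` is a hypothetical deterministic map standing in for a transformation with no sampled parameter, and the offset range is invented for the example.

```python
import numpy as np


def stochastic_shift(series, low, high, rng):
    """Hypothetical manual transformation: add an offset drawn from U(low, high)."""
    offset = rng.uniform(low, high)
    return series + offset


def fixed_shift(series, offset):
    """Hypothetical deterministic transformation: the same offset on every call."""
    return series + offset


rng = np.random.default_rng(0)
series = np.zeros(12)  # placeholder normal series, e.g. twelve monthly values

# Apply each transformation 100 times to the same input.
stochastic_views = np.stack(
    [stochastic_shift(series, 0.5, 1.0, rng) for _ in range(100)]
)
deterministic_views = np.stack([fixed_shift(series, 0.75) for _ in range(100)])

# Spread across augmented views, averaged over time steps.
stochastic_spread = float(stochastic_views.std(axis=0).mean())
deterministic_spread = float(deterministic_views.std(axis=0).mean())

print(stochastic_spread, deterministic_spread)
```

In this toy setting the sampled offset produces a family of distinct views for a single class, whereas the deterministic map yields identical views, so prototypes fitted to the latter have nothing to differentiate them.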

