ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models
Summary
ADAPTOOD is a novel framework that uses data uncertainty to quantify distribution shift severity and guide fine-tuning of ECG time series models for out-of-distribution settings. It combines uncertainty estimation with low-rank model updates and adaptive hyperparameter optimization, achieving up to 7% higher accuracy and 12.9% higher precision than existing OOD adaptation methods.
Cached at: 06/05/26, 02:21 AM
# Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models
Source: [https://arxiv.org/html/2606.04164](https://arxiv.org/html/2606.04164)
Sotirios Vavaroutas¹, Yu Yvonne Wu², Ali Etemad³, Cecilia Mascolo¹
###### Abstract
Data samples used for training often differ from those encountered during fine-tuning and deployment. While ML models show promise, their performance remains limited when only small annotated datasets are available, and it often degrades under distribution shifts caused by diverse sensors, populations, and application settings. Although pre-training helps, models frequently encounter out-of-distribution (OOD) data in real-world settings, which reduces their robustness. Existing adaptation methods usually assume fixed distribution shifts and struggle when multiple types or severities occur. In particular, they overlook shift severity, for example treating adaptation to a large familiar dataset the same as adaptation to a small dataset with a new task, which limits generalisation. To address this, we propose ADAPTOOD, a novel framework that leverages data uncertainty to quantify distribution shift severity and guide fine-tuning for time series. This uncertainty measures how strongly samples from the target deployment distribution deviate from the pre-training distribution, providing a direct signal of OOD severity. Our framework combines this uncertainty with low-rank model updates and adaptive hyperparameter optimisation to improve adaptation. We show that ADAPTOOD achieves up to 7% higher accuracy and 12.9% higher precision than existing methods in OOD tasks, maintaining strong performance as distribution shift severity increases.
## I Introduction
With the increasing digitisation of real‑world systems, ML models trained on large‑scale sensor and signal data have shown strong potential for identifying complex patterns and irregularities\[[24](https://arxiv.org/html/2606.04164#bib.bib1)\]. However, deploying such models outside controlled research settings remains difficult, largely due to the limited availability of high‑quality, labelled datasets required for robust generalisation\[[18](https://arxiv.org/html/2606.04164#bib.bib2)\]. Producing reliable annotations for time series data demands specialised expertise and is often time‑consuming\[[53](https://arxiv.org/html/2606.04164#bib.bib3)\]. The resulting datasets are often small and may fail to capture the variability of real-world settings. Models trained from scratch on such datasets may perform well within a specific context, but they tend to overfit and fail to generalise to new data, because data samples are collected in a wide variety of settings.
Fine-tuning pre-trained models has shown promise in these cases, as it allows models to leverage general representations learned from large datasets before adapting to specific tasks with limited data. Yet this strategy often struggles under out-of-distribution (OOD) conditions, where test data diverge from the training distribution due to population heterogeneity, differing sensor types, and variations in collection protocols\[[59](https://arxiv.org/html/2606.04164#bib.bib4),[42](https://arxiv.org/html/2606.04164#bib.bib5)\]. These variations can cause significant distribution shifts, complicating the development of robust and generalisable models for biosignal interpretation\[[56](https://arxiv.org/html/2606.04164#bib.bib6)\].
While transfer learning and domain adaptation methods have been proposed to mitigate such distributional differences\[[55](https://arxiv.org/html/2606.04164#bib.bib7)\], they face common challenges, including modality mismatch, limited labelled data, and inter-subject variability\[[46](https://arxiv.org/html/2606.04164#bib.bib8),[57](https://arxiv.org/html/2606.04164#bib.bib9)\], which often lead to degraded performance and poor generalisation. Moreover, these methods treat OOD cases under fixed assumptions, without considering their granularity or severity. Most adopt a coarse-grained binary view, treating data as either in-domain or out-of-domain, and apply the same adaptation regardless of shift level or task complexity; as a result, they often fail to generalise to the broad spectrum of OOD data that may occur during fine-tuning\[[31](https://arxiv.org/html/2606.04164#bib.bib10)\]. In practice, however, multiple sources of shift with varying severities frequently overlap\[[1](https://arxiv.org/html/2606.04164#bib.bib11)\]. For example, a model trained on data from young adults might underperform on data from elderly patients recorded with a different device, which exhibits a compound population and sensor shift.
To address these challenges, we propose ADAPTOOD: an adaptation framework that quantifies distribution shift severity and uses this information to guide model fine-tuning on OOD electrocardiograms (ECGs). To quantify severity, we leverage data uncertainty, which reflects how unfamiliar a target input is with regard to the pre-training distribution. The key intuition is that a model's uncertainty changes when it encounters OOD inputs that are dissimilar to its training distribution\[[32](https://arxiv.org/html/2606.04164#bib.bib12)\]. More specifically, low uncertainty implies that the OOD dataset lies near the training distribution, indicating a mild shift, while high uncertainty signals a more severe shift\[[41](https://arxiv.org/html/2606.04164#bib.bib13),[51](https://arxiv.org/html/2606.04164#bib.bib14)\]. We fine-tune the pre-trained model under the guidance of this OOD severity quantification mechanism, so that the system adjusts how aggressively it learns from new input. To support this, ADAPTOOD also incorporates low-rank adaptation and adaptive hyperparameter optimisation, improving both the effectiveness and the computational efficiency of fine-tuning.
We evaluate ADAPTOOD using datasets that reflect realistic distribution shifts in ECG time series\. We compare against transfer learning, supervised learning, and domain adaptation baselines, and against ablated versions of our approach\. Across distribution shifts, ADAPTOOD consistently outperforms alternatives\. Our key contributions are summarised below:
- We introduce a novel mechanism that leverages data uncertainty to assess the severity of OOD data shifts in terms of their divergence from the pre-training data.
- We develop an OOD severity-flexible adaptation approach that uses uncertainty and hyperparameter tuning to calibrate adaptation to the severity of the distribution shift, while incorporating low-rank adaptation to maintain computational efficiency.
- Through a comprehensive evaluation of OOD model adaptation, we demonstrate that our method achieves up to 7% higher accuracy and 12.9% higher precision than the best-performing baselines, along with efficiency gains and consistently strong performance across metrics.
## II Related Work
Transfer Learning. Transfer learning pre-trains a model on one task and fine-tunes it on a related one\[[22](https://arxiv.org/html/2606.04164#bib.bib15)\], leveraging learned representations to improve performance on smaller or less representative datasets\[[26](https://arxiv.org/html/2606.04164#bib.bib16)\]. In time series analysis, this is particularly appealing due to the strong temporal dependencies shared across cardiac signals\[[18](https://arxiv.org/html/2606.04164#bib.bib2)\], which can be learned during pre-training and reused across distribution-shifted signals from heterogeneous sources. Prior work shows that models trained directly on small health datasets tend to generalise poorly\[[2](https://arxiv.org/html/2606.04164#bib.bib17)\], and that performance remains limited in heterogeneous ECG scenarios\[[4](https://arxiv.org/html/2606.04164#bib.bib18)\], where sensor types, patient populations and signal qualities may change simultaneously\[[33](https://arxiv.org/html/2606.04164#bib.bib19)\]. Existing methods thus struggle to handle the diverse shift types common in time series data, which constrains their generalisability\[[28](https://arxiv.org/html/2606.04164#bib.bib20)\].
Representation & Contrastive Learning. Representation learning trains models to discover the most useful representations of input data by capturing robust embeddings\[[3](https://arxiv.org/html/2606.04164#bib.bib21)\]. In time series, prior work has explored learning representations that are robust to latent and dynamically changing distributions for OOD data\[[36](https://arxiv.org/html/2606.04164#bib.bib22)\]. However, this assumes that the data come from a limited set of known conditions, which is restrictive for ECG signals collected in the wild, where distribution shifts can be continuous and unpredictable\[[28](https://arxiv.org/html/2606.04164#bib.bib20)\]. Additionally, ECG-specific representation learning methods\[[37](https://arxiv.org/html/2606.04164#bib.bib23)\] aim to learn task-agnostic embeddings transferable across patient populations and diagnostic tasks, but often rely on pretext tasks that may not fully capture the temporal variability, noise, or morphology changes introduced by different sensors\[[52](https://arxiv.org/html/2606.04164#bib.bib24)\].
In parallel, recent contrastive approaches further incorporate clinical metadata to align ECG embeddings with clinically meaningful differences between subjects\[[47](https://arxiv.org/html/2606.04164#bib.bib25)\]\. While these provide a strong and scalable starting point for downstream analysis, complementary mechanisms are required to support effective fine\-tuning when encountering varying distribution shifts in cardiac data\[[33](https://arxiv.org/html/2606.04164#bib.bib19)\]\.
Biosignal Domain Adaptation. Domain adaptation seeks to transfer knowledge from a source to a target domain\[[19](https://arxiv.org/html/2606.04164#bib.bib26)\]. Prior work has explored domain adaptation without access to source data\[[10](https://arxiv.org/html/2606.04164#bib.bib27)\] and adaptation leveraging both temporal and frequency features\[[25](https://arxiv.org/html/2606.04164#bib.bib28)\]. Yet in ECG applications, shifts arising from changes in acquisition hardware, sampling rates, or patient demographics often occur concurrently, whereas such methods assume a fixed shift type and overlook its severity\[[59](https://arxiv.org/html/2606.04164#bib.bib4)\].
Hyperparameter Optimisation for Biosignal Model Fine-Tuning. Hyperparameter optimisation has been shown to improve generalisation across use cases\[[60](https://arxiv.org/html/2606.04164#bib.bib29)\], with strong results under domain and subpopulation shifts using small OOD validation sets\[[11](https://arxiv.org/html/2606.04164#bib.bib30)\]. However, its application to fine-tuning models for biosignals and time series remains underexplored, even though it is particularly relevant in ECG settings, where the amount of available data and the nature of the downstream task can vary substantially\[[18](https://arxiv.org/html/2606.04164#bib.bib2)\].
Uncertainty Quantification for Distribution Shift. Uncertainty is an informative signal during model fine-tuning, as it often reflects deviations from the original training distribution. Prior work has shown the importance of uncertainty quantification\[[63](https://arxiv.org/html/2606.04164#bib.bib31)\], its relevance to time series data\[[58](https://arxiv.org/html/2606.04164#bib.bib32)\], and its behaviour under dataset shifts\[[41](https://arxiv.org/html/2606.04164#bib.bib13)\]. For ECGs, uncertainty can naturally capture fine-grained temporal and morphological variations induced by sensor noise, patient motion, or physiological differences\[[35](https://arxiv.org/html/2606.04164#bib.bib33)\], making it a useful guiding signal for model adaptation or fine-tuning in time series settings.
Overall, existing methods address aspects of distribution shift but fail to account for its severity and heterogeneity, which are inherent to time series. In practice, adaptation must cope with limited labelled data and overlapping shift types, requiring both parameter-efficient updates and adaptive optimisation strategies. This motivates ADAPTOOD, which leverages uncertainty as a data-driven signal to quantify shift severity and guide fine-tuning, enabling robust adaptation in OOD scenarios.
## III Methods
### III-A Problem Definition
Distribution shifts challenge model deployment\[[62](https://arxiv.org/html/2606.04164#bib.bib34)\]. These occur when the joint distribution of inputs and outputs, $P(x,y)$, encountered during testing ($D_t$), differs from that seen during training ($D_s$). Common sources include population shifts\[[61](https://arxiv.org/html/2606.04164#bib.bib35)\], where the marginal distribution $P(x)$ changes due to variation across diverse patient groups, and sensor shifts\[[48](https://arxiv.org/html/2606.04164#bib.bib36)\], where data recorded with different devices leads to variations in $P(x \mid y)$. Additionally, label shifts arise from changes in the task-specific label distribution $P(y)$, and temporal shifts arise from distribution drift over time, where $P(x,y)$ evolves due to changes in population health status or medical practice\[[50](https://arxiv.org/html/2606.04164#bib.bib37)\]. Domain shifts are also important, as they can occur when a model pre-trained on one signal, like ECG, is applied to another, such as photoplethysmogram (PPG)\[[42](https://arxiv.org/html/2606.04164#bib.bib5)\]. Finally, contextual shifts driven by changes in recording conditions (e.g., rest, exercise, surgery) alter physiological patterns like heart rate and blood pressure\[[9](https://arxiv.org/html/2606.04164#bib.bib38)\], thus challenging generalisation.
Figure 1: Overview of ADAPTOOD. The system adapts a pre-trained model to OOD data by first estimating uncertainty between source and target data, and then using this information to guide selective layer unfreezing. Subsequently, it applies LoRA for efficient fine-tuning and performs hyperparameter tuning for enhanced optimisation across downstream tasks.
### III-B System Overview
ADAPTOOD adapts a pre-trained model to OOD data by first estimating uncertainty between source and target data, and then using this information to guide selective layer unfreezing. Subsequently, it applies low-rank adaptation (LoRA) for efficient fine-tuning and performs hyperparameter tuning for enhanced optimisation across downstream tasks, as depicted in Figure [1](https://arxiv.org/html/2606.04164#S3.F1). Starting from a classification model pre-trained on in-distribution data, it focuses on the fine-tuning stage, aiming to handle OOD shifts of varying severity effectively and efficiently. Specifically, it includes an uncertainty module to quantify OOD severity levels, a Bayesian hyperparameter tuning module that optimises the model for varying tasks, and a LoRA module to enable parameter-efficient adaptation. By combining uncertainty-guided selective unfreezing of model layers with LoRA-based adaptation and Bayesian hyperparameter optimisation, it fine-tunes the model for robust generalisation on OOD data. In the following subsections, we describe each module.
### III-C Uncertainty-Guided Model Adaptation
Uncertainty estimation is fundamental to ADAPTOOD. In our framework, uncertainty reflects the divergence between the pre-training data distribution $D_s$ and the targeted OOD data $D_t$. Specifically, we capture $\sigma(D_t \mid P(D_s))$, the uncertainty of the target data with respect to the pre-training source distribution. This measures the degree of overlap (or divergence) between the two distributions, thus serving as a proxy for detecting distributional shifts\[[51](https://arxiv.org/html/2606.04164#bib.bib14)\]. High uncertainty can indicate a severe OOD shift, while low uncertainty suggests greater similarity to the source distribution\[[41](https://arxiv.org/html/2606.04164#bib.bib13)\].
To quantify the shift between $D_s$ and $D_t$, we employ data uncertainty estimation. We apply Principal Component Analysis (PCA) to reduce high-dimensional ECG signals to a single dominant component, which makes density estimation more effective while retaining the key variance in the data. We then estimate the uncertainty $\sigma(D_t \mid P(D_s))$ by calculating distribution-level divergence, using the Mahalanobis ($d_m(x)$) and Hellinger ($d_h(x)$) distances as estimation metrics, discussed below. We apply Gaussian Kernel Density Estimation (KDE) to both the training and OOD datasets to estimate their respective probability density functions, and compute the uncertainty:
$$\sigma(D_t \mid P(D_s)) \leftarrow \{\, d_h(D_t, D_s),\; d_m(D_t, D_s) \,\} \tag{1}$$
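The PCA reduction step can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes that PCA is fitted on the source (pre-training) signals only and then applied to both datasets, and that source and target signals have already been brought to a common length. The text does not say how datasets with different lengths or lead counts (e.g. CODEtest's 12 × 4,096 inputs) are reconciled. The function names `fit_pca_1d` and `project_1d` are hypothetical.

```python
import numpy as np


def fit_pca_1d(x_src):
    """Fit a one-component PCA on source signals of shape (n_samples, n_timesteps).

    Returns the per-timestep mean and the leading principal axis (a unit vector).
    """
    x = np.asarray(x_src, dtype=float)
    mean = x.mean(axis=0)
    # The first right singular vector of the centred data is the direction of maximal variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[0]


def project_1d(x, mean, component):
    """Project signals onto the source principal axis, giving one scalar score per sample."""
    return (np.asarray(x, dtype=float) - mean) @ component


# Synthetic stand-ins for ECG segments, assuming a shared length of 188 time steps.
rng = np.random.default_rng(0)
x_source = rng.normal(size=(500, 188))
x_target = rng.normal(loc=0.5, size=(200, 188))
mean, pc1 = fit_pca_1d(x_source)
z_source = project_1d(x_source, mean, pc1)  # shape (500,)
z_target = project_1d(x_target, mean, pc1)  # shape (200,)
```

The resulting 1-D scores are what the density estimates and distances below would operate on under this assumption.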
Mahalanobis distances are a common way of measuring how far samples lie from a distribution while accounting for correlations between features\[[16](https://arxiv.org/html/2606.04164#bib.bib39)\]. To calculate them, we compute from the pre-training data the mean vector $y$ and the covariance matrix $\Sigma$, which represent the centre and spread of the data in the feature space. Using these, we calculate the Mahalanobis distance $d_m(x) = \sqrt{(x - y)^\top \Sigma^{-1} (x - y)}$ to measure how far each new batch is from the centre of the known distribution\[[54](https://arxiv.org/html/2606.04164#bib.bib40)\]. We use this metric to efficiently quantify OOD severity: a larger value indicates higher uncertainty, as it reflects a greater deviation from the known distribution.
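A minimal NumPy sketch of a batch-level Mahalanobis score is shown below for illustration. Per sample it computes $\sqrt{(x - y)^\top \Sigma^{-1} (x - y)}$ with the source mean $y$ and covariance $\Sigma$; averaging these per-sample distances into one value per batch is an assumption, since the text does not specify how a batch is summarised. It works on 1-D PCA scores (where $\Sigma$ reduces to a variance) as well as on multi-dimensional features. The function names are hypothetical.

```python
import numpy as np


def fit_source_stats(z_src):
    """Estimate the source mean vector and the (pseudo-)inverse covariance matrix."""
    z = np.asarray(z_src, dtype=float)
    if z.ndim == 1:  # 1-D PCA scores -> column of single-feature samples
        z = z[:, None]
    mean = z.mean(axis=0)
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    return mean, np.linalg.pinv(cov)  # pinv guards against a singular covariance


def batch_mahalanobis(z_batch, mean, cov_inv):
    """Mean Mahalanobis distance of a target batch from the source centre (assumed aggregation)."""
    z = np.asarray(z_batch, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    diff = z - mean
    # Per-sample sqrt((x - y)^T Sigma^{-1} (x - y)).
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return float(d.mean())


rng = np.random.default_rng(0)
source = rng.normal(size=2000)
mean, cov_inv = fit_source_stats(source)
mild = batch_mahalanobis(rng.normal(loc=0.5, size=256), mean, cov_inv)    # small shift
severe = batch_mahalanobis(rng.normal(loc=3.0, size=256), mean, cov_inv)  # large shift
```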
Hellinger distances similarly quantify how different two probability distributions are\[[23](https://arxiv.org/html/2606.04164#bib.bib41)\]. They have previously been used to quantify uncertainty in stochastic differential equations with non-Gaussian parameters by minimising the distance between empirical probability densities\[[64](https://arxiv.org/html/2606.04164#bib.bib42)\]. In our case, they quantify how much an OOD distribution diverges from the training distribution: the greater the distance, the more severe the OOD shift. For each incoming batch of samples, we compute the Hellinger distance by comparing the probability density of the new samples against that of the source.
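The density comparison can be sketched with a 1-D Gaussian KDE (using Scott's bandwidth rule, which is also the default of SciPy's `gaussian_kde`) evaluated on a shared grid, followed by the standard Hellinger distance $d_h = \sqrt{1 - \int \sqrt{p(z)\,q(z)}\,dz}$, which lies in $[0, 1]$. This is an assumed implementation: the text does not specify the bandwidth, the integration grid, or how $d_h$ is scaled relative to the chosen maximum $d_h^{\max}$. The function names are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Gaussian KDE of 1-D samples evaluated on `grid`, with Scott's-rule bandwidth."""
    s = np.asarray(samples, dtype=float)
    bw = s.std(ddof=1) * s.size ** (-1 / 5)
    u = (grid[:, None] - s[None, :]) / bw
    return np.exp(-0.5 * u**2).sum(axis=1) / (s.size * bw * np.sqrt(2 * np.pi))


def hellinger_kde(z_src, z_tgt, n_grid=512):
    """Hellinger distance between KDE estimates of source and target 1-D scores."""
    z_src, z_tgt = np.asarray(z_src, dtype=float), np.asarray(z_tgt, dtype=float)
    lo = min(z_src.min(), z_tgt.min())
    hi = max(z_src.max(), z_tgt.max())
    pad = 0.25 * (hi - lo)  # extend the grid so the kernel tails are included
    grid = np.linspace(lo - pad, hi + pad, n_grid)
    dx = grid[1] - grid[0]
    p = gaussian_kde_1d(z_src, grid)
    q = gaussian_kde_1d(z_tgt, grid)
    p, q = p / (p.sum() * dx), q / (q.sum() * dx)  # renormalise on the finite grid
    bc = np.sum(np.sqrt(p * q)) * dx  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))


rng = np.random.default_rng(0)
source = rng.normal(size=1000)
mild = hellinger_kde(source, rng.normal(loc=0.5, size=300))    # overlapping densities
severe = hellinger_kde(source, rng.normal(loc=4.0, size=300))  # little overlap
```

In practice `scipy.stats.gaussian_kde` could replace the hand-written KDE; it is written out here only to keep the sketch dependency-free.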
Choice of Complementary Distance-Based Metrics. We use these two metrics because they capture complementary aspects of uncertainty related to OOD severity; for an ablation study comparing them, refer to Section [V-B](https://arxiv.org/html/2606.04164#S5.SS2). Mahalanobis distances quantify how far OOD batches deviate from the known distribution, incorporating feature correlations and scale\[[16](https://arxiv.org/html/2606.04164#bib.bib39)\]. Hellinger distances, by contrast, measure divergence between overall probability distributions, capturing global distributional shifts\[[23](https://arxiv.org/html/2606.04164#bib.bib41)\]. Using both metrics gives a more complete picture of OOD severity than a single metric. We chose them because they account for correlations between variables, unlike simpler metrics such as the Euclidean distance, which only measures straight-line distances between two points in feature space and does not account for the scale of features\[[14](https://arxiv.org/html/2606.04164#bib.bib43)\]. Additionally, unlike alternatives such as MC-dropout, which estimates uncertainty by running multiple stochastic forward passes with dropout active\[[21](https://arxiv.org/html/2606.04164#bib.bib44)\], these metrics are computed before fine-tuning. They therefore operate pre-inference, allowing for proactive severity estimation.
To compute our final uncertainty metric, given $d_h(D_t, D_s)$ and $d_m(D_t, D_s)$ along with their respective maximum values $d_h^{\max}$ and $d_m^{\max}$, we calculate the normalised distances as follows:
$$\tilde{d}_h = \frac{d_h(D_t, D_s)}{d_h^{\max}}, \quad \tilde{d}_m = \frac{d_m(D_t, D_s)}{d_m^{\max}} \tag{2}$$
The combined weighted uncertainty metric is then:
$$\sigma(D_t \mid P(D_s)) = D_{\text{combined}} = w \cdot \tilde{d}_h + (1 - w) \cdot \tilde{d}_m \tag{3}$$
Here, $w \in [0,1]$ is the weight that controls the relative importance assigned to each distance. In our experiments, we set $w = 0.5$ to balance the two measures equally, reflecting a neutral stance when no prior knowledge favours one form of distributional discrepancy over the other. However, $w$ can be adjusted to emphasise either distance metric depending on the characteristics of the source and target domains. Similarly, we fix both $d_h^{\max}$ and $d_m^{\max}$ to 10, as across the datasets we evaluated, these values consistently captured the upper bounds of uncertainty without saturation. Both settings remain configurable, which is useful when applying the framework to datasets with substantially different statistical or uncertainty properties. The combined uncertainty score $D_{\text{combined}}$ is then used to inform the degree of selective fine-tuning, as described in the next section.
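As a worked example of Eqs. (2) and (3), the following sketch combines the two distances with the stated defaults ($w = 0.5$, both maxima set to 10). The clipping to $[0, 1]$ is an added assumption, not stated in the text, to keep the score in range if a distance ever exceeds its maximum. The function name is hypothetical.

```python
def combined_uncertainty(d_h, d_m, w=0.5, d_h_max=10.0, d_m_max=10.0, clip=True):
    """Weighted combination of normalised Hellinger and Mahalanobis distances (Eqs. 2-3)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    d_h_norm = d_h / d_h_max  # Eq. (2)
    d_m_norm = d_m / d_m_max
    score = w * d_h_norm + (1.0 - w) * d_m_norm  # Eq. (3)
    # Assumption (not stated in the text): clip so the score stays in [0, 1]
    # even if a distance exceeds its assumed maximum.
    return min(max(score, 0.0), 1.0) if clip else score


# Worked example: d_h = 0.4 and d_m = 3.0 give 0.5 * 0.04 + 0.5 * 0.3 = 0.17.
score = combined_uncertainty(0.4, 3.0)
```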
### III-D Selective Layer Unfreezing
Following the OOD severity estimation described above, ADAPTOOD uses this information to enable effective representation and parameter transfer from the pre-trained to the fine-tuning encoder. Existing methods typically fine-tune either the entire model or a fixed subset without considering how severe the OOD shift is, leading to unnecessary computational cost and reduced performance due to overfitting when the severity is low or underfitting when it is high. To address this, ADAPTOOD initially freezes all layers of the loaded model and then unfreezes a subset based on the OOD severity. This allows adaptation while avoiding the risks of full retraining, such as excessive cost or catastrophic forgetting of in-distribution knowledge. To determine how many layers to unfreeze, ADAPTOOD combines linear and exponential strategies, and then unfreezes that number of layers starting from the upper layers closest to the output.
Linear Calculation. In the linear approach, the number of layers to unfreeze decreases linearly with the normalised uncertainty between the source and target domains. Given a total number of layers $L$, this is computed as:
$$L_{\text{linear}} = (1 - D_{\text{combined}}) \cdot L \tag{4}$$
Exponential Calculation. To allow for more sensitivity to uncertainty, an exponential decay function is applied using a decay factor $\alpha > 0$, which governs how rapidly the number of unfrozen layers drops with increasing uncertainty. The number of layers to unfreeze is:
$$L_{\text{exponential}} = L \cdot e^{-\alpha \cdot D_{\text{combined}}} \tag{5}$$
Final Calculation. The final number of layers that ADAPTOOD makes trainable is the arithmetic mean of the linear and exponential estimates:
$$L_{\text{final}} = \frac{L_{\text{linear}} + L_{\text{exponential}}}{2} \tag{6}$$
Using this strategy, ADAPTOOD customises fine\-tuning based on OOD severity\. Of note, once the trainable layers are selected, the final layers are modified to match the target output shape, ensuring architectural compatibility with the new inputs and outputs\.
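The layer-count rule in Eqs. (4)-(6) can be sketched as below. The decay factor $\alpha = 2.0$, the 12-layer model in the example, and the rounding of $L_{\text{final}}$ to a whole number of layers are all hypothetical choices, since the text only requires $\alpha > 0$ and does not state how fractional counts are handled. Following the text, the selected layers are counted from the output end of the model.

```python
import math


def layers_to_unfreeze(d_combined, n_layers, alpha=2.0):
    """Number of top layers to make trainable, following Eqs. (4)-(6).

    alpha is a hypothetical default; the text only requires alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    l_linear = (1.0 - d_combined) * n_layers                  # Eq. (4)
    l_exponential = n_layers * math.exp(-alpha * d_combined)  # Eq. (5)
    l_final = (l_linear + l_exponential) / 2.0               # Eq. (6)
    # Assumption: round to the nearest whole layer and keep within [0, n_layers].
    return min(n_layers, max(0, round(l_final)))


def trainable_layer_indices(d_combined, n_layers, alpha=2.0):
    """Indices of the layers to unfreeze, counted from the output end of the model."""
    k = layers_to_unfreeze(d_combined, n_layers, alpha)
    return list(range(n_layers - k, n_layers))


# Hypothetical 12-layer model with a combined uncertainty of 0.3:
# L_linear = 8.4, L_exponential = 12 * e^-0.6 ≈ 6.59, L_final ≈ 7.49 -> 7 layers.
print(trainable_layer_indices(0.3, 12))  # [5, 6, 7, 8, 9, 10, 11]
```

Note that, as written in Eqs. (4) and (5), a higher combined uncertainty yields fewer trainable layers.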
### III-E Low-Rank Adaptation
To improve effectiveness, we add Low-Rank Adaptation (LoRA) support to all the Conv1D layers in the model, as these constitute the majority of the architecture's core processing units for capturing local temporal patterns. This is important in our time series tasks: since Conv1D layers are responsible for learning feature representations across time steps\[[30](https://arxiv.org/html/2606.04164#bib.bib45)\], adapting them directly focuses the model's adaptation capacity where it matters most, while keeping the rest of the architecture intact.
With LoRA, we also improve parameter efficiency. Instead of fine-tuning all parameters, LoRA freezes the pre-trained weights and introduces lightweight trainable components in the form of low-rank matrices\[[27](https://arxiv.org/html/2606.04164#bib.bib46)\]. Across all datasets, LoRA consistently leads to smaller model sizes in terms of memory footprint (see Section [V-C](https://arxiv.org/html/2606.04164#S5.SS3)), which is beneficial for deployment and reduces the risk of overfitting when working with limited target data.
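LoRA is usually described for dense weight matrices. One common way to apply it to a Conv1D layer, assumed here, is to view the kernel as a $(K \cdot C_{in}) \times C_{out}$ matrix and add a scaled low-rank product $AB$ to it. The NumPy sketch below shows the forward pass only and is not the authors' implementation; the rank, scaling factor and initialisation scale are hypothetical. As in the original LoRA formulation, $B$ starts at zero, so the adapted layer initially reproduces the frozen one.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Stride-1 1-D convolution (cross-correlation, as in deep-learning libraries)
    with zero 'same' padding. x: (T, C_in); kernel: (K, C_in, C_out) with K odd."""
    k = kernel.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])  # (T, K, C_in)
    return np.einsum("tkc,kco->to", windows, kernel)


class LoRAConv1D:
    """Frozen Conv1D kernel plus a trainable low-rank update: W + (alpha / r) * reshape(A @ B)."""

    def __init__(self, frozen_kernel, rank=4, alpha=8.0, seed=0):
        k, c_in, c_out = frozen_kernel.shape
        rng = np.random.default_rng(seed)
        self.kernel = frozen_kernel                             # frozen, never updated
        self.A = rng.normal(0.0, 0.01, size=(k * c_in, rank))  # trainable
        self.B = np.zeros((rank, c_out))                       # trainable, zero-initialised
        self.scale = alpha / rank

    def effective_kernel(self):
        delta = (self.A @ self.B).reshape(self.kernel.shape)
        return self.kernel + self.scale * delta

    def __call__(self, x):
        return conv1d_same(x, self.effective_kernel())

    def n_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(0)
w = rng.normal(size=(5, 64, 128))  # e.g. a kernel-size-5 layer mapping 64 -> 128 channels
layer = LoRAConv1D(w, rank=4)
x = rng.normal(size=(47, 64))
y = layer(x)  # identical to the frozen layer until B is trained
print(layer.n_trainable(), w.size)  # 1792 40960
```

For this layer, the adapter trains 1,792 values instead of the 40,960 kernel weights, which illustrates where LoRA's parameter savings come from.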
### III-F Hyperparameter Optimisation
We further optimise the model through dynamic hyperparameter tuning, tailoring it to the target task without manual adjustments. This is necessary because hyperparameters that perform well during pre-training are not always optimal under OOD conditions. To this end, we use Bayesian optimisation\[[49](https://arxiv.org/html/2606.04164#bib.bib47)\], a widely used approach that iteratively leverages information from prior trials to guide subsequent exploration\[[45](https://arxiv.org/html/2606.04164#bib.bib48)\]. Unlike uninformed strategies such as random search, which sample configurations independently of previous results, Bayesian optimisation targets promising regions of the search space, yielding more consistent performance\[[49](https://arxiv.org/html/2606.04164#bib.bib47)\]. This is valuable in OOD settings, as it enables efficient identification of hyperparameter configurations that best accommodate distribution shifts.
We define a search space over two key dimensions: the optimiser and the learning rate. The optimiser is selected from a pool including Adam, AdaDelta, AdaGrad, AdaMax, and RMSprop, representing a diverse spectrum; these optimisers differ in how they handle sparse gradients and in their sensitivity to noise, which can affect OOD adaptation performance. The learning rate is sampled from a continuous range on a logarithmic scale, so that the search is equally fine-grained across orders of magnitude. ADAPTOOD runs 10 trials, a pragmatic trade-off between thoroughness and efficiency; in our experiments, this was sufficient to observe meaningful improvements while keeping resource use reasonable. Once the search concludes, we retrieve the best configuration based on validation accuracy.
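The text does not specify which Bayesian optimisation library or surrogate model is used. As an assumed illustration, the self-contained sketch below runs Bayesian optimisation over the same two dimensions (optimiser and log-scale learning rate) for 10 trials, using a Gaussian-process surrogate with an expected-improvement acquisition. The learning-rate bounds, the three initial random trials, the kernel length scale, and maximising the acquisition over a random candidate pool are all hypothetical simplifications. `toy_val_acc` stands in for fine-tuning the model and returning validation accuracy.

```python
import math
import numpy as np

OPTIMISERS = ("adam", "adadelta", "adagrad", "adamax", "rmsprop")
LOG10_LR_BOUNDS = (-5.0, -1.0)  # hypothetical range: 1e-5 to 1e-1


def encode(opt, log_lr):
    """One-hot optimiser choice plus log10 learning rate rescaled to [0, 1]."""
    lo, hi = LOG10_LR_BOUNDS
    return np.array([float(o == opt) for o in OPTIMISERS] + [(log_lr - lo) / (hi - lo)])


def gp_posterior(X, y, Xc, length=0.5, noise=1e-4):
    """Zero-mean Gaussian-process posterior (RBF kernel) at candidate points Xc."""
    def rbf(a, b):
        return np.exp(-0.5 * ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1) / length**2)

    L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xc)
    v = np.linalg.solve(L, Ks)
    var = np.maximum(1.0 - (v**2).sum(axis=0), 1e-12)  # prior variance of the RBF kernel is 1
    return Ks.T @ alpha, np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mu, sd, best, xi=0.01):
    """Expected improvement over the best observed value (maximisation)."""
    z = (mu - best - xi) / sd
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sd * pdf


def bayes_opt(objective, n_trials=10, n_init=3, n_candidates=500, seed=0):
    """Maximise objective(optimiser_name, learning_rate), e.g. validation accuracy."""
    rng = np.random.default_rng(seed)

    def sample():
        return OPTIMISERS[rng.integers(len(OPTIMISERS))], float(rng.uniform(*LOG10_LR_BOUNDS))

    history = []  # list of ((optimiser, log10_lr), score)
    for trial in range(n_trials):
        if trial < n_init:  # a few random trials to seed the surrogate model
            cand = sample()
        else:
            X = np.stack([encode(*c) for c, _ in history])
            y = np.array([s for _, s in history])
            y = (y - y.mean()) / (y.std() + 1e-9)  # standardise scores for the GP
            pool = [sample() for _ in range(n_candidates)]
            mu, sd = gp_posterior(X, y, np.stack([encode(*c) for c in pool]))
            cand = pool[int(np.argmax(expected_improvement(mu, sd, y.max())))]
        history.append((cand, objective(cand[0], 10.0 ** cand[1])))
    (best_opt, best_log_lr), best_score = max(history, key=lambda h: h[1])
    return best_opt, 10.0 ** best_log_lr, best_score, history


def toy_val_acc(opt, lr):
    """Hypothetical stand-in for fine-tuning and measuring validation accuracy."""
    return (0.90 if opt == "adam" else 0.85) - 0.02 * (math.log10(lr) + 3.0) ** 2


best_opt, best_lr, best_acc, history = bayes_opt(toy_val_acc)
```

Library tools such as Optuna or KerasTuner provide production-grade versions of this loop; the sketch only makes the "use past trials to choose the next one" idea explicit.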
### III-G Fine-Tuning
After incorporating the above adaptation mechanisms into ADAPTOOD, we proceed to the final stage: fine\-tuning the model on the small labelled OOD dataset\. During this phase, only the layers deemed trainable by the uncertainty\-aware selection process are updated, ensuring focused and efficient adaptation\. The model is trained for 20 epochs to balance performance gains with the risk of overfitting\. This completes ADAPTOOD’s model transfer process\.
TABLE I: Distribution shifts across datasets in comparison to the pre-training dataset
## IV Experimental Setup
### IV-A Model Architecture
We adopt a 1D convolutional neural network (CNN) as our pre-trained model architecture due to its strong inductive bias for temporal signal processing, its computational efficiency, and its widespread use in time series analysis, where it can capture local patterns while remaining scalable to long sequences. The architecture starts with an input layer accepting vectors in the shape of the data, followed by stacked Conv1D layers with increasing numbers of filters (32, 64, 128, 256, 512) and a kernel size of 5, each using ReLU activation and 'same' padding. MaxPooling1D layers reduce the temporal dimension after each convolution, and Dropout layers are inserted after higher-capacity layers to prevent overfitting. After the final convolution, the output is flattened and passed through two fully connected dense layers with 64 and 32 neurons respectively, both using ReLU activation, followed by a final dense output layer with a single neuron and a sigmoid activation for binary classification. The model is then compiled using the settings chosen dynamically by the hyperparameter tuner of Section [III-F](https://arxiv.org/html/2606.04164#S3.SS6).
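To make the architecture concrete, the pure-Python sketch below traces output shapes and parameter counts through the described layers. It assumes a MaxPooling1D layer with pool size 2 (the default in Keras) after each of the five convolutions, since the pool size is not stated, and uses the 188-step single-lead input of the MIT-BIH version as an example. It is an illustration, not the authors' code; in Keras, `model.summary()` would report the same kind of information for the real model.

```python
def trace_cnn(input_len=188, in_channels=1, filters=(32, 64, 128, 256, 512),
              kernel_size=5, pool_size=2, dense_units=(64, 32), n_outputs=1):
    """Trace output shapes and parameter counts through the described 1D CNN.

    Dropout layers are omitted because they change neither shapes nor parameter counts.
    """
    length, channels, total = input_len, in_channels, 0
    rows = []
    for f in filters:
        params = (kernel_size * channels + 1) * f  # weights + biases; 'same' padding keeps length
        total += params
        channels = f
        rows.append(("Conv1D", length, channels, params))
        length //= pool_size  # MaxPooling1D, 'valid' padding, stride = pool size (assumed)
        rows.append(("MaxPooling1D", length, channels, 0))
    units_in = length * channels  # Flatten
    for units in (*dense_units, n_outputs):
        params = (units_in + 1) * units
        total += params
        rows.append(("Dense", None, units, params))
        units_in = units
    return rows, total


rows, total = trace_cnn()
for row in rows:
    print(row)
print("total parameters:", total)  # total parameters: 1037569
```

Under these assumptions the sequence length shrinks from 188 to 5 before flattening, and most parameters sit in the last convolution and the first dense layer; a 12-lead input such as CODEtest would only change the first convolution's parameter count.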
### IV-B Datasets
We focus on ECG datasets, as they provide structured temporal signals whose dynamic and subtle variations make them ideal for evaluating model adaptation. To keep the task realistic, we fine-tune on only a small subset of each dataset. We pre-train a CNN model on the PhysioNet Computing in Cardiology (CinC) 2017 dataset\[[12](https://arxiv.org/html/2606.04164#bib.bib49)\], with the version used\[[13](https://arxiv.org/html/2606.04164#bib.bib50)\] providing a clinically relevant task of classifying atrial fibrillation versus non-atrial fibrillation. This was selected for pre-training due to its well-defined diagnostic labels and task specificity, which provide a strong inductive prior for downstream cardiac classification tasks. We then evaluate adaptation across multiple downstream datasets introducing population, sensor, temporal, modality, label, and dimensionality shifts, including the PhysioNet MIT-BIH Arrhythmia Database\[[39](https://arxiv.org/html/2606.04164#bib.bib51)\], the PhysioNet PTB-DB Database\[[5](https://arxiv.org/html/2606.04164#bib.bib53)\], the MIMICPERform ECG and PPG datasets\[[38](https://arxiv.org/html/2606.04164#bib.bib55),[7](https://arxiv.org/html/2606.04164#bib.bib56)\], and the CODEtest Dataset\[[44](https://arxiv.org/html/2606.04164#bib.bib58)\]. Across these datasets, tasks include distinguishing healthy from non-healthy samples and adult from neonate samples, while small fine-tuning subsets simulate data-limited and low-resource settings. The exact type of distribution shift present in each case relative to the pre-training data is noted below and summarised in Table [I](https://arxiv.org/html/2606.04164#S3.T1).
PhysioNet Computing in Cardiology (CinC) 2017. This dataset contains 8,528 single-lead ECG recordings\[[12](https://arxiv.org/html/2606.04164#bib.bib49)\], and the version used\[[13](https://arxiv.org/html/2606.04164#bib.bib50)\] is sampled at 125 Hz. We use it to pre-train a CNN model, which serves as the foundation for subsequent experiments. The pre-training task involves classifying the signals into atrial fibrillation and non-atrial fibrillation.
PhysioNet MIT-BIH Arrhythmia Database. This is derived from MIT-BIH's arrhythmia database\[[39](https://arxiv.org/html/2606.04164#bib.bib51),[40](https://arxiv.org/html/2606.04164#bib.bib52)\] and contains 109,446 ECGs sampled at 125 Hz and segmented into 188 time steps. We use a single-lead version\[[20](https://arxiv.org/html/2606.04164#bib.bib60)\] to better focus on features relevant to real-world wearable sensors, and randomly select 1,000 samples for fine-tuning to simulate a data-limited setting. This tests ADAPTOOD under constrained data, with the task of distinguishing healthy from non-healthy cases. For evaluation, we use the full test set of 21,892 samples. Compared to the pre-training data, this introduces sensor shifts due to different wearable acquisition devices, population shifts due to different patient cohorts, and temporal shifts from changes in sampling contexts.
PhysioNet PTB-DB Database. This dataset is extracted from the PTB Diagnostic ECG Database \[[5](https://arxiv.org/html/2606.04164#bib.bib53),[6](https://arxiv.org/html/2606.04164#bib.bib54)\], and the version used \[[20](https://arxiv.org/html/2606.04164#bib.bib60)\] includes 14,552 single-lead ECGs sampled at 125 Hz. To keep the setting realistic, we retain only 1,000 randomly selected samples for fine-tuning, with the task of distinguishing healthy from non-healthy samples. This dataset introduces population, sensor, and temporal shifts in signal dynamics.
MIMICPERform ECG Dataset. This dataset is extracted from the MIMIC-III Waveform Database \[[38](https://arxiv.org/html/2606.04164#bib.bib55)\] and consists of the MIMICPERform training and testing sets \[[7](https://arxiv.org/html/2606.04164#bib.bib56),[8](https://arxiv.org/html/2606.04164#bib.bib57)\]. The task is to distinguish adult from neonate samples, and we further limit the training set to 100 samples to emulate low-resource settings where large-scale training is not feasible. This dataset presents population (age and condition differences), label, sensor, and temporal (clinical context) shifts.
MIMICPERform PPG Dataset. This dataset contains PPG recordings from the same patients and timeframes as the MIMICPERform ECG dataset described above. Although it contains no ECG signals, we use it to test ADAPTOOD’s ability to adapt across mobile and wearable biosignal modalities. This introduces a modality shift, since the ECG pre-trained model is now applied to PPG data, which have different physiological properties: PPG measures blood volume changes optically at the skin surface rather than the electrical activity of the heart. This is the most challenging shift; although ECG models are not expected to perform well on such fundamentally different signals, this setup provides insight into the limits of generalisation across biosignals.
CODEtest Dataset. This dataset is the test set of the Clinical Outcomes in Digital Electrocardiology (CODE) study \[[44](https://arxiv.org/html/2606.04164#bib.bib58),[43](https://arxiv.org/html/2606.04164#bib.bib59)\] and includes 827 ECG recordings, each consisting of 12 leads sampled at 400 Hz, with 4,096 data points per lead. We classify the samples into healthy and non-healthy groups. The recordings were annotated by cardiologists, medical students, and others, and we use the gold-standard annotation provided in the original source \[[43](https://arxiv.org/html/2606.04164#bib.bib59)\]. In addition to population, sensor, and temporal shifts, this dataset introduces a dimensionality shift by increasing the number of input channels. Unlike the single-lead datasets above, this makes it a valuable test of ADAPTOOD.
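To make the dimensionality shift concrete, the arithmetic below compares one CODEtest recording with one single-lead 125 Hz segment of the MIT-BIH format. The helper is hypothetical, and the resampled length assumes CODEtest were resampled to 125 Hz, which this section does not state.

```python
def input_summary(n_samples: int, fs_hz: float, n_leads: int) -> dict:
    """Hypothetical helper: duration and tensor shape of one recording."""
    return {
        "duration_s": n_samples / fs_hz,
        "shape": (n_samples, n_leads),  # (time steps, channels)
        "n_values": n_samples * n_leads,
    }


single_lead = input_summary(188, 125, 1)    # one MIT-BIH segment: about 1.5 s
code_test = input_summary(4_096, 400, 12)   # one CODEtest recording: about 10.2 s

# Length of one CODEtest lead if it were resampled to 125 Hz (assumption, for comparison only)
resampled_len = round(code_test["duration_s"] * 125)
```

A 12-channel input cannot be passed unchanged to a network whose first layer expects a single channel, which is why this shift is structural rather than purely statistical.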
TABLE II: Evaluation results on OOD classification tasks for the baselines and our proposed ADAPTOOD method
### IV-C Baselines & Ablations
To evaluate our approach, we compare it against alternatives spanning conventional transfer learning, supervised learning, domain adaptation, and ablation settings. Specifically, we compare against a transfer learning setup with frozen feature extractors, a supervised learning baseline using the same deep convolutional architecture as ADAPTOOD, a feature-based domain adaptation baseline using the widely used PRED strategy \[[15](https://arxiv.org/html/2606.04164#bib.bib61)\], and an instance-based domain adaptation baseline using nearest neighbours weighting \[[34](https://arxiv.org/html/2606.04164#bib.bib62)\]. In addition, to isolate the contribution of each component of ADAPTOOD, we conduct ablation studies using only the Hellinger distance, using only the Mahalanobis distance, removing hyperparameter tuning, and disabling LoRA. Implementation details for all baselines and ablations are provided below.
Transfer Learning. This baseline reflects a conventional transfer learning setup: a subset of the pre-trained model’s earlier layers serves as a feature extractor, with their weights frozen to preserve the learned representations, while the remaining layers are fine-tuned on the target data. The model is compiled using the original optimiser and loss function.
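The frozen-feature-extractor idea can be illustrated on a toy two-layer network in NumPy: gradients are computed for every weight, but only parameters not listed as frozen are updated. The network, parameter names, and learning rate are hypothetical; in Keras the same effect is usually obtained by setting `layer.trainable = False` on the chosen layers before calling `model.compile(...)`.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sgd_step(params: dict, frozen: set, x: np.ndarray, y_onehot: np.ndarray, lr: float = 0.1) -> dict:
    """One SGD step on a ReLU feature extractor (W1) + softmax head (W2); frozen params stay fixed."""
    h_pre = x @ params["W1"]
    h = np.maximum(h_pre, 0.0)              # features from the (possibly frozen) extractor
    probs = softmax(h @ params["W2"])
    d_logits = (probs - y_onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
    grads = {
        "W2": h.T @ d_logits,
        "W1": x.T @ ((d_logits @ params["W2"].T) * (h_pre > 0)),  # backprop through ReLU
    }
    return {name: w if name in frozen else w - lr * grads[name] for name, w in params.items()}


rng = np.random.default_rng(0)
params = {"W1": rng.normal(size=(8, 16)), "W2": rng.normal(size=(16, 2))}
x = rng.normal(size=(32, 8))
y = np.eye(2)[rng.integers(0, 2, size=32)]
updated = sgd_step(params, frozen={"W1"}, x=x, y_onehot=y)  # only the head W2 changes
```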
Supervised Learning. This baseline is a deep convolutional neural network trained in a supervised fashion, with the same layers as ADAPTOOD’s model, described in Section [IV-A](https://arxiv.org/html/2606.04164#S4.SS1). It is compiled using a cross-entropy loss and the Adam optimiser \[[29](https://arxiv.org/html/2606.04164#bib.bib63)\]. Across all experiments, this baseline has access to the same total number of samples as the alternatives, providing a fair comparison of how the model would perform when only a small, niche annotated dataset is available.
Feature-Based Domain Adaptation. This baseline illustrates a feature-based domain adaptation approach, which builds a shared feature representation to correct the difference between the source and target distributions; the task is then learned in this encoded space. For this baseline we use the ADAPT toolbox \[[17](https://arxiv.org/html/2606.04164#bib.bib64)\], specifically its widely used PRED strategy \[[15](https://arxiv.org/html/2606.04164#bib.bib61)\]. PRED first trains a model on the source domain and uses its predictions on the target data as additional input features. A second model is then trained on the labelled target data, augmented with these features. For a fair comparison, the source model uses the same architecture as ADAPTOOD.
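For illustration, the PRED idea can be sketched with two logistic-regression models in NumPy: a source model is fitted first, its predicted probability is appended to each target input as an extra feature, and a second model is fitted on the augmented target data. This is not the ADAPT toolbox implementation, and the baseline in the paper uses ADAPTOOD’s convolutional architecture rather than logistic regression; the toy data and helper names are hypothetical.

```python
import numpy as np


def fit_logreg(X: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 500):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y  # gradient of the loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def predict_proba(model, X: np.ndarray) -> np.ndarray:
    w, b = model
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def pred_adapt(Xs, ys, Xt, yt):
    """PRED: source-model predictions become an extra input feature for the target model."""
    source_model = fit_logreg(Xs, ys)

    def augment(X):
        return np.column_stack([X, predict_proba(source_model, X)])

    target_model = fit_logreg(augment(Xt), yt)
    return lambda X: predict_proba(target_model, augment(X))


rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 2))
ys = (Xs.sum(axis=1) > 0).astype(float)    # source labelling rule
Xt = rng.normal(loc=0.5, size=(60, 2))
yt = (Xt.sum(axis=1) > 1.0).astype(float)  # shifted target rule, few labels
predict_target = pred_adapt(Xs, ys, Xt, yt)
```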
Instance-Based Domain Adaptation. In addition to the feature-based baseline above, we use an instance-based domain adaptation alternative. Instead of learning shared features, this approach reweights the training data to correct the difference between the source and target distributions. To implement it, we rely on the widely used nearest neighbours weighting strategy, which reweights the source instances according to their number of neighbours in the target dataset \[[34](https://arxiv.org/html/2606.04164#bib.bib62)\]. During training, the loss of each training instance is multiplied by its positive weight. For a fair comparison, the estimator used to learn the task has the same architecture as ADAPTOOD’s model.
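A minimal NumPy sketch of nearest-neighbour importance weighting in the spirit of Loog (2012): each source instance is weighted by the number of target instances whose nearest source neighbour it is, and the weights then scale a per-instance loss. The add-one smoothing, the normalisation to mean one, and the binary cross-entropy loss are assumptions for illustration, not details taken from the paper or from the ADAPT toolbox.

```python
import numpy as np


def nn_weights(Xs: np.ndarray, Xt: np.ndarray, smoothing: float = 1.0) -> np.ndarray:
    """Weight each source point by how many target points have it as nearest source neighbour."""
    d2 = ((Xt[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=-1)  # (n_t, n_s) squared distances
    nearest = d2.argmin(axis=1)                                  # nearest source index per target point
    counts = np.bincount(nearest, minlength=len(Xs)).astype(float)
    w = counts + smoothing                                       # smoothing keeps every weight positive
    return w * len(Xs) / w.sum()                                 # normalise so the weights average 1


def weighted_bce(p: np.ndarray, y: np.ndarray, w: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross-entropy where each instance's loss is multiplied by its weight."""
    p = np.clip(p, eps, 1 - eps)
    per_instance = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float((w * per_instance).sum() / w.sum())


Xs_toy = np.array([[0.0], [1.0], [10.0]])
Xt_toy = np.array([[0.1], [0.2], [0.9], [9.5]])
weights = nn_weights(Xs_toy, Xt_toy)  # the first source point is nearest to two target points
```

The brute-force distance matrix is fine for small fine-tuning sets; a k-d tree or similar index would be preferable for large ones.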
Ablation using only the Hellinger Distance. To determine which metric is more effective for estimating uncertainty through distribution-level divergence, we conduct an ablation study that uses only the Hellinger distance. All other elements of ADAPTOOD are retained, so any performance difference from the full method can be attributed to the choice of uncertainty metric.
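For reference, the Hellinger distance between two discrete distributions $P$ and $Q$ is $H(P,Q) = \tfrac{1}{\sqrt{2}}\lVert\sqrt{P}-\sqrt{Q}\rVert_2$, which lies in $[0, 1]$. The sketch below applies it to histograms of a scalar feature from pre-training and target data; the shared histogram bins and the single scalar feature are assumptions, since the features and binning ADAPTOOD uses are defined in Section III-C rather than here.

```python
import numpy as np


def hellinger(p, q) -> float:
    """Hellinger distance between two discrete distributions (inputs are normalised first)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def feature_hellinger(source_feat, target_feat, bins: int = 20) -> float:
    """Histogram both samples on shared bin edges, then compare the histograms."""
    edges = np.histogram_bin_edges(np.concatenate([source_feat, target_feat]), bins=bins)
    p, _ = np.histogram(source_feat, bins=edges)
    q, _ = np.histogram(target_feat, bins=edges)
    return hellinger(p, q)


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 5_000)  # stand-in for a pre-training feature
mild = rng.normal(0.2, 1.0, 5_000)    # mildly shifted target
severe = rng.normal(3.0, 1.0, 5_000)  # strongly shifted target
print(feature_hellinger(source, mild), feature_hellinger(source, severe))
```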
Ablation using only the Mahalanobis Distance. Likewise, a separate ablation study uses only the Mahalanobis distance. Although the two metrics are complementary, using the Mahalanobis distance alone allows its strengths relative to the Hellinger distance to be assessed directly. All other elements of ADAPTOOD are retained, so any performance difference can again be attributed to the choice of metric.
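The Mahalanobis distance of a target feature vector $x$ from a pre-training distribution with mean $\mu$ and covariance $\Sigma$ is $D_M(x) = \sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$. The sketch below fits $\mu$ and $\Sigma$ on source features and scores target samples; the small ridge term added to $\Sigma$ and the use of per-sample distances averaged over a batch are assumptions, not details from the paper.

```python
import numpy as np


def fit_source_gaussian(source_feats: np.ndarray, ridge: float = 1e-6):
    """Mean and inverse covariance of (n_samples, n_features) source features, n_features >= 2."""
    mu = source_feats.mean(axis=0)
    cov = np.cov(source_feats, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])  # small ridge keeps the covariance invertible
    return mu, np.linalg.inv(cov)


def mahalanobis(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Per-sample distance sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    d = np.atleast_2d(x) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))


rng = np.random.default_rng(0)
source = rng.normal(size=(2_000, 4))
mu, cov_inv = fit_source_gaussian(source)
in_dist = mahalanobis(rng.normal(size=(200, 4)), mu, cov_inv).mean()
shifted = mahalanobis(rng.normal(loc=2.0, size=(200, 4)), mu, cov_inv).mean()
```

Unlike the histogram-based Hellinger distance, this score is defined per sample and accounts for correlations between features through $\Sigma^{-1}$.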
Ablation without Hyperparameter Tuning. To understand the impact of ADAPTOOD’s additional components, this ablation removes its hyperparameter tuning. All other aspects remain unchanged, including the OOD adaptation mechanism and the LoRA module, so this setup isolates the performance gains due to hyperparameter optimisation. Instead, the relevant hyperparameters are fixed: we use the Adam optimiser with its default learning rate of $10^{-3}$.
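For reference, one Adam update with the defaults suggested by Kingma and Ba (learning rate $\alpha = 10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) is sketched below in plain Python. Framework defaults can differ slightly (for example, Keras uses $\epsilon = 10^{-7}$), and this section does not list the other Adam settings used in the ablation.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; t is the 1-based step count."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]      # first-moment estimate
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]  # second-moment estimate
    m_hat = [mi / (1 - beta1 ** t) for mi in m]                        # bias corrections
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps) for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


theta, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
theta, m, v = adam_step(theta, grad=[0.5, -4.0], m=m, v=v, t=1)
```

On the first step the bias-corrected ratio is essentially $g/|g|$, so each parameter moves by almost exactly the learning rate (here to roughly 0.999 and -1.999) regardless of the gradient's scale.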
Ablation without LoRA. To assess the effect of low-rank adaptation, we run a set of experiments with it disabled. These examine how LoRA affects the adapted model’s performance, size, memory footprint, and parameter count. All other aspects of the ADAPTOOD framework remain unchanged, including the OOD adaptation mechanism and the hyperparameter tuning module.
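LoRA (Hu et al.) keeps a pre-trained weight matrix $W$ frozen and learns a low-rank update, giving $W + \tfrac{\alpha}{r}AB$ with $A \in \mathbb{R}^{d_{in}\times r}$ and $B \in \mathbb{R}^{r\times d_{out}}$ (row-vector convention), where one factor starts at zero so the adapted layer initially matches the pre-trained one. The NumPy sketch below shows this standard formulation for a single dense layer; the class name, rank, scaling, and layer sizes are hypothetical. Note that this standard form keeps $W$ and therefore reduces the number of trainable, not total, parameters.

```python
import numpy as np


class LoRADense:
    """Hypothetical dense layer: frozen weight W plus a trainable rank-r update (alpha / r) * A @ B."""

    def __init__(self, W: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_in, d_out = W.shape
        self.W = W                                      # frozen pre-trained weights
        self.A = rng.normal(0.0, 0.01, size=(d_in, r))  # trainable, random init
        self.B = np.zeros((r, d_out))                   # trainable, zero init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.W + self.scale * (x @ self.A) @ self.B

    def n_trainable(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(512, 256))
layer = LoRADense(W, r=4)
x = rng.normal(size=(3, 512))
print(layer.n_trainable(), W.size)  # 3072 trainable vs 131072 in the full matrix
```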
## V Results
TABLE III: Ablation study results, comparing variants of our approach with the full method
### V-A Key Findings
Under the population, sensor, temporal, and other shifts present in the examined datasets (discussed in Section [III-A](https://arxiv.org/html/2606.04164#S3.SS1)), ADAPTOOD achieves strong improvements by guiding adaptation based on OOD severity, suggesting that explicitly accounting for OOD levels leads to stronger and more balanced generalisation.
In the MIT-BIH data, under these shifts, transfer learning performs very close to supervised learning in accuracy (0.910 vs 0.906) and precision (0.891 vs 0.914). In contrast, our OOD severity-based mechanism improves performance across metrics, including accuracy (0.953) and precision (0.930). While the sensitivity of these two baselines remains near chance level (0.595 and 0.499), ADAPTOOD achieves 0.820 by tailoring its adaptation to OOD severity, leading to more generalisable classification. The feature-based domain adaptation baseline shows improvements, but ADAPTOOD still consistently outperforms it, including in accuracy (0.953 vs 0.940), precision (0.930 vs 0.921), and recall (0.900 vs 0.861). Specificity is higher for supervised learning than for ADAPTOOD (0.991 vs 0.980), but this is inflated by the baseline’s tendency to underpredict the positive class, which leads to unbalanced performance.
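Why a classifier that underpredicts the positive class can look better on specificity follows directly from the confusion-matrix definitions. The counts below are hypothetical (100 positives and 900 negatives), not taken from Table II: the underpredicting classifier reaches higher specificity but only 0.5 sensitivity.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "precision": tp / (tp + fp),
    }


# Hypothetical test set with 100 positive and 900 negative samples
underpredicting = binary_metrics(tp=50, fp=5, tn=895, fn=50)
balanced = binary_metrics(tp=85, fp=20, tn=880, fn=15)
for name, m in [("underpredicting", underpredicting), ("balanced", balanced)]:
    print(name, {k: round(val, 3) for k, val in m.items()})
```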
Similar results are observed in the PTB-DB data, where transfer learning and supervised learning perform nearly identically. The domain adaptation baselines, which are more advanced by design, perform better, but ADAPTOOD still outperforms all of them. For instance, in accuracy (0.922) it improves by 11.9 percentage points over transfer learning (0.803), 10.4 points over supervised learning (0.818), 4.7 points over feature-based domain adaptation (0.875), and 4.9 points over instance-based domain adaptation. A similar trend holds for recall (0.899 for ADAPTOOD vs 0.797, 0.728, 0.836, and 0.835 for the four baselines, respectively), while in precision it achieves a 12.9-point increase over transfer learning (0.901 vs 0.772). These results suggest that ADAPTOOD’s largest performance gains occur under more severe distributional shifts, supporting the design of the severity-aware adaptation mechanism.
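The gains quoted for PTB-DB correspond to absolute differences between scores, i.e. percentage points; relative improvements are larger. The small hypothetical helper below reproduces both readings from the reported numbers.

```python
def gain(ours: float, baseline: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in %), rounded to 0.1."""
    diff = ours - baseline
    return round(100 * diff, 1), round(100 * diff / baseline, 1)


ptb_accuracy = {"transfer learning": 0.803, "supervised": 0.818, "feature-based DA": 0.875}
for name, score in ptb_accuracy.items():
    print(name, gain(0.922, score))
print("precision vs transfer learning", gain(0.901, 0.772))
```

For example, the 11.9-point accuracy gain over transfer learning is a relative improvement of about 14.8%.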
When a task and label shift is added to the above shifts, the OOD scenario becomes even more severe. This is the case for the ECG MIMICPERformTT dataset, where ADAPTOOD continues to demonstrate superior generalisation. It achieves an accuracy of 0.942, substantially higher than both the transfer learning (0.882) and domain adaptation (0.867 and 0.842) baselines. This pattern extends to the F1 score, where ADAPTOOD reaches 0.942 compared to 0.881, 0.866, and 0.841 for the respective baselines, an F1 improvement of up to 10.1 percentage points. This indicates that the OOD severity-based mechanism maintains balanced improvements across accuracy and F1 score, even under compound distributional shifts.
To further test ADAPTOOD, we examine a different modality: the PPG recordings of the MIMICPERform PPG dataset. This introduces a modality shift on top of the shifts considered in previous scenarios, representing an edge case. Here, transfer learning brings a marginal improvement of only about 1 percentage point in accuracy over the supervised baseline (0.917 vs 0.905), while instance-based domain adaptation performs better, with 4 points higher accuracy (0.945 vs 0.905). Yet ADAPTOOD outperforms all of them, reaching an accuracy of 0.975, with similar results for other metrics such as precision and recall. Although fine-tuning is rarely the go-to solution for adapting models across modalities and ADAPTOOD is not designed for such cases, this experiment demonstrates its ability to handle extreme OOD conditions, achieving 7 percentage points higher accuracy than the supervised baseline and 3 points higher than instance-based domain adaptation.
The CODEtest dataset, which introduces a dimensionality shift by increasing the number of input channels, provides another informative case, since it was recorded using a 12-lead ECG configuration rather than the single-lead setup of the previous wearable datasets. The transfer learning, supervised learning, feature-based domain adaptation, and instance-based domain adaptation baselines reach accuracies of 0.873, 0.861, 0.894, and 0.882, respectively, while ADAPTOOD outperforms all of them with 0.922. Its improvements also extend to other metrics, including precision (0.875 vs below 0.825 for the baselines) and F1 score (0.836 vs below 0.789 for the baselines). Overall, ADAPTOOD consistently improves accuracy across diverse OOD scenarios, reinforcing that modelling OOD severity is a key driver of robust generalisation.
### V-B Ablation Studies
While ADAPTOOD’s uncertainty-guided transfer learning system is its core engine, both the choice of uncertainty metrics and the choice of supporting components are critical to its success.
Uncertainty Metric Comparisons. To understand which metric from Section [III-C](https://arxiv.org/html/2606.04164#S3.SS3) most effectively captures OOD-related uncertainty, we conduct an ablation study with two experiments: one using only the Hellinger distance and one using only the Mahalanobis distance. The results are shown in Table [III](https://arxiv.org/html/2606.04164#S5.T3). While the Mahalanobis variant outperforms the Hellinger variant, our approach that integrates both performs consistently better. For instance, in the PTB-DB data it reaches an accuracy of 0.922, compared to 0.825 with the Hellinger distance alone and 0.875 with the Mahalanobis distance alone. Similarly, in the ECG MIMICPERformTT data, ADAPTOOD reaches a recall of 0.942, compared to 0.933 for the Hellinger ablation. While switching from one metric to the other changes results only modestly in some cases, combining them yields more consistent results.
Supporting Mechanism Comparisons. Beyond the uncertainty-based ablations, we also examine the importance of ADAPTOOD’s supporting components. We observe that both hyperparameter (HP) tuning and LoRA integration contribute to its performance gains. In the ECG PTB-DB data, removing HP tuning lowers ADAPTOOD’s accuracy by 4 percentage points (0.922 vs 0.882), and other metrics such as precision (0.901 vs 0.869) and recall (0.899 vs 0.822) are also reduced. A similar drop is seen in ECG MIMICPERformTT, where the ablation without HP tuning shows a 2.9-point drop in accuracy (0.942 vs 0.913). However, the HP tuning module matters less in other cases; for instance, removing it in the MIT-BIH data lowers accuracy by less than 1 point (0.953 vs 0.945). This suggests that while selective layer unfreezing drives adaptation, HP tuning is also key when the pre-trained model is less suited to the OOD levels.
LoRA shows similar behaviour: for some datasets, removing it causes negligible performance degradation, while for others it proves crucial. In the ECG MIMICPERformTT data, its removal decreases the accuracy and F1 score by 2.9 percentage points (both 0.942 vs 0.913) and the specificity by as much as 6 points (0.903 vs 0.843). In contrast, in the ECG CODEtest data it results in only a 0.6-point decrease in accuracy (0.922 vs 0.916), while surprisingly improving precision (0.927 without LoRA vs 0.875 with it). However, this improvement is likely due to noise or dataset-specific variance. Therefore, the full ADAPTOOD model is recommended for consistent and robust adaptation under varying OOD severity scenarios.
### V-C Model Efficiency
ADAPTOOD is computationally efficient, as it leads to a substantial reduction in both total parameter count and memory footprint compared to the ablation without LoRA across all evaluated datasets\.
Its computational efficiency stems from its uncertainty-based adaptation approach and from low-rank model updates that enable parameter-efficient adaptation. For instance, in the MIT-BIH dataset, ADAPTOOD’s final model requires only 2,256,225 total parameters (8.61 MB) versus 7,647,781 parameters (29.17 MB) for the version without LoRA. Similar trends are observed for the PTB-DB (2,256,225 vs. 7,647,781 parameters; 8.61 MB vs. 29.17 MB), ECG MIMICPERformTT (21,392,737 vs. 26,784,293 parameters; 81.61 MB vs. 102.17 MB), PPG MIMICPERformTT (21,392,737 vs. 26,784,293 parameters; 81.61 MB vs. 102.17 MB), and CODEtest (14,806,369 vs. 20,197,925 parameters; 56.48 MB vs. 77.05 MB) data. Moreover, ADAPTOOD updates only the subset of parameters selected for adaptation rather than fine-tuning the full model. Its design also emphasises adaptive updates: uncertainty-aware adjustments mean that minimal updates are made for high-confidence cases, while more extensive fine-tuning is performed for low-confidence cases. It is therefore a lightweight framework that also reduces computational cost.
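As a quick consistency check, the reported sizes match 32-bit (4-byte) parameters with 1 MB taken as 1,048,576 bytes, and the LoRA variant has exactly 5,391,556 fewer parameters on every dataset. The helper below is a hypothetical illustration of that arithmetic, not part of ADAPTOOD.

```python
def model_size_mib(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory footprint in MiB, assuming float32 parameters."""
    return n_params * bytes_per_param / 2**20


# (with LoRA, without LoRA) total parameter counts reported above
reported = {
    "MIT-BIH / PTB-DB": (2_256_225, 7_647_781),
    "MIMICPERformTT (ECG / PPG)": (21_392_737, 26_784_293),
    "CODEtest": (14_806_369, 20_197_925),
}
for name, (with_lora, without_lora) in reported.items():
    saved = without_lora - with_lora
    print(f"{name}: {model_size_mib(with_lora):.2f} vs {model_size_mib(without_lora):.2f} MB, "
          f"{saved:,} fewer parameters")
```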
## VI Conclusion
ADAPTOOD introduces dynamic and uncertainty-aware fine-tuning of pre-trained models for time series data under out-of-distribution conditions. It directly addresses the complex and varied distribution shifts seen in real-world ECG biosignals, arising from differences in sensors, populations, domains, labels, and temporal contexts. By fine-tuning only when necessary, it helps avoid overfitting and reduces unnecessary computation. Its adaptive strategy leverages uncertainty to guide selective layer training, applies LoRA to update relevant components, and uses hyperparameter optimisation to adjust settings effectively.
The ADAPTOOD method outperforms alternatives across a broad range of OOD scenarios, showing consistent gains in robustness, calibration, and generalisation. Its ablated variants also perform strongly, suggesting that simpler configurations can be reliable options in settings with varying scalability, complexity, or real-time deployment requirements. A key contributor to this performance is ADAPTOOD’s ability to adapt the degree of fine-tuning to the severity of distribution shifts, indicating the benefit of uncertainty-guided adaptation over static protocols. These results position ADAPTOOD as a promising and reliable approach for fine-tuning models under out-of-distribution conditions.
## ACKNOWLEDGMENT
This work is supported by Arm and by EPSRC grant EP/S023046/1 for the EPSRC Centre for Doctoral Training in Sensor Technologies and Applications\.
## References
- [1] J. Ai and Z. Ren (2024). [Not All Distributional Shifts are Equal: Fine-Grained Robust Conformal Inference](https://proceedings.mlr.press/v235/ai24a.html). In Proceedings of the 41st International Conference on Machine Learning (ICML), Vol. 235, pp. 641–665.
- [2] A. Althnian, D. AlSaeed, H. Al-Baity, A. Samha, A. B. Dris, N. Alzakari, A. Abou Elwafa, and H. Kurdi (2021). [Impact of Dataset Size on Classification Performance: An Empirical Evaluation in the Medical Domain](https://doi.org/10.3390/app11020796). Applied Sciences 11(2).
- [3] Y. Bengio, A. C. Courville, and P. Vincent (2012). [Representation Learning: A Review and New Perspectives](https://arxiv.org/abs/1206.5538). IEEE Transactions on Pattern Analysis and Machine Intelligence 35, pp. 1798–1828.
- [4] A. Bizzego, G. Gabrieli, M. J. Neoh, and G. Esposito (2021). [Improving the Efficacy of Deep Learning Models for Heart Beat detection on Heterogeneous Datasets](https://arxiv.org/abs/2110.13732).
- [5] R. Bousseljot, D. Kreiseler, and A. Schnabel (1995). [Nutzung der EKG-Signaldatenbank Cardiodat der PTB über das Internet](https://doi.org/10.1515/bmte.1995.40.s1.317). De Gruyter Brill 40, pp. 317–318.
- [6] R. Bousseljot (2004). [PTB Diagnostic ECG Database](https://www.physionet.org/content/ptbdb/). PhysioNet.
- [7] P. H. Charlton, K. Kotzen, E. Mejía-Mejía, P. J. Aston, K. Budidha, J. Mant, C. Pettit, J. A. Behar, and P. A. Kyriacou (2022). [Detecting Beats in the Photoplethysmogram: Benchmarking Open-Source Algorithms](https://doi.org/10.1088/1361-6579/ac826d). Physiological Measurement 43(8), pp. 085007. ISSN 1361-6579.
- [8] P. H. Charlton (2022). [MIMICPERform Datasets](https://doi.org/10.5281/ZENODO.6807402). Zenodo.
- [9] M. Chen, L. Shen, H. Fu, Z. Li, J. Sun, and C. Liu (2024). [Calibration of Time-Series Forecasting: Detecting and Adapting Context-Driven Distribution Shift](https://doi.org/10.1145/3637528.3671926). In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 341–352.
- [10] B. Chidlovskii, S. Clinchant, and G. Csurka (2016). [Domain Adaptation in the Absence of Source Domain Data](https://doi.org/10.1145/2939672.2939716). In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 451–460.
- [11] C. Choi, Y. Lee, A. Chen, A. Zhou, A. Raghunathan, and C. Finn (2024). [AutoFT: Learning an Objective for Robust Fine-Tuning](https://arxiv.org/abs/2401.10220). NeurIPS Workshop on Distribution Shifts.
- [12] G. Clifford, C. Liu, B. Moody, L. Lehman, I. Silva, Q. Li, A. Johnson, and R. Mark (2017). [AF Classification from a Short Single Lead ECG Recording: the Physionet Computing in Cardiology Challenge 2017](https://doi.org/10.22489/CinC.2017.065-469). Computing in Cardiology Conference (CinC). ISSN 2325-887X.
- [13] G. Clifford, C. Liu, B. Moody, L. Lehman, I. Silva, Q. Li, A. Johnson, and R. Mark (2017). [Physionet 2017 ECG](https://www.kaggle.com/datasets/luigisaetta/physionet2017ecg). Kaggle.
- [14] P. Danielsson (1980). [Euclidean Distance Mapping](https://doi.org/10.1016/0146-664X(80)90054-4). Computer Graphics and Image Processing 14(3), pp. 227–248. ISSN 0146-664X.
- [15] H. Daumé III (2007). [Frustratingly Easy Domain Adaptation](https://aclanthology.org/P07-1033/). In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 256–263.
- [16] R. De Maesschalck, D. Jouan-Rimbaud, and D. L. Massart (2000). [The Mahalanobis Distance](https://doi.org/10.1016/S0169-7439(99)00047-7). Chemometrics and Intelligent Laboratory Systems 50(1), pp. 1–18. ISSN 0169-7439.
- [17] A. de Mathelin, F. Deheeger, M. Mougeot, and N. Vayatis (2025). [Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization](http://jmlr.org/papers/v26/23-1359.html). Journal of Machine Learning Research 26(4).
- [18] C. Ding, T. Yao, C. Wu, and J. Ni (2025). [Advances in Deep Learning for Personalized ECG Diagnostics: A Systematic Review Addressing Inter-Patient Variability and Generalization Constraints](https://doi.org/10.1016/j.bios.2024.117073). Biosensors and Bioelectronics 271, pp. 117073.
- [19] A. Farahani, S. Voghoei, K. Rasheed, and H. R. Arabnia (2021). [A Brief Review of Domain Adaptation](https://doi.org/10.1007/978-3-030-71704-9_65). Advances in Data Science and Information Engineering, pp. 877–894.
- [20] S. Fazeli (2018). [ECG Heartbeat Categorization Dataset](https://www.kaggle.com/datasets/shayanfazeli/heartbeat). Kaggle.
- [21] Y. Gal and Z. Ghahramani (2016). [Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning](https://proceedings.mlr.press/v48/gal16.html). In Proceedings of The 33rd International Conference on Machine Learning, Vol. 48, pp. 1050–1059.
- [22] M. Gholizade, H. Soltanizadeh, M. Rahmanimanesh, and S. S. Sana (2025). [A Review of Recent Advances and Strategies in Transfer Learning](https://doi.org/10.1007/s13198-024-02684-2). International Journal of System Assurance Engineering and Management 16(3), pp. 1123–1162. ISSN 0976-4348.
- [23] S. Govindaraj and T. Tejas (2022). [The Hellinger Distance and its Applications to Hypothesis Testing and Model Uncertainty](https://doi.org/10.2139/ssrn.4035007). SSRN Electronic Journal. ISSN 1556-5068.
- \[24\]U\. Gupta, N\. Paluru, D\. Nankani, K\. Kulkarni, and N\. Awasthi\(2024\)[A Comprehensive Review on Efficient Artificial Intelligence Models for Classification of Abnormal Cardiac Rhythms using Electrocardiograms](https://doi.org/10.1016/j.heliyon.2024.e26787)\.Heliyon10\(5\),pp\. e26787\.Cited by:[§I](https://arxiv.org/html/2606.04164#S1.p1.1)\.
- \[25\]H\. He, O\. Queen, T\. Koker, C\. Cuevas, T\. Tsiligkaridis, and M\. Zitnik\(2023\)[Domain Adaptation for Time Series under Feature and Label Shifts](https://proceedings.mlr.press/v202/he23b.html)\.InProceedings of the 40th International Conference on Machine Learning \(ICML\),Vol\.202,pp\. 12746–12774\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p4.1)\.
- \[26\]A\. Hosna, E\. Merry, J\. Gyalmo, Z\. Alom, Z\. Aung, and M\. A\. Azim\(2022\)[Transfer Learning: A Friendly Introduction](https://doi.org/10.1186/s40537-022-00652-w)\.Journal of Big Data9\(1\)\.External Links:ISSN 2196\-1115Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p1.1)\.
- \[27\]E\. J\. Hu, Y\. Shen, P\. Wallis, Z\. Allen\-Zhu, Y\. Li, S\. Wang, L\. Wang, and W\. Chen\(2022\)[LoRA: Low\-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)\.International Conference on Learning Representations \(ICLR\)\.Cited by:[§III\-E](https://arxiv.org/html/2606.04164#S3.SS5.p2.1)\.
- \[28\]Z\. Huang, S\. MacLachlan, L\. Yu, L\. F\. Herbozo Contreras, N\. D\. Truong, A\. H\. Ribeiro, and O\. Kavehei\(2024\)[Generalization Challenges in Electrocardiogram Deep Learning: Insights from Dataset Characteristics and Attention Mechanism](http://doi.org/10.1080/14796678.2024.2354082)\.Future Cardiology20\(4\),pp\. 209–220\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p1.1),[§II](https://arxiv.org/html/2606.04164#S2.p2.1)\.
- \[29\]D\. P\. Kingma and J\. Ba\(2015\)[Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)\.International Conference on Learning Representations \(ICLR\)\.Cited by:[§IV\-C](https://arxiv.org/html/2606.04164#S4.SS3.p3.1)\.
- \[30\]S\. Kiranyaz, O\. Avci, O\. Abdeljaber, T\. Ince, M\. Gabbouj, and D\. J\. Inman\(2021\)[1D Convolutional Neural Networks and Applications: A Survey](https://doi.org/10.1016/j.ymssp.2020.107398)\.Mechanical Systems and Signal Processing151,pp\. 107398\.Cited by:[§III\-E](https://arxiv.org/html/2606.04164#S3.SS5.p1.1)\.
- \[31\]W\. M\. Kouw and M\. Loog\(2019\)[An Introduction to Domain Adaptation and Transfer Learning](https://arxiv.org/abs/1812.11806)\.arXiv1812\.11806\.Cited by:[§I](https://arxiv.org/html/2606.04164#S1.p3.1)\.
- \[32\]B\. Lakshminarayanan, A\. Pritzel, and C\. Blundell\(2017\)[Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles](https://proceedings.neurips.cc/paper_files/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf)\.Advances in Neural Information Processing Systems \(NeurIPS\)30\.Cited by:[§I](https://arxiv.org/html/2606.04164#S1.p4.1)\.
- \[33\]R\. Li, Y\. Aierken, Y\. Xu, J\. Liu, and Y\. Tang\(2025\)[Research on Cross\-Dataset Cardiac Signal Domain Generalization and Feature Interpretability](http://doi.org/10.1038/s41598-025-33057-9)\.Scientific Reports16\(1\)\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p1.1),[§II](https://arxiv.org/html/2606.04164#S2.p3.1)\.
- \[34\]M\. Loog\(2012\)[Nearest Neighbor\-Based Importance Weighting](https://doi.org/10.1109/MLSP.2012.6349714)\.In2012 IEEE International Workshop on Machine Learning for Signal Processing,Cited by:[§IV\-C](https://arxiv.org/html/2606.04164#S4.SS3.p1.1),[§IV\-C](https://arxiv.org/html/2606.04164#S4.SS3.p5.1)\.
- \[35\]L\. J\. L\. López, S\. Elsharief, D\. A\. Jorf, F\. Darwish, C\. Ma, and F\. E\. Shamout\(2025\)[Uncertainty Quantification for Machine Learning in Healthcare: A Survey](https://arxiv.org/abs/2505.02874)\.Conference on Health, Inference, and Learning \(CHIL\)\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p6.1)\.
- \[36\]W\. Lu, J\. Wang, X\. Sun, Y\. Chen, and X\. Xie\(2023\)[Out\-of\-Distribution Representation Learning for Time Series Classification](https://arxiv.org/abs/2209.07027)\.International Conference on Learning Representations \(ICLR\)\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p2.1)\.
- \[37\]T\. Mehari and N\. Strodthoff\(2022\)[Self\-Supervised Representation Learning from 12\-Lead ECG Data](https://doi.org/10.1016/j.compbiomed.2021.105114)\.Computers in Biology and Medicine141,pp\. 105114\.Cited by:[§II](https://arxiv.org/html/2606.04164#S2.p2.1)\.
- \[38\]B\. Moody, G\. Moody, M\. Villarroel, G\. Clifford, and I\. Silva\(2020\)[MIMIC\-III Waveform Database](https://doi.org/10.13026/C2607M)\.PhysioNet\.Cited by:[§IV\-B](https://arxiv.org/html/2606.04164#S4.SS2.p1.1),[§IV\-B](https://arxiv.org/html/2606.04164#S4.SS2.p5.1)\.
- \[39\]G\. B\. Moody and R\. G\. Mark\(2001\)[The Impact of the MIT\-BIH Arrhythmia Database](https://doi.org/10.1109/51.932724)\.IEEE Engineering in Medicine and Biology20\(3\),pp\. 45–50\.Cited by:[§IV\-B](https://arxiv.org/html/2606.04164#S4.SS2.p1.1),[§IV\-B](https://arxiv.org/html/2606.04164#S4.SS2.p3.1)\.
- [40] G. Moody and R. Mark (2005). [MIT-BIH Arrhythmia Database](https://www.physionet.org/content/mitdb/). PhysioNet. Cited by: [§IV-B](https://arxiv.org/html/2606.04164#S4.SS2.p3.1).
- [41] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek (2019). [Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift](https://proceedings.neurips.cc/paper_files/paper/2019/file/8558cb408c1d76621371888657d2eb1d-Paper.pdf). Advances in Neural Information Processing Systems (NeurIPS) 32. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p4.1), [§II](https://arxiv.org/html/2606.04164#S2.p6.1), [§III-C](https://arxiv.org/html/2606.04164#S3.SS3.p1.3).
- [42] S. Rajendran, W. Pan, M. R. Sabuncu, Y. Chen, J. Zhou, and F. Wang (2024). [Learning Across Diverse Biomedical Data Modalities and Cohorts: Challenges and Opportunities for Innovation](https://doi.org/10.1016/j.patter.2023.100913). Patterns 5(2), pp. 100913. ISSN 2666-3899. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p2.1), [§III-A](https://arxiv.org/html/2606.04164#S3.SS1.p1.7).
- [43] A. H. Ribeiro, M. H. Ribeiro, G. M. Paixão, D. M. Oliveira, P. R. Gomes, J. A. Canazart, M. P. Ferreira, C. R. Andersson, P. W. Macfarlane, W. Meira Jr., T. B. Schön, and A. L. P. Ribeiro (2020). [CODE-test: An Annotated 12-Lead ECG Dataset](https://doi.org/10.5281/ZENODO.3625006). Zenodo. Cited by: [§IV-B](https://arxiv.org/html/2606.04164#S4.SS2.p7.1).
- [44] A. H. Ribeiro, M. H. Ribeiro, G. M. M. Paixão, D. M. Oliveira, P. R. Gomes, J. A. Canazart, M. P. S. Ferreira, C. R. Andersson, P. W. Macfarlane, W. Meira Jr., T. B. Schön, and A. L. P. Ribeiro (2020). [Automatic Diagnosis of the 12-Lead ECG Using a Deep Neural Network](https://doi.org/10.1038/s41467-020-15432-4). Nature Communications 11(1), pp. 1760. Cited by: [§IV-B](https://arxiv.org/html/2606.04164#S4.SS2.p1.1), [§IV-B](https://arxiv.org/html/2606.04164#S4.SS2.p7.1).
- [45] S. Roy, R. Mehera, R. K. Pal, and S. K. Bandyopadhyay (2023). [Hyperparameter Optimization for Deep Neural Network Models: A Comprehensive Study on Methods and Techniques](http://doi.org/10.1007/s11334-023-00540-3). Innovations in Systems and Software Engineering. Cited by: [§III-F](https://arxiv.org/html/2606.04164#S3.SS6.p1.1).
- [46] Y. Roy, H. Banville, I. Albuquerque, A. Gramfort, T. H. Falk, and J. Faubert (2019). [Deep Learning-Based Electroencephalography Analysis: A Systematic Review](https://doi.org/10.1088/1741-2552/ab260c). Journal of Neural Engineering 16(5), pp. 051001. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p3.1).
- [47] Y. Shu, P. H. Charlton, F. Kawsar, J. Hernesniemi, and M. Malekzadeh (2025). [CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models](https://arxiv.org/abs/2512.02180). Cited by: [§II](https://arxiv.org/html/2606.04164#S2.p3.1).
- [48] A. Simons, T. Doyle, D. Musson, and J. Reilly (2020). [Impact of Physiological Sensor Variance on Machine Learning Algorithms](https://doi.org/10.1109/SMC42975.2020.9282912). IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 241–247. Cited by: [§III-A](https://arxiv.org/html/2606.04164#S3.SS1.p1.7).
- [49] J. Snoek, H. Larochelle, and R. P. Adams (2012). [Practical Bayesian Optimization of Machine Learning Algorithms](https://arxiv.org/abs/1206.2944). Cited by: [§III-F](https://arxiv.org/html/2606.04164#S3.SS6.p1.1).
- [50] D. Spathis, I. Perez-Pozuelo, T. I. Gonzales, Y. Wu, S. Brage, N. Wareham, and C. Mascolo (2022). [Longitudinal Cardio-Respiratory Fitness Prediction through Wearables in Free-Living Environments](https://doi.org/10.1038/s41746-022-00719-1). npj Digital Medicine 5(1). ISSN 2398-6352. Cited by: [§III-A](https://arxiv.org/html/2606.04164#S3.SS1.p1.7).
- [51] E. Svensson, H. R. Friesacher, A. Arany, L. Mervin, and O. Engkvist (2025). [Temporal Evaluation of Uncertainty Quantification Under Distribution Shift](https://doi.org/10.1007/978-3-031-72381-0_11). AI in Drug Discovery, pp. 132–148. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p4.1), [§III-C](https://arxiv.org/html/2606.04164#S3.SS3.p1.3).
- [52] P. Trirat, Y. Shin, J. Kang, Y. Nam, J. Na, M. Bae, J. Kim, B. Kim, and J. Lee (2024). [Universal Time-Series Representation Learning: A Survey](https://arxiv.org/abs/2401.03717). Cited by: [§II](https://arxiv.org/html/2606.04164#S2.p2.1).
- [53] S. Vavaroutas, G. Rizos, and C. Mascolo (2025). [SALTS: Streamlined Adaptive Learning for Sensors Time Series](https://doi.org/10.1109/EMBC58623.2025.11254027). In Proceedings of the 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p1.1).
- [54] A. Venkataramanan, A. Benbihi, M. Laviale, and C. Pradalier (2023). [Gaussian Latent Representations for Uncertainty Estimation using Mahalanobis Distance in Deep Classifiers](https://doi.org/10.1109/ICCVW60793.2023.00483). IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 4490–4499. Cited by: [§III-C](https://arxiv.org/html/2606.04164#S3.SS3.p3.2).
- [55] J. Wu and J. He (2023). [Trustworthy Transfer Learning: Transferability and Trustworthiness](https://doi.org/10.1145/3580305.3599576). In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5829–5830. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p3.1).
- [56] T. Xia, J. Han, and C. Mascolo (2022). [Benchmarking Uncertainty Quantification on Biosignal Classification Tasks Under Dataset Shift](https://doi.org/10.1007/978-3-031-14771-5_25). Springer Multimodal AI in Healthcare, pp. 347–359. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p2.1).
- [57] L. Xu, M. Xu, Y. Ke, X. An, S. Liu, and D. Ming (2020). [Cross-Dataset Variability Problem in EEG Decoding with Deep Learning](https://doi.org/10.3389/fnhum.2020.00103). Frontiers in Human Neuroscience 14. ISSN 1662-5161. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p3.1).
- [58] Z. Xu, J. Zhang, S. Haroutounian, H. Liu, Z. Cao, G. R. Messner, H. B. Alaverdyan, S. Ahuja, R. Koshy, J. Hanns, M. Frumkin, T. L. Rodebaugh, and C. Lu (2025). [Incorporating Uncertainty in Predictive Models Using Mobile Sensing and Clinical Data: A Case Study on Persistent Post-surgical Pain](https://doi.org/10.1145/3729488). Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 9(2). Cited by: [§II](https://arxiv.org/html/2606.04164#S2.p6.1).
- [59] J. Yang, K. Zhou, Y. Li, and Z. Liu (2024). [Generalized Out-of-Distribution Detection: A Survey](https://doi.org/10.1007/s11263-024-02117-4). International Journal of Computer Vision 132(12), pp. 5635–5662. Cited by: [§I](https://arxiv.org/html/2606.04164#S1.p2.1), [§II](https://arxiv.org/html/2606.04164#S2.p4.1).
- [60] L. Yang and A. Shami (2020). [On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice](https://doi.org/10.1016/j.neucom.2020.07.061). Neurocomputing 415, pp. 295–316. ISSN 0925-2312. Cited by: [§II](https://arxiv.org/html/2606.04164#S2.p5.1).
- [61] Y. Yang, H. Zhang, D. Katabi, and M. Ghassemi (2023). [Change is Hard: A Closer Look at Subpopulation Shift](https://proceedings.mlr.press/v202/yang23s.html). In Proceedings of the 40th International Conference on Machine Learning (ICML), Vol. 202, pp. 39584–39622. Cited by: [§III-A](https://arxiv.org/html/2606.04164#S3.SS1.p1.7).
- [62] H. Yao, C. Choi, B. Cao, Y. Lee, P. W. W. Koh, and C. Finn (2022). [Wild-Time: A Benchmark of In-the-Wild Distribution Shift over Time](https://arxiv.org/pdf/2211.14238). In Advances in Neural Information Processing Systems (NeurIPS), Vol. 35, pp. 10309–10324. Cited by: [§III-A](https://arxiv.org/html/2606.04164#S3.SS1.p1.7).
- [63] T. Zhang, D. Zhang, G. Wang, Y. Li, Y. Hu, Q. Sun, and Y. Chen (2024). [RLoc: Towards Robust Indoor Localization by Quantifying Uncertainty](https://doi.org/10.1145/3631437). Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 7(4). Cited by: [§II](https://arxiv.org/html/2606.04164#S2.p6.1).
- [64] Y. Zheng, F. Yang, J. Duan, and J. Kurths (2021). [Quantifying Model Uncertainty for the Observed Non-Gaussian Data by the Hellinger Distance](https://doi.org/10.1016/j.cnsns.2021.105720). Communications in Nonlinear Science and Numerical Simulation 96, pp. 105720. ISSN 1007-5704. Cited by: [§III-C](https://arxiv.org/html/2606.04164#S3.SS3.p4.1).