Cross-Modal Contrastive Learning of ECG and Angiography Representations for Severe Stenosis Classification
Summary
This paper introduces StenCE, a pretraining framework that uses cross-modal contrastive learning between ECG and X-ray angiography representations to detect severe coronary stenosis from ECGs, achieving high performance and enabling early diagnosis even in asymptomatic patients.
# Cross-Modal Contrastive Learning of ECG and Angiography Representations for Severe Stenosis Classification
Source: [https://arxiv.org/html/2606.02605](https://arxiv.org/html/2606.02605)
Özgün Turgut¹ [https://orcid.org/0009-0002-8704-0277](https://orcid.org/0009-0002-8704-0277), Alexander Müller⁴, Alexander Steger⁴, Jan Kehrer⁴, Marcus Brugger⁴, Daniel Rueckert¹˒²˒³ [https://orcid.org/0000-0002-5683-5889](https://orcid.org/0000-0002-5683-5889), Eimo Martens⁴ [https://orcid.org/0000-0002-5801-0901](https://orcid.org/0000-0002-5801-0901), and Philip Müller¹ [https://orcid.org/0000-0001-8186-6479](https://orcid.org/0000-0001-8186-6479)

¹ Chair for AI in Healthcare and Medicine, Technical University of Munich and TUM University Hospital, Munich, Germany
² Department of Computing, Imperial College London, UK
³ Munich Center for Machine Learning (MCML), Munich, Germany
⁴ Department of Internal Medicine, TUM University Hospital, Munich, Germany

Email: nikola.cenikj@tum.de
###### Abstract
Coronary artery stenosis is a common cardiovascular disease, with severe, untreated cases posing significant risks of heart attack. Although coronary (X-ray) angiograms remain the standard for stenosis diagnosis, they are invasive, time- and resource-intensive, and therefore only performed on patients with a high probability of disease based on symptoms and prior clinical tests. As a result, a subset of patients, especially those without symptoms, may remain undiagnosed. Detecting indications of stenosis from ECGs, which are fast, cheap, non-invasive, and thus routinely acquired even in asymptomatic patients, would support early diagnosis. However, as no reliable stenosis-specific signal has been identified in ECGs, they cannot currently be used for stenosis risk stratification. To address this, we introduce *StenCE*, a pretraining framework that allows stratification of patients based on features derived directly from ECGs. Evaluations across varying stenosis severity thresholds and additional ECG disease classification tasks demonstrate consistent performance improvements across different ECG encoders, outperforming previous work. The obtained models successfully detect signals for stenosis diagnosis in ECGs and are the first to achieve high performance in severe stenosis classification. The source code is available at [https://github.com/NikolaCenic/ecg-stenosis-cls](https://github.com/NikolaCenic/ecg-stenosis-cls).
## 1 Introduction
Coronary artery stenosis is a common cardiovascular disease that becomes progressively more severe over time. As severe cases often lead to heart failure, early diagnosis is critical for improving the survival rate. Diagnosis of severe stenosis is typically performed through coronary (X-ray) angiograms, where multiple angiography views capture different segments of the coronary arteries. However, X-ray angiography is an invasive procedure with a small mortality risk and is typically reserved for patients with a high likelihood of stenosis based on symptoms and clinical tests. Consequently, asymptomatic patients may remain undiagnosed while the disease progresses. Identifying stenosis indicators from a fast, non-invasive, and routinely acquired modality such as ECG would enable early diagnosis, even for asymptomatic patients. However, although ECGs are used to diagnose various cardiovascular diseases, they provide limited information on coronary stenosis, as patients with severe stenosis often have normal ECGs \[[1](https://arxiv.org/html/2606.02605#bib.bib29)\].
In this work, we aim to detect signals of stenosis in ECGs and thus enable the early identification of stenosis risks. To achieve this, we develop a deep-learning-based stenosis classifier that operates only on ECG inputs. To train this model, we rely on information distillation from an X-ray angiogram encoder. More precisely, as shown in [Fig. 1](https://arxiv.org/html/2606.02605#S1.F1), we employ multi-modal contrastive learning between an ECG encoder and an angiography encoder trained for stenosis classification, followed by fine-tuning of the pretrained ECG model. Our contributions are as follows:
1. We propose *StenCE*, a contrastive pretraining framework that aligns ECG representations with features from a multi-view angiography stenosis classification model, thereby enabling an ECG encoder to detect stenosis signals from ECGs alone.
2. Our evaluation on clinical stenosis classification demonstrates an AUC of 0.822 for the most severe cases, showing that severe stenosis can be identified solely from ECGs, enabling early stenosis detection.
3. Evaluations across multiple stenosis severities, as well as on additional cardiac abnormalities on the EchoNext dataset, further demonstrate the utility of our pretraining framework.
Figure 1: Overview of the proposed approach. Multi-view angiography and 12-lead ECG from the same patient are encoded with transformer-based modality-specific encoders. The ECG encoder is pretrained to extract features aligned with the frozen angiography encoder, trained for coronary stenosis classification. The pretrained ECG encoder is then fine-tuned for coronary stenosis and additional cardiac abnormalities diagnosed from ECGs.
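The contrastive alignment between the two encoders can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric InfoNCE objective (the ablation in Section 2 refers to a "CLIP loss"), operating on embeddings that the ECG and angiography encoders have already produced. The function names, the temperature of 0.07, and the toy vectors are illustrative assumptions and are not taken from the StenCE code.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax(row):
    # Numerically stable log-softmax of one list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_loss(ecg_emb, angio_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of both inputs is assumed to be the same patient."""
    ecg = [l2_normalize(v) for v in ecg_emb]
    angio = [l2_normalize(v) for v in angio_emb]
    n = len(ecg)
    # logits[i][j]: temperature-scaled cosine similarity of ECG i and angiography j.
    logits = [[sum(a * b for a, b in zip(e, g)) / temperature for g in angio]
              for e in ecg]
    # ECG -> angiography: each row should select its own column.
    e2a = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Angiography -> ECG: each column should select its own row.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    a2e = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (e2a + a2e)

# Toy batch (hypothetical): three patients with orthogonal angiography embeddings.
angio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ecg_aligned = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
# Same ECG embeddings, but paired with the wrong patients.
ecg_misaligned = [ecg_aligned[1], ecg_aligned[2], ecg_aligned[0]]

loss_aligned = clip_loss(ecg_aligned, angio)
loss_misaligned = clip_loss(ecg_misaligned, angio)
```

In this toy batch the loss is close to zero when each ECG embedding points in the same direction as its paired angiography embedding, and large when the pairs are shuffled. Minimizing such a loss is what pulls the ECG representation towards the angiography features.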
## 2 Results
In the stenosis results table, we report the results under both full fine-tuning and linear probing for the EchoingECG and OTIS backbones pretrained with StenCE, referred to as EchoingECG-StenCE and OTIS-StenCE. We compare them against OTIS, EchoingECG, and the other ECG baselines.
Our models are capable of detecting clear diagnostic signals of severe stenosis from ECGs alone. The full fine-tuning results demonstrate strong performance at the most severe stenosis threshold, achieving an AUC of 0.822 for the $(=0 \mid =100)$ setting. Our model is the first to achieve such high performance on ECG-based stenosis classification, confirming its ability to extract a strong diagnostic signal directly from ECGs, thereby supporting early severe stenosis detection.
StenCE pretraining yields performance improvements in the majority of tasks for both OTIS and EchoingECG. In the full fine-tuning setting, OTIS-StenCE significantly outperforms OTIS by 4% AUC on the $(=0 \mid =100)$ severity threshold, and by a non-significant margin on $(=0 \mid \geq 90)$. For EchoingECG, fine-tuning EchoingECG-StenCE leads to an 11% AUC improvement on the $(=0 \mid =100)$ threshold, achieving the best overall performance among all models. However, on the $(=0 \mid \geq 90)$ threshold, EchoingECG outperforms EchoingECG-StenCE by 4% AUC. On the EchoNext tasks, all models achieve comparable fine-tuning results due to the large scale of the EchoNext dataset. Nevertheless, for both OTIS and EchoingECG, the StenCE variants retain an advantage over their base versions. The benefit of StenCE pretraining is even more pronounced in the linear probing setting, where for both OTIS and EchoingECG, the StenCE variants significantly outperform the base models on each task. The largest gain is observed for EchoingECG-StenCE, with a 17% AUC improvement over EchoingECG on the $(=0 \mid =100)$ threshold. Similarly, OTIS-StenCE improves over OTIS by 8% and 12% AUC on the $(=0 \mid =100)$ and $(=0 \mid \geq 90)$ thresholds, respectively. On EchoNext, the StenCE models under linear probing outperform OTIS by 2% and EchoingECG by 7% AUC.
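To make the distinction between the two evaluation settings concrete, the following sketch illustrates linear probing: the encoder is treated as frozen, so its output features are fixed, and only a linear classification head is trained on top. Full fine-tuning would additionally update the encoder weights. The logistic-regression head, the learning rate, the epoch count, and the two-dimensional toy features are all assumptions chosen for readability, not details from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_linear_probe(features, labels, lr=0.5, epochs=500):
    """Train only a logistic-regression head on fixed (frozen) encoder features."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Gradient of the binary cross-entropy with respect to the logit.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k] / n
            grad_b += err / n
        # Plain gradient-descent update of the head parameters only.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical frozen-encoder outputs and binary stenosis labels.
features = [[0.1, 1.0], [0.2, 0.8], [0.9, 0.1], [1.0, 0.3]]
labels = [0, 0, 1, 1]

w, b = fit_linear_probe(features, labels)
probs = [predict(w, b, x) for x in features]
```

Because the head is linear, its performance depends directly on how well the pretrained features already separate the classes, which is why linear probing is a common way to compare pretraining strategies.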
Figure 2: Comparison of the fine-tuned performance on different stenosis thresholds. While, as expected, the detection performance drops when trying to identify less severe cases, our StenCE framework still enables the separation of severe cases from both healthy (0% blockage) and mild (50% blockage) cases.

Detection performance drops for less severe cases. In [Fig. 2](https://arxiv.org/html/2606.02605#S2.F2), we compare the fine-tuned performance on different stenosis severity thresholds. Starting from an AUC of 0.822 for the most severe cases, performance decreases to 0.704 for the less severe $(=0 \mid \geq 90)$ threshold, and approaches random performance for the hardest-to-diagnose threshold, $(<70 \mid \geq 70)$. The performance at this threshold indicates that the model is not yet suitable for clinical use. This trend is consistent with prior findings in \[[2](https://arxiv.org/html/2606.02605#bib.bib10)\] and \[[3](https://arxiv.org/html/2606.02605#bib.bib11)\]. Although their models and datasets are not publicly available, which prevents direct comparison, our results indicate consistent improvements. In \[[3](https://arxiv.org/html/2606.02605#bib.bib11)\], an AUC of 0.57 is reported for the $(<70 \mid \geq 70)$ threshold, whereas our model surpasses this by more than 5%. Similarly, in \[[2](https://arxiv.org/html/2606.02605#bib.bib10)\], the ECG-only model achieves an AUC of 0.654 for the $(\leq 50 \mid \geq 99)$ threshold, while our model attains 0.687 on the same threshold using our dataset.
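The threshold notation can be made concrete with a short sketch. It assumes that a threshold pair defines the negative class by its left-hand condition and the positive class by its right-hand condition, and that patients matching neither condition are excluded from that evaluation. This reading, the helper names, and the toy severities and scores are assumptions for illustration only. AUC is computed as the fraction of positive-negative pairs in which the positive case receives the higher score.

```python
def threshold_labels(severities, is_negative, is_positive):
    """Map per-patient stenosis severity (%) to binary labels.

    Patients matching neither rule get None and are ignored by auc().
    """
    labels = []
    for s in severities:
        if is_positive(s):
            labels.append(1)
        elif is_negative(s):
            labels.append(0)
        else:
            labels.append(None)
    return labels

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical severities (%) and model scores for seven patients.
severities = [0, 0, 30, 50, 90, 100, 100]
scores = [0.1, 0.4, 0.2, 0.5, 0.6, 0.9, 0.3]

# (=0 | =100): healthy versus fully occluded; everything in between is excluded.
labels_strict = threshold_labels(severities, lambda s: s == 0, lambda s: s == 100)
# (<70 | >=70): every patient falls into one of the two classes.
labels_broad = threshold_labels(severities, lambda s: s < 70, lambda s: s >= 70)

auc_strict = auc(labels_strict, scores)
auc_broad = auc(labels_broad, scores)
```

Under this reading, stricter thresholds evaluate on fewer but more clearly separated patients, while the $(<70 \mid \geq 70)$ setting keeps every patient and includes the borderline cases that are hardest to distinguish.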
Introducing stenosis supervision during StenCE pretraining improves performance. [Table 1](https://arxiv.org/html/2606.02605#S2.T1) shows an ablation analyzing the impact of architectural choices and the stenosis supervision in StenCE. We evaluate using OTIS and EchoingECG as ECG encoder backbones, assessing performance on stenosis classification at the $(=0 \mid =100)$ severity threshold and on EchoNext. For the architecture, we consider a setting in which the angiography encoder is frozen and, instead of introducing a CLS token, we use the study-level representation. To examine the effect of stenosis supervision, we vary the weight $W_{\text{Sten}}$ (0, 0.3, and 1). Overall, incorporating stenosis supervision with a moderate weight ($W_{\text{Sten}} = 0.3$), while still keeping the pretraining primarily driven by the CLIP loss, consistently improves performance. This configuration achieves the best results in three out of four evaluations, and ranks second, with a 1% difference, only in stenosis classification using OTIS.
Table 1: Ablation study on architectural design and stenosis supervision in StenCE. We investigate using a frozen angiography encoder and no CLS token, as well as different values of $W_{\text{Sten}}$. Experiments are conducted in linear probing using OTIS- and EchoingECG-based ECG encoders. We report the AUC for severe stenosis and the EchoNext tasks. The best performing model in each category is **bold**. Results show that an unfrozen angiography encoder with a CLS token and moderate stenosis supervision ($W_{\text{Sten}} = 0.3$) yields the best performance.

| Ablation | OTIS $(=0 \mid =100)$ | OTIS EchoNext | EchoingECG $(=0 \mid =100)$ | EchoingECG EchoNext |
| --- | --- | --- | --- | --- |
| Frozen Angio Encoder & no CLS token | **0.686** | 0.691 | 0.766 | 0.727 |
| $W_{\text{Sten}} = 0$ | 0.665 | 0.692 | 0.725 | 0.736 |
| $W_{\text{Sten}} = 0.3$ | 0.676 | **0.695** | **0.816** | **0.739** |
| $W_{\text{Sten}} = 1$ | 0.659 | 0.686 | 0.706 | 0.738 |
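One way to read the role of $W_{\text{Sten}}$ in this ablation is as the weight of a stenosis classification term added to the contrastive term. The additive form below is an assumption made for illustration; the text above only states that the stenosis supervision is weighted by $W_{\text{Sten}}$ and that pretraining remains primarily driven by the CLIP loss. The symbols $\mathcal{L}_{\text{StenCE}}$, $\mathcal{L}_{\text{CLIP}}$, and $\mathcal{L}_{\text{Sten}}$ are likewise introduced here and do not appear in the source.

$$
\mathcal{L}_{\text{StenCE}} = \mathcal{L}_{\text{CLIP}} + W_{\text{Sten}} \cdot \mathcal{L}_{\text{Sten}}
$$

Under this reading, $W_{\text{Sten}} = 0$ corresponds to purely contrastive pretraining, $W_{\text{Sten}} = 1$ weights both terms equally, and $W_{\text{Sten}} = 0.3$ keeps the contrastive term dominant while still injecting stenosis labels.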
## 3 Discussion and Conclusion
Limitations. This study has several limitations. First, every patient in our dataset underwent an X-ray angiogram because of stenosis symptoms. This introduces a selection bias, but controlling for it would require performing X-ray angiograms on healthy individuals, exposing them to the risks of an invasive procedure. Second, our models rely only on the ECG signal; unlike prior work that uses clinical risk factors \[[2](https://arxiv.org/html/2606.02605#bib.bib10)\], we do not integrate complementary data sources. Incorporating such data may further improve performance.
Conclusion. Current models for stenosis classification rely on backbones pretrained on ECG-specific tasks, limiting their ability to capture stenosis-specific information. We overcome this by introducing cross-modal pretraining with paired ECG–angiography data, enabling the ECG encoder to encode stenosis-relevant features seen in angiography. Although the performance obtained is not yet sufficient for use in clinical practice, our results demonstrate the benefit of cross-modal pretraining and enable future developments towards preliminary assessment of coronary artery stenosis, supporting early diagnosis.
### 3.0.1 Acknowledgments
This study was approved by the Ethics Committee of TUM Klinikum Rechts der Isar \(reference number 2025\-395\-S\-CB, application dated July 13, 2025\)\.
## References
- \[1\] M. H. Crawford, C. A. Mendoza, R. A. O’Rourke, D. H. White, C. A. Boucher, and J. Gorwit (1978). Limitations of continuous ambulatory electrocardiogram monitoring for detecting coronary artery disease. Annals of Internal Medicine 89(1), pp. 1–5. PMID: 666154. [https://doi.org/10.7326/0003-4819-89-1-1](https://dx.doi.org/10.7326/0003-4819-89-1-1)
- \[2\] Z. Xue, S. Geng, S. Guo, G. Mu, B. Yu, P. Wang, S. Hu, D. Zhang, W. Xu, Y. Liu, L. Yang, H. Tao, S. Hong, and K. Chen (2024-11). Screening for severe coronary stenosis in patients with apparently normal electrocardiograms based on deep learning. BMC Medical Informatics and Decision Making 24. [https://doi.org/10.1186/s12911-024-02764-0](https://dx.doi.org/10.1186/s12911-024-02764-0)
- \[3\] C. Yeh, T. Tsai, C. Chen, Y. Chou, C. Mao, T. Su, N. Yang, C. Lai, C. Chen, H. Sytwu, and T. Tsai (2025). Artificial intelligence-enhanced electrocardiography improves the detection of coronary artery disease. Computational and Structural Biotechnology Journal 27, pp. 278–286. ISSN 2001-0370. [https://doi.org/10.1016/j.csbj.2024.12.032](https://dx.doi.org/10.1016/j.csbj.2024.12.032)