Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset


Summary

This study develops an XGBoost classifier using SHAP explainability on eight clinical biomarkers from the ADNI dataset to achieve three-class Alzheimer's disease detection (normal cognition, MCI, AD), reaching a macro AUC of 0.982 and Cohen's kappa of 0.909 on the held-out test set. SHAP analysis identifies CDR Global as the dominant predictor for NC and MCI, while CDR-SB and MMSE together drive AD classification.

arXiv:2606.03995v1 Announce Type: cross Abstract: Background: Alzheimer's disease (AD) affects over 55 million people worldwide. Accurate, interpretable detection of normal cognition (NC), mild cognitive impairment (MCI), and AD from routine clinical assessments remains a critical unmet need. Methods: An XGBoost classifier was developed for three-class detection using eight clinical features from the Alzheimer's Disease Neuroimaging Initiative (ADNI): MMSE, CDR Global, CDR Sum of Boxes (CDR-SB), MoCA, FAQ, age, sex, and education. Hyperparameters were optimised using Optuna (50 trials); class imbalance was addressed with SMOTE. Performance was evaluated by macro AUC-ROC with 1,000-iteration bootstrap 95% confidence intervals, macro F1, balanced accuracy, and Cohen's kappa. SHAP values provided feature-level explainability. Results: The dataset comprised 1,641 baseline subjects (608 NC, 767 MCI, 266 AD). On five-fold cross-validation, mean macro AUC was 0.983 (SD 0.007), accuracy 0.944 (SD 0.006), and macro F1 0.929 (SD 0.008). On the held-out test set (n = 247), macro AUC was 0.982 (95% CI: 0.965--0.995), accuracy 0.943, balanced accuracy 0.932, macro F1 0.927, and Cohen's kappa 0.909. SHAP analysis identified CDR Global as the dominant predictor for NC and MCI, while CDR-SB and MMSE together drove AD classification. Conclusion: An explainable machine learning model trained on routine clinical assessments achieves near-perfect three-class Alzheimer's detection. SHAP analysis reveals clinically plausible, class-specific feature importance patterns supporting clinical validity. Future work will extend this framework with speech biomarkers for multimodal detection.

# Early Detection of Alzheimer’s Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Dataset
Source: [https://arxiv.org/html/2606.03995](https://arxiv.org/html/2606.03995)
Afshan Hashmi

###### Abstract

Background: Alzheimer’s disease \(AD\) affects over 55 million people worldwide\. Accurate, interpretable detection of normal cognition \(NC\), mild cognitive impairment \(MCI\), and AD from routine clinical assessments remains a critical unmet need\.

Methods: An XGBoost classifier was developed for three\-class detection using eight clinical features from the Alzheimer’s Disease Neuroimaging Initiative \(ADNI\): MMSE, CDR Global, CDR Sum of Boxes \(CDR\-SB\), MoCA, FAQ, age, sex, and education\. Hyperparameters were optimised using Optuna \(50 trials\); class imbalance was addressed with SMOTE\. Performance was evaluated by macro AUC\-ROC with 1,000\-iteration bootstrap 95% confidence intervals, macro F1, balanced accuracy, and Cohen’s kappa\. SHAP values provided feature\-level explainability\.

Results: The dataset comprised 1,641 baseline subjects \(608 NC, 767 MCI, 266 AD\)\. On five\-fold cross\-validation, mean macro AUC was 0\.983 \(SD 0\.007\), accuracy 0\.944 \(SD 0\.006\), and macro F1 0\.929 \(SD 0\.008\)\. On the held\-out test set \($n=247$\), macro AUC was 0\.982 \(95% CI: 0\.965–0\.995\), accuracy 0\.943, balanced accuracy 0\.932, macro F1 0\.927, and Cohen’s $\kappa$ 0\.909\. SHAP analysis identified CDR Global as the dominant predictor for NC and MCI, while CDR\-SB and MMSE together drove AD classification\.

Conclusion: An explainable machine learning model trained on routine clinical assessments achieves near\-perfect three\-class Alzheimer’s detection\. SHAP analysis reveals clinically plausible, class\-specific feature importance patterns supporting clinical validity\. Future work will extend this framework with speech biomarkers for multimodal detection\.

Keywords: Alzheimer’s disease; mild cognitive impairment; machine learning; XGBoost; explainability; SHAP; ADNI; early detection; cognitive assessment; gradient boosting

## 1\. Introduction

Alzheimer’s disease \(AD\) is the most prevalent neurodegenerative disorder worldwide, accounting for 60–70% of all dementia cases and affecting an estimated 55 million people globally, a figure projected to exceed 139 million by 2050\[[29](https://arxiv.org/html/2606.03995#bib.bib1),[4](https://arxiv.org/html/2606.03995#bib.bib2)\]\. Despite its devastating clinical and socioeconomic burden, disease\-modifying pharmacological options remain very limited, making early and accurate diagnosis a central clinical priority\[[14](https://arxiv.org/html/2606.03995#bib.bib3)\]\.

Mild cognitive impairment \(MCI\) represents the transitional zone between normal cognitive aging and manifest dementia, characterised by objective memory decline that does not interfere substantially with daily functioning\[[23](https://arxiv.org/html/2606.03995#bib.bib4)\]\. Approximately 15% of individuals diagnosed with MCI progress to AD annually, with cumulative conversion rates reaching 30–40% over three years\[[18](https://arxiv.org/html/2606.03995#bib.bib5)\]\. The accurate identification of MCI patients at high conversion risk is therefore one of the most consequential challenges in dementia research\.

Standardised clinical instruments, including the Mini\-Mental State Examination \(MMSE\)\[[13](https://arxiv.org/html/2606.03995#bib.bib9)\], Clinical Dementia Rating \(CDR\)\[[20](https://arxiv.org/html/2606.03995#bib.bib10)\], Montreal Cognitive Assessment \(MoCA\)\[[21](https://arxiv.org/html/2606.03995#bib.bib12)\], and Functional Activities Questionnaire \(FAQ\)\[[24](https://arxiv.org/html/2606.03995#bib.bib13)\], form the cornerstone of routine cognitive assessment\. These instruments are widely deployed in memory clinics globally, generate quantitative scores, and have demonstrated validity in characterising cognitive status\. However, their potential for automated, multi\-class simultaneous discrimination of NC, MCI, and AD within an interpretable machine learning framework remains incompletely explored\.

Gradient boosted decision tree models, particularly XGBoost\[[7](https://arxiv.org/html/2606.03995#bib.bib6)\], have demonstrated exceptional performance on heterogeneous tabular clinical data, outperforming deep learning approaches in several medical prediction benchmarks\. Critically, unlike neural network models, gradient boosted trees are compatible with SHAP \(SHapley Additive exPlanations\)\[[16](https://arxiv.org/html/2606.03995#bib.bib8)\], a theoretically grounded framework for computing feature\-level contributions to individual predictions\. This combination of strong performance and interpretability is essential for clinical adoption, where algorithmic transparency is a regulatory and ethical requirement\.

The Alzheimer’s Disease Neuroimaging Initiative \(ADNI\)\[[28](https://arxiv.org/html/2606.03995#bib.bib7)\] provides one of the largest, most comprehensively characterised multicentre longitudinal datasets in dementia research\. Several prior studies have used ADNI for binary AD\-versus\-control classification\[[5](https://arxiv.org/html/2606.03995#bib.bib19),[15](https://arxiv.org/html/2606.03995#bib.bib20)\] or MCI\-to\-AD conversion prediction\[[19](https://arxiv.org/html/2606.03995#bib.bib22),[26](https://arxiv.org/html/2606.03995#bib.bib23)\]\. However, few have addressed three\-class simultaneous discrimination of NC, MCI, and AD using exclusively clinical assessment features, with rigorous external validation and systematic per\-class explainability analysis\.

This study addresses this gap by developing an explainable XGBoost classifier for three\-class Alzheimer’s detection using eight features from routine ADNI clinical assessments and evaluating it on a held\-out ADNI test set\. Optuna\-based Bayesian hyperparameter optimisation, SMOTE for class imbalance correction, and SHAP TreeExplainer for per\-class feature importance analysis were applied\. Performance is reported with five\-fold cross\-validation and, for the test\-set macro AUC, a bootstrap confidence interval, in accordance with the updated TRIPOD\+AI reporting guidelines\[[9](https://arxiv.org/html/2606.03995#bib.bib17)\]\.

## 2\. Materials and Methods

### 2\.1 Dataset and Ethical Considerations

Data were obtained from the Alzheimer’s Disease Neuroimaging Initiative \(ADNI; [adni\.loni\.usc\.edu](https://adni.loni.usc.edu)\)\. ADNI was launched in 2003 as a public\-private partnership, led by Principal Investigator Michael W\. Weiner MD, with the primary goal of testing whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD\. ADNI was approved by the institutional review boards of all participating sites; all participants provided written informed consent\. This study used exclusively de\-identified, publicly available data and was therefore exempt from additional local ethical review\.

### 2\.2 Subject Selection and Labelling

Baseline visits \(VISCODE = ‘bl’\) were used exclusively to simulate a first\-encounter clinical scenario and prevent temporal data leakage from longitudinal follow\-up\. Subjects were classified into three diagnostic groups based on the ADNI Diagnostic Summary: normal cognition \(NC; DIAGNOSIS = 1\), mild cognitive impairment \(MCI; DIAGNOSIS = 2\), and Alzheimer’s disease \(AD; DIAGNOSIS = 3\)\. This yielded 1,641 subjects: 608 NC \(37\.1%\), 767 MCI \(46\.7%\), and 266 AD \(16\.2%\)\.

### 2\.3 Feature Extraction

Eight features were extracted from five ADNI assessment tables at baseline: \(1\) MMSE total score\[[13](https://arxiv.org/html/2606.03995#bib.bib9)\]; \(2\) CDR Global rating\[[20](https://arxiv.org/html/2606.03995#bib.bib10)\]; \(3\) CDR Sum of Boxes \(CDR\-SB\)\[[22](https://arxiv.org/html/2606.03995#bib.bib11)\]; \(4\) MoCA total score\[[21](https://arxiv.org/html/2606.03995#bib.bib12)\]; \(5\) FAQ total score\[[24](https://arxiv.org/html/2606.03995#bib.bib13)\]; \(6\) age derived from year of birth; \(7\) sex \(binary encoded: male = 1, female = 0\); and \(8\) years of formal education\. MoCA was missing for 59% of subjects, FAQ for 1\.1%, and age for 0\.1%\. All missing values were imputed using median imputation fitted on the training set only to prevent data leakage\.
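The train-only imputation described above can be illustrated with a minimal, standard-library sketch. The helper names (`fit_medians`, `apply_medians`), the column labels, and the toy values are assumptions for illustration; they are not ADNI data and not the study's code.

```python
from statistics import median


def fit_medians(rows, features):
    """Compute per-feature medians from training rows, ignoring missing (None) values."""
    medians = {}
    for name in features:
        observed = [row[name] for row in rows if row[name] is not None]
        medians[name] = median(observed)
    return medians


def apply_medians(rows, medians):
    """Return copies of rows with missing values replaced by the fitted medians."""
    return [
        {name: (medians[name] if name in medians and value is None else value)
         for name, value in row.items()}
        for row in rows
    ]


# Toy rows (hypothetical values, not ADNI data).
train = [
    {"MOCA": 26.0, "FAQ": 0.0},
    {"MOCA": None, "FAQ": 3.0},
    {"MOCA": 22.0, "FAQ": None},
    {"MOCA": 18.0, "FAQ": 12.0},
]
test = [{"MOCA": None, "FAQ": 5.0}]

medians = fit_medians(train, ["MOCA", "FAQ"])  # statistics come from the training rows only
train_imputed = apply_medians(train, medians)
test_imputed = apply_medians(test, medians)    # test rows reuse the training medians
```

Because the medians are computed before the test rows are ever seen, no information from the evaluation data reaches the imputation step.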

### 2\.4 Data Splitting and Class Imbalance

Subjects were divided into training \(70%\), validation \(15%\), and test \(15%\) sets using stratified random sampling to preserve class proportions \(random seed 42\)\. The held\-out test set \($n=247$\) was set aside prior to any model development and used only for final evaluation\. SMOTE\[[6](https://arxiv.org/html/2606.03995#bib.bib14)\] was applied exclusively to the training set, generating synthetic minority\-class examples by interpolating between existing samples \($k=5$ neighbours\), yielding a balanced training set of 536 samples per class\.
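The interpolation step at the core of SMOTE can be sketched in plain Python. This is a simplified illustration and not the imbalanced-learn implementation; the function name `smote_oversample` and the toy feature vectors are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_target, k=5, seed=42):
    """Grow `minority` (a list of feature vectors) to n_target rows by interpolation.

    Each synthetic row lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_target:
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return minority + synthetic


# Toy minority class with two features (hypothetical values, not ADNI data).
ad_rows = [[23.0, 4.5], [22.0, 5.0], [24.0, 4.0], [21.0, 6.0], [25.0, 3.5], [23.5, 4.0]]
balanced_ad = smote_oversample(ad_rows, n_target=10)
```

Every synthetic row is a convex combination of two real rows, so it stays inside the range of the observed minority-class values. Applying the step to the training split alone keeps synthetic samples out of the validation and test sets.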

### 2\.5 Model Development and Hyperparameter Optimisation

An XGBoost classifier\[[7](https://arxiv.org/html/2606.03995#bib.bib6)\] was trained with the `multi:softprob` objective for probabilistic three\-class output\. Hyperparameter optimisation was performed using the Optuna framework\[[2](https://arxiv.org/html/2606.03995#bib.bib15)\] \(50 trials, Tree\-structured Parzen Estimator sampler\), minimising log loss on the validation set\. The search space included: number of estimators \(200–800\), maximum tree depth \(3–8\), learning rate \(0\.01–0\.2, log scale\), subsample ratio \(0\.6–1\.0\), column subsample ratio \(0\.6–1\.0\), and L1 regularisation coefficient \(0\.0001–10\.0, log scale\)\. The optimal configuration was: $n_{\text{estimators}} = 544$, $\text{max\_depth} = 3$, $\text{lr} = 0.199$, $\text{subsample} = 0.941$, $\text{colsample} = 0.637$, $\alpha = 0.134$\. Early stopping with patience of 30 rounds was applied during final training\.
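To make the search space concrete, the sketch below encodes the reported ranges and draws configurations from them. It deliberately swaps in plain random search for Optuna's Tree-structured Parzen Estimator, and the objective is a placeholder rather than validation log loss. The dictionary keys, function names, and `toy_objective` are illustrative assumptions.

```python
import math
import random

# Ranges as reported in the text; the dictionary keys are illustrative labels.
SEARCH_SPACE = {
    "n_estimators": {"low": 200, "high": 800, "kind": "int"},
    "max_depth": {"low": 3, "high": 8, "kind": "int"},
    "learning_rate": {"low": 0.01, "high": 0.2, "kind": "log"},
    "subsample": {"low": 0.6, "high": 1.0, "kind": "float"},
    "colsample": {"low": 0.6, "high": 1.0, "kind": "float"},
    "alpha": {"low": 1e-4, "high": 10.0, "kind": "log"},
}


def sample_config(rng):
    """Draw one configuration; 'log' parameters are uniform in log space."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        low, high = spec["low"], spec["high"]
        if spec["kind"] == "int":
            config[name] = rng.randint(low, high)
        elif spec["kind"] == "log":
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.uniform(low, high)
    return config


def random_search(objective, n_trials=50, seed=42):
    """Return the sampled configuration with the lowest objective value."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=objective)


def toy_objective(config):
    """Placeholder objective (hypothetical); the study minimised validation log loss."""
    return abs(math.log(config["learning_rate"]) - math.log(0.1))


best = random_search(toy_objective)
```

Sampling the learning rate and the L1 coefficient in log space gives each order of magnitude equal weight, which matters for a range such as 0.0001 to 10.0 that spans five decades.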

### 2\.6 Evaluation Metrics

Model performance was evaluated using: \(1\) macro AUC\-ROC with 1,000\-iteration stratified bootstrap 95% confidence intervals\[[10](https://arxiv.org/html/2606.03995#bib.bib33)\]; \(2\) overall accuracy; \(3\) balanced accuracy; \(4\) macro F1 score; and \(5\) Cohen’s kappa\[[8](https://arxiv.org/html/2606.03995#bib.bib32)\]\. Per\-class sensitivity and specificity were computed using one\-versus\-rest binarisation\. Five\-fold stratified cross\-validation was performed on the combined training and validation set \($n = 1{,}394$\)\.
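A stratified percentile bootstrap for the macro one-versus-rest AUC might look like the following standard-library sketch. The percentile method, the function names, and the toy predictions are assumptions; the source does not state which bootstrap interval variant was used.

```python
import random


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, n_classes=3):
    """Unweighted mean of one-versus-rest AUCs; proba[i][c] is P(class c) for row i."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


def stratified_bootstrap_ci(y_true, proba, n_boot=1000, seed=42):
    """Percentile 95% CI; resampling within each class keeps class counts fixed."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(y_true):
        by_class.setdefault(y, []).append(i)
    stats = []
    for _ in range(n_boot):
        idx = [i for members in by_class.values()
               for i in rng.choices(members, k=len(members))]
        stats.append(macro_ovr_auc([y_true[i] for i in idx], [proba[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Toy predictions (hypothetical, not the study's outputs): 0 = NC, 1 = MCI, 2 = AD.
toy_rng = random.Random(0)
y_true = [0] * 20 + [1] * 20 + [2] * 20
proba = []
for y in y_true:
    raw = [toy_rng.random() + (1.0 if c == y else 0.0) for c in range(3)]
    proba.append([r / sum(raw) for r in raw])

point_estimate = macro_ovr_auc(y_true, proba)
ci_low, ci_high = stratified_bootstrap_ci(y_true, proba)
```

Resampling within each class guarantees that every bootstrap replicate contains all three classes, so each one-versus-rest AUC stays defined even for the smallest group.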

### 2\.7 Explainability Analysis

SHAP TreeExplainer\[[16](https://arxiv.org/html/2606.03995#bib.bib8)\] was applied to all test set predictions\. Mean absolute SHAP values were calculated per feature per diagnostic class, yielding a class\-specific feature importance ranking\. This analysis captures the differential contribution of each clinical feature to the model’s discrimination of each diagnostic category\.
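The aggregation from per-sample SHAP values to a class-specific ranking can be sketched as follows. The array layout (samples × features × classes) is an assumption, since the shape returned by SHAP for multi-class models differs between library versions. The function name and the toy values are hypothetical.

```python
def mean_abs_shap(shap_values, feature_names, class_names):
    """Rank features per class; shap_values[i][j][c] is feature j's contribution to class c for sample i."""
    n_samples = len(shap_values)
    ranking = {}
    for c, cls in enumerate(class_names):
        importance = {
            name: sum(abs(sample[j][c]) for sample in shap_values) / n_samples
            for j, name in enumerate(feature_names)
        }
        # Sort features from most to least important for this class.
        ranking[cls] = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranking


# Toy SHAP values (hypothetical): 2 samples x 2 features x 3 classes.
toy_shap = [
    [[2.0, -1.0, 0.5], [0.5, 0.25, -1.5]],
    [[-2.0, 1.0, 0.5], [-0.5, 0.75, 1.0]],
]
ranking = mean_abs_shap(toy_shap, ["CDR Global", "CDR-SB"], ["NC", "MCI", "AD"])
```

Taking the absolute value before averaging prevents positive and negative contributions from cancelling, so a feature that pushes some subjects towards a class and others away from it still registers as important.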

### 2\.8 Reporting Standards

This manuscript follows the TRIPOD\+AI checklist\[[9](https://arxiv.org/html/2606.03995#bib.bib17)\] for transparent reporting\. All analyses were performed in Python 3\.12 using XGBoost v2\.x, SHAP v0\.44, scikit\-learn v1\.3, imbalanced\-learn v0\.11, and Optuna v3\.x\. The complete source code and analysis pipeline are available at [https://github\.com/\[to\-be\-added\-upon\-acceptance\]](https://github.com/%5Bto-be-added-upon-acceptance%5D)\.

## 3\. Results

### 3\.1 Dataset Characteristics

The final cohort comprised 1,641 subjects at baseline\. Table [1](https://arxiv.org/html/2606.03995#S3.T1) presents clinical and demographic characteristics by diagnostic group\. Clear and statistically consistent differences were observed across all features\. NC subjects had the highest MMSE scores \(29\.1 ± 1\.0\) and the lowest CDR\-SB \(0\.0 ± 0\.1\) and FAQ scores \(0\.1 ± 0\.5\)\. MCI subjects showed intermediate profiles: MMSE 27\.5 ± 1\.9, CDR\-SB 1\.5 ± 0\.9, and FAQ 3\.2 ± 4\.2\. AD subjects demonstrated markedly impaired performance: MMSE 23\.2 ± 2\.3, CDR\-SB 4\.3 ± 1\.7, and FAQ 13\.1 ± 6\.9\. AD subjects were older on average \(73\.2 ± 10\.1 years\) compared to NC \(66\.2 ± 11\.2 years\)\. Figure [1](https://arxiv.org/html/2606.03995#S3.F1) illustrates class distribution and score distributions by diagnosis group\.

Table 1: Baseline clinical and demographic characteristics by diagnostic group \(mean ± SD\)\. NC = Normal Cognition; MCI = Mild Cognitive Impairment; AD = Alzheimer’s Disease; MMSE = Mini\-Mental State Examination; CDR\-SB = Clinical Dementia Rating Sum of Boxes; MoCA = Montreal Cognitive Assessment; FAQ = Functional Activities Questionnaire\.

![Refer to caption](https://arxiv.org/html/2606.03995v1/fig1_dataset.png)

Figure 1: Dataset characteristics\. \(A\) Class distribution \(NC = 608, MCI = 767, AD = 266\)\. \(B\) MMSE score distribution by diagnosis group\. \(C\) CDR Sum of Boxes distribution by diagnosis group\. Box plots show median, interquartile range, and 1\.5 × IQR whiskers\.

### 3\.2 Model Performance

Table [2](https://arxiv.org/html/2606.03995#S3.T2) presents model performance and compares it against published studies\. On five\-fold cross\-validation, the model achieved a mean macro AUC of 0\.983 \(SD 0\.007\), ranging from 0\.975 to 0\.992 across folds\. Mean accuracy was 0\.944 \(SD 0\.006\) and macro F1 was 0\.929 \(SD 0\.008\)\.

On the held\-out test set \($n=247$\), the model achieved a macro AUC of 0\.982 \(95% CI: 0\.965–0\.995\), accuracy 0\.943, balanced accuracy 0\.932, macro F1 0\.927, and Cohen’s $\kappa$ of 0\.909\. A $\kappa$ of 0\.909 corresponds to near\-perfect agreement beyond chance\[[8](https://arxiv.org/html/2606.03995#bib.bib32)\]\. Figure [2](https://arxiv.org/html/2606.03995#S3.F2) presents ROC curves and confusion matrices\. Figure [3](https://arxiv.org/html/2606.03995#S3.F3) shows fold\-wise cross\-validation performance\.
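As a rough consistency check on the reported kappa, the standard definition can be evaluated with the figures given in the text. The expected agreement below assumes that both the true and the predicted test-set class proportions match the overall cohort proportions (37.1%, 46.7%, 16.2%). This is an approximation, because the test-set confusion matrix is not reproduced here.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_e = \sum_{c \in \{\mathrm{NC},\, \mathrm{MCI},\, \mathrm{AD}\}} p_c^{\text{true}} \, p_c^{\text{pred}}

% Assumed marginals: p_c^{true} = p_c^{pred} = (0.371, 0.467, 0.162)
p_e \approx 0.371^2 + 0.467^2 + 0.162^2 \approx 0.382

% Observed agreement p_o is the reported accuracy, 0.943
\kappa \approx \frac{0.943 - 0.382}{1 - 0.382} \approx 0.908
```

Under that assumption the result is close to the reported 0.909, which suggests the accuracy and kappa figures are mutually consistent.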

Table 2: Comparison with published studies\. AUC for this study is macro average \(one\-versus\-rest\) with bootstrap 95% CI based on 1,000 iterations\. Results for comparator studies taken from original publications\. BA = Balanced Accuracy; N/R = Not Reported; XAI = Explainability method; MRI = Magnetic Resonance Imaging; PET = Positron Emission Tomography; APOE = Apolipoprotein E gene; Grad\-CAM = Gradient\-weighted Class Activation Mapping\.

![Refer to caption](https://arxiv.org/html/2606.03995v1/fig2_evaluation.png)

Figure 2: Model evaluation on held\-out test set \($n=247$\)\. \(A\) Multi\-class ROC curves with per\-class AUC\. Macro AUC = 0\.982 \(95% CI: 0\.965–0\.995\)\. \(B\) Confusion matrix \(absolute counts\)\. \(C\) Normalised confusion matrix\. NC = Normal Cognition; MCI = Mild Cognitive Impairment; AD = Alzheimer’s Disease\.

![Refer to caption](https://arxiv.org/html/2606.03995v1/fig3_crossval.png)

Figure 3: Five\-fold cross\-validation results\. Each point represents one fold\. Dashed horizontal line indicates the mean\. AUC = Area Under the ROC Curve\.

### 3\.3 SHAP Feature Importance

Table [3](https://arxiv.org/html/2606.03995#S3.T3) presents the top five SHAP features for each diagnostic class; Figure [4](https://arxiv.org/html/2606.03995#S3.F4) illustrates the full importance rankings\. Feature importance patterns were meaningfully differentiated across diagnostic classes\. For NC classification, CDR Global was overwhelmingly dominant \(mean $|\text{SHAP}| = 2.218$\), followed distantly by CDR\-SB \(0\.639\)\. For MCI classification, CDR Global remained the leading feature \(1\.417\), with MMSE score \(0\.463\) and CDR\-SB \(0\.322\) jointly contributing\. For AD classification, CDR\-SB \(1\.117\) and MMSE score \(0\.942\) were co\-dominant, with FAQ Total appearing consistently among the top predictors for all three classes\.

Table 3: Top five SHAP features by diagnostic class \(mean absolute SHAP value in parentheses\)\. SHAP = SHapley Additive exPlanations; CDR = Clinical Dementia Rating; CDR\-SB = CDR Sum of Boxes; MMSE = Mini\-Mental State Examination; FAQ = Functional Activities Questionnaire; MoCA = Montreal Cognitive Assessment\.

![Refer to caption](https://arxiv.org/html/2606.03995v1/fig4_shap.png)

Figure 4: SHAP feature importance by diagnostic class\. Bar length represents mean absolute SHAP value across all test set subjects\. Higher values indicate greater importance\. Features are ranked in descending order\.

## 4\. Discussion

This study demonstrates that an explainable XGBoost classifier trained on eight routine clinical assessments can achieve near\-perfect three\-class discrimination of NC, MCI, and AD, with a macro AUC of 0\.982 and a Cohen’s $\kappa$ of 0\.909 on the held\-out ADNI test set\. The closest methodologically comparable study \[[30](https://arxiv.org/html/2606.03995#bib.bib34)\] also applied XGBoost\-SHAP to the same three\-class ADNI task but required neuroimaging biomarkers and APOE genetic data in addition to clinical assessments, achieving a lower AUC of 0\.91 and accuracy of 87\.6%\. The present study surpasses that performance using only eight routine clinical features, suggesting that neuroimaging and genetic data may provide limited incremental discriminative value when clinical assessment scores are comprehensively captured and optimally modelled\.

Reference \[[25](https://arxiv.org/html/2606.03995#bib.bib35)\] achieved comparable accuracy \(93\.9%\) using gradient boosting on clinical and behavioural features, though that study used a Kaggle\-sourced dataset of unknown provenance rather than the standardised ADNI cohort\. Reference \[[27](https://arxiv.org/html/2606.03995#bib.bib36)\] achieved a balanced accuracy of 87\.5% on the same three\-class ADNI task using MRI volumetric measurements and genetic data; the present study exceeds this \(balanced accuracy 93\.2%\) without any neuroimaging requirement, further supporting the clinical utility of assessment\-only approaches\. Deep learning models applied to MRI and PET scans have achieved comparable or higher binary classification accuracy\[[11](https://arxiv.org/html/2606.03995#bib.bib18)\], but require expensive neuroimaging infrastructure unavailable in most clinical settings and provide limited interpretability compared to SHAP\-based explanations\.

The dominance of CDR Global in SHAP feature importance for both NC and MCI classification reflects its well\-established role as the primary clinical staging instrument for dementia\. The transition to CDR\-SB dominance in AD classification aligns with established evidence that CDR\-SB provides greater sensitivity for monitoring decline in moderate\-to\-severe AD\[[22](https://arxiv.org/html/2606.03995#bib.bib11)\], as it captures the cumulative burden across all six CDR domains\. The consistent appearance of FAQ Total among top predictors for all three classes underscores the clinical importance of functional assessment alongside cognitive testing, a finding aligned with current diagnostic frameworks that require functional impairment for an AD diagnosis\[[17](https://arxiv.org/html/2606.03995#bib.bib28)\]\.

The differential SHAP importance patterns across diagnostic classes have directly actionable clinical implications\. For screening in primary care settings where time is limited, CDR Global alone may be sufficient to identify patients requiring specialist referral\. For specialist memory clinics, the combination of CDR\-SB and MMSE provides the highest discriminative value for confirming AD, while the subtler MCI profile, characterised by borderline CDR Global with mild MMSE reduction, may benefit from additional cognitive testing including MoCA\. This staged assessment strategy could optimise clinical resource allocation, particularly in lower\-resource settings where comprehensive neuropsychological batteries are not readily available\.

MoCA had a 59% missingness rate in this cohort, attributable to its introduction in the ADNIGO phase and later\. Despite this, MoCA appeared among the top five predictors for NC classification, suggesting it provides incremental information when available\. The robustness of model performance despite substantial MoCA missingness, managed through median imputation, suggests that the remaining seven features capture sufficient information for accurate classification\.

Several limitations of the present study warrant acknowledgement\. First, the ADNI cohort is predominantly non\-Hispanic White and highly educated \(mean 15\.0–16\.5 years\), which may limit generalisability to ethnically diverse populations, including Middle Eastern and South Asian communities where genetic risk profiles and educational norms differ substantially\[[12](https://arxiv.org/html/2606.03995#bib.bib31)\]\. Validation in diverse external cohorts is a priority\. Second, the analysis is cross\-sectional using baseline data only; incorporating longitudinal cognitive trajectories may further improve MCI conversion prediction\. Third, no comparison was made against commercial AI tools or structured clinical algorithms, which would be informative for contextualising the model’s diagnostic utility\. Fourth, while SHAP values provide rigorous post\-hoc explanations, they reflect statistical associations rather than causal mechanisms\.

Future work will extend this pipeline in two directions\. First, speech\-language biomarkers will be incorporated, extracted from spontaneous speech recordings including acoustic features \(MFCCs, prosody, pause patterns\) and linguistic features \(type\-token ratio, lexical richness, syntactic complexity\), using the DementiaBank Pitt Corpus and ADReSS challenge datasets\. A multimodal attention fusion model combining clinical assessment features with speech features could be deployed via a telephone call, enabling screening in settings where neuropsychological testing is inaccessible\. Second, validation on a Saudi Arabian clinical cohort is planned, where both incidence of dementia and AI research infrastructure are rapidly growing, addressing a critical equity gap in global dementia AI literature\.

## 5\. Conclusion

An explainable XGBoost classifier was developed for three\-class early detection of Alzheimer’s disease using routine clinical assessments from the ADNI dataset and evaluated on a held\-out test set\. The model achieved a macro AUC of 0\.982 \(95% CI: 0\.965–0\.995\) and Cohen’s $\kappa$ of 0\.909, with SHAP analysis confirming clinically meaningful, class\-specific feature importance patterns\. CDR Global dominated NC and MCI classification, while CDR\-SB and MMSE together drove AD detection, findings that align with established diagnostic criteria and support the model’s clinical validity\. These results demonstrate that highly accurate and interpretable Alzheimer’s detection is achievable from assessments routinely performed in memory clinics worldwide, supporting the feasibility of scalable AI\-assisted screening, particularly in resource\-limited settings\.

## Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative \(ADNI; National Institutes of Health Grant U01 AG024904\) and DOD ADNI \(Department of Defense award number W81XWH\-12\-2\-0012\)\. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc\.; Biogen; Bristol\-Myers Squibb Company; CereSpir, Inc\.; Cognitect; Eisai Inc\.; Elan Pharmaceuticals, Inc\.; Eli Lilly and Company; EuroImmun; F\. Hoffmann\-La Roche Ltd and its affiliated company Genentech, Inc\.; Fujirebio; GE Healthcare; IXICO Ltd\.; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co\., Inc\.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc\. and Janssen Alzheimer Immunotherapy Research & Development, LLC; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics\.

## Conflicts of Interest

The author declares no conflicts of interest\. The funders had no role in the design of the study, data collection and analysis, decision to publish, or preparation of the manuscript\.

## Data Availability

## References

- \[1\] T\. Akan et al\. \(2025\) Machine learning approaches for predicting progression to Alzheimer’s disease in patients with mild cognitive impairment\. J Med Biol Eng\. External Links: [Document](https://dx.doi.org/10.1007/s40846-024-00918-z)\.
- \[2\] T\. Akiba, S\. Sano, T\. Yanase, et al\. \(2019\) Optuna: a next\-generation hyperparameter optimization framework\. In Proceedings of the 25th ACM SIGKDD International Conference, pp\. 2623–2631\. External Links: [Document](https://dx.doi.org/10.1145/3292500.3330701)\.
- \[3\] A\. S\. Alatrany, W\. Khan, A\. Hussain, et al\. \(2024\) An explainable machine learning approach for Alzheimer’s disease classification\. Sci Rep 14, pp\. 2637\. External Links: [Document](https://dx.doi.org/10.1038/s41598-024-51985-w)\.
- \[4\] Alzheimer’s Disease International \(2023\) World Alzheimer Report 2023: reducing dementia risk\. Technical report, Alzheimer’s Disease International, London\.
- \[5\] G\. Battineni, N\. Chintalapudi, and F\. Amenta \(2019\) Machine learning in medicine: performance calculation of dementia prediction by support vector machines \(SVM\)\. Inform Med Unlocked 16, pp\. 100200\. External Links: [Document](https://dx.doi.org/10.1016/j.imu.2019.100200)\.
- \[6\] N\. V\. Chawla, K\. W\. Bowyer, L\. O\. Hall, and W\. P\. Kegelmeyer \(2002\) SMOTE: synthetic minority over\-sampling technique\. J Artif Intell Res 16, pp\. 321–357\. External Links: [Document](https://dx.doi.org/10.1613/jair.953)\.
- \[7\] T\. Chen and C\. Guestrin \(2016\) XGBoost: a scalable tree boosting system\. In Proceedings of the 22nd ACM SIGKDD International Conference, pp\. 785–794\. External Links: [Document](https://dx.doi.org/10.1145/2939672.2939785)\.
- \[8\] J\. Cohen \(1960\) A coefficient of agreement for nominal scales\. Educ Psychol Meas 20\(1\), pp\. 37–46\. External Links: [Document](https://dx.doi.org/10.1177/001316446002000104)\.
- \[9\] G\. S\. Collins, K\. G\. M\. Moons, P\. Dhiman, et al\. \(2024\) TRIPOD\+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning\. BMJ 385, pp\. e078378\. External Links: [Document](https://dx.doi.org/10.1136/bmj-2023-078378)\.
- \[10\] E\. R\. DeLong, D\. M\. DeLong, and D\. L\. Clarke\-Pearson \(1988\) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach\. Biometrics 44\(3\), pp\. 837–845\. External Links: [Document](https://dx.doi.org/10.2307/2531595)\.
- \[11\] Y\. Ding, J\. H\. Sohn, M\. G\. Kawczynski, et al\. \(2019\) A deep learning model to predict a diagnosis of Alzheimer disease by using 18F\-FDG PET of the brain\. Radiology 290\(2\), pp\. 456–464\. External Links: [Document](https://dx.doi.org/10.1148/radiol.2018180958)\.
- \[12\] Y\. H\. El\-Hayek, R\. E\. Wiley, C\. P\. Khoury, et al\. \(2019\) Tip of the dementia iceberg in China, India, and Latin America\. Front Neurol 10, pp\. 1218\. External Links: [Document](https://dx.doi.org/10.3389/fneur.2019.01218)\.
- \[13\] M\. F\. Folstein, S\. E\. Folstein, and P\. R\. McHugh \(1975\) Mini\-mental state: a practical method for grading the cognitive state of patients for the clinician\. J Psychiatr Res 12\(3\), pp\. 189–198\. External Links: [Document](https://dx.doi.org/10.1016/0022-3956%2875%2990026-6)\.
- \[14\] C\. R\. Jack, D\. A\. Bennett, K\. Blennow, et al\. \(2018\) NIA\-AA research framework: toward a biological definition of Alzheimer’s disease\. Alzheimers Dement 14\(4\), pp\. 535–562\. External Links: [Document](https://dx.doi.org/10.1016/j.jalz.2018.02.018)\.
- \[15\] C\. Kavitha, V\. Mani, S\. R\. Srividhya, et al\. \(2022\) Early\-stage Alzheimer’s disease prediction using machine learning models\. Front Public Health 10, pp\. 853294\. External Links: [Document](https://dx.doi.org/10.3389/fpubh.2022.853294)\.
- \[16\] S\. M\. Lundberg and S\. Lee \(2017\) A unified approach to interpreting model predictions\. In Advances in Neural Information Processing Systems, Vol\. 30, pp\. 4765–4774\.
- \[17\] G\. M\. McKhann, D\. S\. Knopman, H\. Chertkow, et al\. \(2011\) The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging\-Alzheimer’s Association workgroups\. Alzheimers Dement 7\(3\), pp\. 263–269\. External Links: [Document](https://dx.doi.org/10.1016/j.jalz.2011.03.005)\.
- \[18\] A\. J\. Mitchell and M\. Shiri\-Feshki \(2009\) Rate of progression of mild cognitive impairment to dementia: meta\-analysis of 41 robust inception cohort studies\. Acta Psychiatr Scand 119\(4\), pp\. 252–265\. External Links: [Document](https://dx.doi.org/10.1111/j.1600-0447.2008.01326.x)\.
- \[19\] E\. Moradi, A\. Pepe, C\. Gaser, et al\. \(2015\) Machine learning framework for early MRI\-based Alzheimer’s conversion prediction in MCI subjects\. NeuroImage 104, pp\. 398–412\. External Links: [Document](https://dx.doi.org/10.1016/j.neuroimage.2014.10.002)\.
- \[20\] J\. C\. Morris \(1993\) The clinical dementia rating \(CDR\): current version and scoring rules\. Neurology 43\(11\), pp\. 2412–2414\. External Links: [Document](https://dx.doi.org/10.1212/WNL.43.11.2412-a)\.
- \[21\] Z\. S\. Nasreddine, N\. A\. Phillips, V\. Bédirian, et al\. \(2005\) The Montreal cognitive assessment, MoCA: a brief screening tool for mild cognitive impairment\. J Am Geriatr Soc 53\(4\), pp\. 695–699\. External Links: [Document](https://dx.doi.org/10.1111/j.1532-5415.2005.53221.x)\.
- \[22\]S\. E\. O’Bryant, S\. C\. Waring, C\. M\. Cullum,et al\.\(2008\)Staging dementia using clinical dementia rating scale sum of boxes scores\.Arch Neurol65\(8\),pp\. 1091–1095\.External Links:[Document](https://dx.doi.org/10.1001/archneur.65.8.1091)Cited by:[§2\.3](https://arxiv.org/html/2606.03995#S2.SS3.p1.1),[§4](https://arxiv.org/html/2606.03995#S4.p3.1)\.
- \[23\]R\. C\. Petersen, R\. O\. Roberts, D\. S\. Knopman,et al\.\(2009\)Mild cognitive impairment: ten years later\.Arch Neurol66\(12\),pp\. 1447–1455\.External Links:[Document](https://dx.doi.org/10.1001/archneurol.2009.266)Cited by:[§1](https://arxiv.org/html/2606.03995#S1.p2.1)\.
- \[24\]R\. I\. Pfeffer, T\. T\. Kurosaki, C\. H\. Harrah,et al\.\(1982\)Measurement of functional activities in older adults in the community\.J Gerontol37\(3\),pp\. 323–329\.External Links:[Document](https://dx.doi.org/10.1093/geronj/37.3.323)Cited by:[§1](https://arxiv.org/html/2606.03995#S1.p3.1),[§2\.3](https://arxiv.org/html/2606.03995#S2.SS3.p1.1)\.
- \[25\]V\. Rashmiet al\.\(2025\)Development of an explainable machine learning model for Alzheimer’s disease prediction using clinical and behavioural features\.MethodsX15\.External Links:[Document](https://dx.doi.org/10.1016/j.mex.2025.103336)Cited by:[Table 2](https://arxiv.org/html/2606.03995#S3.T2.4.4.3.1.1.1),[§4](https://arxiv.org/html/2606.03995#S4.p2.1)\.
- \[26\]K\. Ritter, J\. Schumacher, M\. Weygandt,et al\.\(2015\)Multimodal prediction of conversion from mild cognitive impairment to Alzheimer’s dementia\.Hum Brain Mapp36\(4\),pp\. 1522–1534\.External Links:[Document](https://dx.doi.org/10.1002/hbm.22719)Cited by:[§1](https://arxiv.org/html/2606.03995#S1.p5.1),[Table 2](https://arxiv.org/html/2606.03995#S3.T2.4.8.7.1.1.1)\.
- \[27\]M\. E\. Vlontzou, M\. Athanasiou, K\. V\. Dalakleidi,et al\.\(2025\)A comprehensive interpretable machine learning framework for mild cognitive impairment and Alzheimer’s disease diagnosis\.Sci Rep15,pp\. 8410\.External Links:[Document](https://dx.doi.org/10.1038/s41598-025-92577-6)Cited by:[Table 2](https://arxiv.org/html/2606.03995#S3.T2.4.5.4.1.1.1),[§4](https://arxiv.org/html/2606.03995#S4.p2.1)\.
- \[28\]M\. W\. Weiner, D\. P\. Veitch, P\. S\. Aisen,et al\.\(2017\)The Alzheimer’s Disease Neuroimaging Initiative 3: continued innovation for clinical trial improvement\.Alzheimers Dement13\(5\),pp\. 561–571\.External Links:[Document](https://dx.doi.org/10.1016/j.jalz.2016.10.006)Cited by:[§1](https://arxiv.org/html/2606.03995#S1.p5.1)\.
- \[29\]World Health Organization\(2023\)Dementia: key facts\.Note:[https://www\.who\.int/news\-room/fact\-sheets/detail/dementia](https://www.who.int/news-room/fact-sheets/detail/dementia)Cited by:[§1](https://arxiv.org/html/2606.03995#S1.p1.1)\.
- \[30\]F\. Yi, H\. Yang, D\. Chen,et al\.\(2023\)XGBoost\-SHAP\-based interpretable diagnostic framework for Alzheimer’s disease\.BMC Med Inform Decis Mak23\(1\),pp\. 137\.External Links:[Document](https://dx.doi.org/10.1186/s12911-023-02238-9)Cited by:[Table 2](https://arxiv.org/html/2606.03995#S3.T2.4.3.2.1.1.1),[§4](https://arxiv.org/html/2606.03995#S4.p1.1)\.
- \[31\]D\. Zhang, D\. Shen, and Alzheimer’s Disease Neuroimaging Initiative\(2012\)Multi\-modal multi\-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease\.NeuroImage59\(2\),pp\. 895–907\.External Links:[Document](https://dx.doi.org/10.1016/j.neuroimage.2011.09.069)Cited by:[Table 2](https://arxiv.org/html/2606.03995#S3.T2.4.12.11.1.1.1)\.
