Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings
Summary
This paper introduces GNOVA, a GRU-Neural ODE Variational Autoencoder framework for reconstructing and forecasting Alzheimer's disease cognitive trajectories from routine clinical data without expensive neuroimaging or biomarkers, achieving low error and uncertainty estimation on the ADNI dataset.
# Reconstructing and forecasting disease trajectories of patients with Alzheimer’s disease using routine data in resource-constrained settings

Source: [https://arxiv.org/html/2606.07798](https://arxiv.org/html/2606.07798)

Atri Chatterjee (chatterjee.atri@outlook.com), Sitikantha Roy (sroy@am.iitd.ac.in)

Yardi School of Artificial Intelligence (ScAI), Indian Institute of Technology Delhi, Hauz Khas, Delhi, 110016, India; Department of Neurology, Vardhman Mahavir Medical College and Safdarjung Hospital, Delhi, 110029, India; Department of Applied Mechanics, Indian Institute of Technology Delhi, Hauz Khas, Delhi, 110016, India

###### Abstract

Background and objective: Alzheimer’s disease is a progressive neurodegenerative disorder, and its progression varies substantially across patients. Patients often visit clinicians irregularly, making it difficult for clinicians to form a complete picture of a patient’s disease trajectory. Existing work aims to forecast patients’ future cognitive state, with minimal focus on reconstructing the state from past visits. Furthermore, the quantification of predictive uncertainty remains relatively underexplored, and most existing methods rely on costly modalities such as MRI, PET, and cerebrospinal fluid biomarkers, limiting their practical deployment in resource-limited settings. Our primary objectives are, first, to predict cognitive scores bidirectionally from irregular visits so as to present the complete disease trajectory; second, to enable interpolation and extrapolation to assist clinicians in informed prognostic decision-making; third, to provide a well-calibrated uncertainty estimate for all predictions; and finally, to achieve these objectives using only the modalities available during routine visits.

Methods: We propose a unified framework, GNOVA: A GRU-Neural ODE Variational Autoencoder. The architecture combines a Gated Recurrent Unit encoder and a Neural ODE decoder within a variational autoencoder framework. In our work, we forecast the CDR-SB and MMSE scores. The GRU encoder allows for any number of inputs at any time point. The Neural ODE decoder performs continuous estimation, allowing interpolation and extrapolation at any desired time point. The variational autoencoder allows for uncertainty estimation in predictions.

Results: We worked with 1,727 patients from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset over 10 years; the model achieved mean absolute errors of 1.35 and 2.28 for CDR-SB and MMSE scores, respectively, without requiring any neuroimaging or biomarker data. The model captured an average of 85% of true CDR-SB values within predicted confidence intervals, with reduced coverage at longer extrapolation horizons. Feature-ablation studies revealed that age, BMI, and APOE4 status were strong predictors.

Conclusion: The proposed framework enables the reconstruction of incomplete patient histories and the anticipation of future cognitive states, thereby supporting individualized prognostic decision-making. Such work can encourage the development of models and frameworks that can be deployed in resource-constrained settings.

###### Keywords

Alzheimer’s disease progression, disease trajectory modelling, neural ordinary differential equations, variational autoencoder, uncertainty quantification, resource-constrained settings

## 1 Introduction

Alzheimer’s disease (AD) is a progressive neurodegenerative disease that damages brain cells over time, causing an irreversible decline in cognitive abilities Evans et al. \[[2019](https://arxiv.org/html/2606.07798#bib.bib33)\].
According to the World Health Organization, in 2021, approximately 57 million people were suffering from dementia worldwide, with 10 million new cases each year. Globally, AD accounts for around 70% of all dementia cases, is the seventh leading cause of death, and costs around \$1.3 trillion annually World Health Organization \[[2024](https://arxiv.org/html/2606.07798#bib.bib32)\]. AD is currently incurable, and its progression can only be delayed through timely clinical intervention. Initially, AD could be detected only at autopsy. However, advances in technology have enabled clinicians to rely on modalities such as neuroimaging, fluid tests, and cognitive assessments to diagnose and monitor AD Blennow \[[2017](https://arxiv.org/html/2606.07798#bib.bib31)\]. AD is characterized by abnormal $\beta$-amyloid (A$\beta$42) plaque accumulation and tau protein tangles within the brain Zhang et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib39)\]. Positron Emission Tomography (PET) imaging and cerebrospinal fluid (CSF) amyloid-$\beta$42 (A$\beta$42) levels can detect these hallmarks with high sensitivity Schaap et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib30)\], Lowe et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib29)\]. However, these tests are costly, invasive, and require skilled personnel, making routine longitudinal follow-up impractical in most clinical settings. As a result, clinicians often rely primarily on cognitive assessment tools to track disease progression. In a recent study, Schaap et al. found a strong correlation between cognitive scores and amyloid positivity. Results showed that MMSE scores begin declining approximately 0.2 years after a patient becomes amyloid-positive, while CDR-SB scores worsen around 1.4 years after onset Schaap et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib30)\]. This underscores the clinical value of monitoring cognitive scores to track disease progression.

In this study, we aim to predict the trajectory of two widely used cognitive assessments: the Clinical Dementia Rating-Sum of Boxes (CDR-SB, range 0 to 18), which integrates cognitive and functional domains Tzeng et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib28)\], and the Mini-Mental State Examination (MMSE, range 0 to 30), which evaluates orientation, attention, memory, language, and visuospatial skills Folstein et al. \[[1975](https://arxiv.org/html/2606.07798#bib.bib27)\]. AD progression varies substantially across patients. Among mild cognitive impairment (MCI) patients, Mouchet et al. found that approximately 21% experience rapid progression, 22% experience slow progression, and 57% experience no progression over four years Mouchet et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib26)\]. This study highlights the uncertainty in disease progression and the importance of continuously tracking scores to understand its nature. However, in practical scenarios, clinicians often have limited historical context for the disease because patients visit irregularly or only when their condition worsens. This makes it hard for clinicians to distinguish between patients with different progression rates, which delays treatment decisions. This is precisely the scenario that motivates bidirectional trajectory modeling: a model that can not only forecast future cognitive states but also retrospectively reconstruct past ones from sparse, irregularly spaced observations.

In the past decade, advances in artificial intelligence have led researchers to develop computational tools for modeling AD progression.
Initial efforts relied on traditional machine learning algorithms: support vector regression Lei et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib18)\], tree-based methods Jiang et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib22)\], gradient boosting methods Devanarayan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib11)\], Tabarestani et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib24)\], and Gaussian processes (GPs) Peterson et al. \[[2018](https://arxiv.org/html/2606.07798#bib.bib19)\], Puri et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib21)\]. The introduction of deep learning enabled researchers to incorporate neuroimaging modalities and capture non-linear relationships for forecasting cognitive scores Liu et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib6)\], Morar et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib23)\], Yuan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib14)\]. However, these methods typically forecast over short time horizons of three to five years and do not capture the inherent temporal dependencies in longitudinal data. Although GPs can model a 10-year trajectory, they scale poorly, with existing studies limited to around 100 patients Peterson et al. \[[2018](https://arxiv.org/html/2606.07798#bib.bib19)\], Puri et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib21)\]. Recurrent neural networks (RNNs) and their gated variants, long short-term memory (LSTM) and gated recurrent unit (GRU) Goodfellow et al. \[[2016](https://arxiv.org/html/2606.07798#bib.bib36)\], Cho et al. \[[2014](https://arxiv.org/html/2606.07798#bib.bib34)\], naturally address temporal modeling of sequential data.
Researchers have employed RNN variants to improve prediction performance and imputation capabilities: MinimalRNN Nguyen et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib10)\] and DeepRNN Jung et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib16)\] for individualized predictions, and Time_LSTM Liang et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib5)\] for multi-task learning. Mukherji et al. and Morar et al. used many-to-one architectures for long-term prediction and effective temporal dependency capture Mukherji et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib9)\], Morar et al. \[[2023](https://arxiv.org/html/2606.07798#bib.bib7)\], while encoder-decoder architectures Cho et al. \[[2014](https://arxiv.org/html/2606.07798#bib.bib34)\] were later used to address variable sequence lengths in classification tasks Poonam et al. \[[2023](https://arxiv.org/html/2606.07798#bib.bib25), [2024](https://arxiv.org/html/2606.07798#bib.bib12)\]. Das et al. found in their experiments that GRUs performed slightly better at predicting ADAS-13 and CDR-SB scores than RNNs and LSTMs Das et al. \[[2025](https://arxiv.org/html/2606.07798#bib.bib41)\]. GRUs also achieve performance comparable to LSTMs while being faster Goodfellow et al. \[[2016](https://arxiv.org/html/2606.07798#bib.bib36)\], Chung et al. \[[2014](https://arxiv.org/html/2606.07798#bib.bib42)\]. Statistical approaches such as linear mixed effects models naturally accommodate repeated measures and irregular visit schedules, but they are constrained to point estimates, assume linearity, and cannot construct bidirectional trajectories or interpolate predictions at arbitrary unobserved time points.

Recently, a promising approach to handling irregularly sampled time-series data has emerged through neural ordinary differential equations (Neural ODEs).
Unlike traditional discrete models that process data at fixed intervals, Neural ODEs model system dynamics continuously Chen et al. \[[2019](https://arxiv.org/html/2606.07798#bib.bib4)\], allowing cognitive score progression to be represented as a smooth trajectory that can be evaluated at any desired time point. This is particularly valuable for AD, where progression is inherently continuous but clinical observations occur at variable and often widely spaced intervals. Jeong et al. demonstrated this potential using a GRU-ODE architecture for modeling the trajectory of cognitive scores Jeong et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib3)\]. While their approach captured continuous dynamics effectively, it remained deterministic and did not quantify prediction uncertainty.

GRU-based models capture temporal dependencies well, but operate only at discrete observed time points. Neural ODE-based models enable continuous dynamics but lack principled uncertainty estimation. Variational autoencoders alone provide probabilistic outputs but do not model continuous temporal evolution. To the best of our knowledge, no existing approach operates solely on routine clinical data and combines all three capabilities within a unified framework to predict the disease trajectory of AD. In this work, we introduce GNOVA, which addresses these gaps by combining a GRU for sequential temporal encoding, a Neural ODE for continuous latent trajectory modeling, a variational autoencoder for principled uncertainty estimation, and a dedicated encoder for static covariates. The resulting model enables clinicians to reconstruct incomplete patient histories, forecast future cognitive states at any desired time point, and obtain calibrated confidence intervals around all predictions, using only routine demographic and clinical data collected during standard visits, without requiring neuroimaging or biomarker testing.
This makes the framework deployable in resource-constrained settings where advanced diagnostic infrastructure is unavailable.

## 2 Methods and materials

### 2.1 Data

#### 2.1.1 Dataset and Participants

In our work, we used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset Mueller et al. \[[2005](https://arxiv.org/html/2606.07798#bib.bib2)\]. It is accessible at [https://adni.loni.usc.edu/](https://adni.loni.usc.edu/). ADNI includes diverse modalities, including MRI and PET scans, demographic and routine clinical information, and cognitive assessment scores. For our study, we included participants who had a baseline assessment (denoted by ‘bl’), a one-year follow-up (‘m12’), and at least one additional follow-up visit within 10 years of baseline (‘m120’). A total of 1727 patients met these criteria. Among them, 977 patients were male (56.57%), and 750 were female (43.43%). To build the model, we selected the following features: demographic data (age, gender \[Male/Female\], years of education, BMI), hypertension status (Yes/No), diagnosis status (cognitively normal \[CN\], mild cognitive impairment \[MCI\], dementia \[AD\]), APOE4 gene status (non-carrier, heterozygous, homozygous), and cognitive scores (CDR-SB and MMSE). We selected features that are routinely collected during standard clinical visits, requiring no neuroimaging or biomarker infrastructure. Table [1](https://arxiv.org/html/2606.07798#A1.T1) provides the complete description of all input features, their types, encodings, and resulting dimensionality.

#### 2.1.2 Data preprocessing

The categorical features in our dataset were one-hot encoded, and the continuous variables were scaled by their maximum value. To handle missing values, we used forward filling to impute the values. There are several techniques for handling missing values.
However, as noted in a study by Das et al. \[[2025](https://arxiv.org/html/2606.07798#bib.bib41)\], the forward filling technique performs comparably to other methods in terms of predictive accuracy, and is simple to implement and understand. All features were imputed by forward filling, except the target features (CDR-SB and MMSE scores) and age; age was instead imputed by adding or subtracting the time difference between visits. We did not impute the target variables (CDR-SB and MMSE), as imputing the prediction targets would yield an incorrect estimate of model performance during evaluation. Visit codes, which give months after baseline, were converted to fractional years. For example, baseline (‘bl’) was encoded as 0.0, ‘m06’ as 0.5, ‘m12’ as 1.0, and so on, with the maximum visit code of ‘m120’ encoded as 10.0. The patient counts and descriptive statistics for annual and mid-year visits are presented in Tables [1](https://arxiv.org/html/2606.07798#S2.T1) and [2](https://arxiv.org/html/2606.07798#S2.T2), respectively.
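
The preprocessing steps described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names `visit_code_to_years`, `forward_fill`, and `scale_by_max` are hypothetical, and scaling a single list by its own maximum is an assumption about how "scaled by their maximum value" is applied.

```python
from typing import List, Optional


def visit_code_to_years(code: str) -> float:
    """Map a visit code to years since baseline, e.g. 'bl' -> 0.0, 'm06' -> 0.5."""
    code = code.strip().lower()
    if code == "bl":
        return 0.0
    if code.startswith("m") and code[1:].isdigit():
        return int(code[1:]) / 12.0
    raise ValueError(f"unrecognised visit code: {code!r}")


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward; gaps before the first observation stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def scale_by_max(values: List[Optional[float]]) -> List[Optional[float]]:
    """Divide every observed value by the largest observed value (assumed positive)."""
    observed = [v for v in values if v is not None]
    peak = max(observed) if observed else None
    if not peak:  # nothing observed, or a maximum of zero: leave values untouched
        return list(values)
    return [None if v is None else v / peak for v in values]
```

In line with the text, a helper such as `forward_fill` would be applied to covariates like BMI only, never to the CDR-SB or MMSE targets.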

| Features / Time points | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 1727 | 1727 | 1332 | 851 | 690 | 371 | 329 | 250 | 195 | 133 | 113 |
| BMI (mean ± SD) | 26.77±5.35 | 26.71±5.36 | 26.44±5.58 | 26.27±6.06 | 26.1±7.29 | 26.28±6.31 | 25.86±6.23 | 25.67±6.74 | 26.12±6.1 | 25.43±6.73 | 25.91±6.8 |
| Education (mean ± SD) | 16.0±2.76 | 16.0±2.76 | 15.99±2.79 | 16.08±2.75 | 16.12±2.71 | 16.1±2.81 | 16.02±2.81 | 15.99±2.86 | 15.97±2.74 | 16.08±2.74 | 16.06±2.65 |
| Age (mean ± SD) | 73.65±7.22 | 74.64±7.22 | 75.59±7.22 | 76.38±7.11 | 76.88±6.8 | 78.09±6.77 | 79.46±6.62 | 80.21±6.45 | 80.9±6.63 | 81.63±6.45 | 82.19±6.21 |
| CDR-SB (mean ± SD) | 1.61±1.72 | 2.13±2.56 | 2.46±3.12 | 2.23±2.84 | 2.01±3.04 | 2.47±3.34 | 2.3±3.58 | 2.45±3.71 | 2.32±3.39 | 2.4±3.64 | 1.81±2.92 |
| MMSE (mean ± SD) | 27.23±2.62 | 26.47±3.92 | 26.19±4.59 | 26.59±4.26 | 26.82±4.25 | 26.16±4.91 | 26.4±5.0 | 26.54±4.77 | 26.48±4.75 | 26.2±4.8 | 27.04±4.16 |
| Diagnosis: CN (count) | 486 | 496 | 422 | 233 | 269 | 119 | 144 | 97 | 79 | 46 | 48 |
| Diagnosis: MCI (count) | 926 | 809 | 531 | 414 | 276 | 161 | 113 | 100 | 70 | 53 | 46 |
| Diagnosis: AD (count) | 315 | 422 | 379 | 204 | 145 | 91 | 72 | 53 | 46 | 34 | 19 |
| APOE4: Non (count) | 923 | 923 | 725 | 488 | 394 | 221 | 209 | 156 | 127 | 90 | 74 |
| APOE4: Hetero (count) | 626 | 626 | 477 | 286 | 246 | 123 | 97 | 77 | 56 | 35 | 32 |
| APOE4: Homo (count) | 178 | 178 | 130 | 77 | 50 | 27 | 23 | 17 | 12 | 8 | 7 |
| Hypertension: No (count) | 883 | 883 | 700 | 448 | 361 | 203 | 180 | 140 | 113 | 78 | 69 |
| Hypertension: Yes (count) | 844 | 844 | 632 | 403 | 329 | 168 | 149 | 110 | 82 | 55 | 44 |
| Gender: Female (count) | 750 | 750 | 586 | 362 | 303 | 162 | 141 | 112 | 81 | 62 | 51 |
| Gender: Male (count) | 977 | 977 | 746 | 489 | 387 | 209 | 188 | 138 | 114 | 71 | 62 |

Table 1: Data summary at yearly visits (0-10). CN = Cognitively Normal; MCI = Mild Cognitive Impairment; AD = Alzheimer’s Disease; CDR-SB = Clinical Dementia Rating-Sum of Boxes; MMSE = Mini-Mental State Examination.

| Features / Time points | 1.5 | 2.5 | 3.5 | 4.5 | 5.5 | 6.5 | 7.5 | 8.5 | 9.5 |
|---|---|---|---|---|---|---|---|---|---|
| Count | 327 | 21 | 9 | 24 | 76 | 75 | 84 | 49 | 40 |
| BMI (mean ± SD) | 25.78±4.4 | 28.7±5.91 | 27.57±3.88 | 28.29±4.71 | 27.17±5.49 | 28.17±4.54 | 26.86±5.84 | 26.85±3.88 | 26.72±5.76 |
| Education (mean ± SD) | 15.77±2.99 | 15.62±2.67 | 15.44±2.51 | 16.88±2.25 | 16.68±2.55 | 16.75±2.5 | 16.8±2.75 | 17.22±2.43 | 17.32±2.46 |
| Age (mean ± SD) | 76.16±7.14 | 76.55±8.29 | 74.24±8.91 | 77.47±7.08 | 77.32±7.01 | 78.19±6.68 | 79.29±6.71 | 79.8±7.17 | 81.36±6.14 |
| CDR-SB (mean ± SD) | 2.59±1.95 | 2.24±3.69 | 1.94±3.64 | 1.98±2.68 | 2.03±3.67 | 1.72±3.28 | 1.36±2.59 | 1.57±2.82 | 1.16±1.94 |
| MMSE (mean ± SD) | 25.89±3.51 | 25.33±4.02 | 26.67±5.57 | 27.58±2.73 | 27.14±4.31 | 27.39±3.76 | 27.3±3.86 | 27.69±2.87 | 27.85±2.52 |
| Diagnosis: CN (count) | 10 | 8 | 5 | 8 | 44 | 34 | 37 | 16 | 22 |
| Diagnosis: MCI (count) | 224 | 7 | 2 | 13 | 20 | 29 | 37 | 26 | 14 |
| Diagnosis: AD (count) | 93 | 6 | 2 | 3 | 12 | 12 | 10 | 7 | 4 |
| APOE4: Non (count) | 151 | 12 | 5 | 15 | 48 | 48 | 52 | 33 | 28 |
| APOE4: Hetero (count) | 136 | 7 | 3 | 8 | 23 | 24 | 29 | 15 | 10 |
| APOE4: Homo (count) | 40 | 2 | 1 | 1 | 5 | 3 | 3 | 1 | 2 |
| Hypertension: No (count) | 164 | 10 | 3 | 11 | 39 | 46 | 48 | 33 | 19 |
| Hypertension: Yes (count) | 163 | 11 | 6 | 13 | 37 | 29 | 36 | 16 | 21 |
| Gender: Female (count) | 117 | 9 | 5 | 6 | 34 | 30 | 36 | 18 | 16 |
| Gender: Male (count) | 210 | 12 | 4 | 18 | 42 | 45 | 48 | 31 | 24 |

Table 2: Data summary at mid-year visits (1.5-9.5). CN = Cognitively Normal; MCI = Mild Cognitive Impairment; AD = Alzheimer’s Disease; CDR-SB = Clinical Dementia Rating-Sum of Boxes; MMSE = Mini-Mental State Examination.

#### 2.1.3 Experimental Design

After preprocessing, we divided the dataset into two groups, $D1$ and $D2$. $D1$ contained patients with annual visit data for visit codes 0-7. This dataset was split 70/10/20 for training, validation, and testing, respectively. We limited $D1$ to visit 7 as patient counts drop substantially beyond that point, which would introduce excessive noise into training.
$D2$ contained participants’ baseline and first follow-up data (visit codes 0 and 1) alongside the remaining semi-annual visits (1.5, 2.5, 3.5, 4.5, 5.5, and 6.5) for testing interpolation capabilities, and visits 7.5, 8, 8.5, 9, 9.5, and 10 for extrapolation testing. $D2$ was used exclusively for testing, with no involvement in training or validation.

### 2.2 Model Architecture

#### 2.2.1 Gated Recurrent Unit (GRU)

Gated Recurrent Units (GRUs) are a gated variant of recurrent neural networks (RNNs) that can process longitudinal data and capture temporal dependencies Cho et al. \[[2014](https://arxiv.org/html/2606.07798#bib.bib34)\]. GRUs address the vanishing gradient problem of vanilla RNNs by introducing update and reset gates that control which information is retained and discarded across time steps. Given a sequence of inputs $\{x^{(i)}\}_{i=1}^{N}$, a GRU produces hidden states $\{h^{(i)}\}_{i=1}^{N}$, which hold the temporal information of the sequence.

#### 2.2.2 Variational Autoencoder (VAE)

Autoencoders are a type of neural network architecture that learns a compressed representation of inputs, enabling it to reconstruct the input from that representation. Variational Autoencoders (VAEs) extend this idea by incorporating probabilistic modeling through variational Bayesian inference, enabling the learning of structured latent representations while maintaining generative capabilities Kingma and Welling \[[2022b](https://arxiv.org/html/2606.07798#bib.bib1)\]. Rather than encoding inputs to a fixed point in latent space, a VAE encodes them to a distribution, parameterized by mean $\mu$ and standard deviation $\sigma$, from which latent samples are drawn.
This distributional encoding is what enables the model to produce confidence intervals around its predictions rather than deterministic point estimates, which is critical for clinical applications where overconfident predictions can mislead treatment decisions. A brief discussion of the VAE architecture is given in Appendix [B](https://arxiv.org/html/2606.07798#A2).

#### 2.2.3 Neural Ordinary Differential Equation

Neural Ordinary Differential Equations (NODEs) are a class of deep learning models that replace the traditional discrete sequence of hidden layers with a continuous-depth framework governed by ordinary differential equations Chen et al. \[[2019](https://arxiv.org/html/2606.07798#bib.bib4)\]. In a discrete residual network, the evolution of the hidden state from $t$ to $t+1$ is given by

$$h_{t+1} = h_t + f(h_t, \theta) \tag{1}$$

where $\theta$ is the parameter of the function $f$. In a Neural ODE, instead of stacking a fixed number of layers, we treat the hidden state $h(t)$ as evolving under an ordinary differential equation given by

$$\frac{dh(t)}{dt} = f(h(t), t; \theta) \tag{2}$$

where $f$ is a neural network parameterized by $\theta$. The Neural ODE solves the ODE over an interval $[t, t+1]$ to give the next state, as shown in equation [3](https://arxiv.org/html/2606.07798#S2.E3).

$$h_{t+1} = h_t + \int_{t}^{t+1} f(h(\tau), \tau; \theta)\, d\tau \tag{3}$$

In practice, we approximate this using an appropriate numerical ODE solver. The notation is given in equation [4](https://arxiv.org/html/2606.07798#S2.E4).

$$h_{t+1} = \mathrm{ODESolve}(h_t, t, t+1, \theta) \tag{4}$$

#### 2.2.4 Final Architecture

The complete GNOVA architecture is given in Figure [1](https://arxiv.org/html/2606.07798#S2.F1).

Figure 1: GNOVA Architecture

The complete architecture consists of four blocks. The details of each block are given below.
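
Before the block-by-block description, the contrast between the discrete update of equation 1 and the solver call of equation 4 can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `toy_dynamics` is a hypothetical linear stand-in for the neural network $f$, and the fixed-step fourth-order Runge-Kutta scheme is one possible solver, not necessarily the one used in the paper.

```python
from typing import Callable, List

State = List[float]
Dynamics = Callable[[State, float], State]


def toy_dynamics(h: State, t: float) -> State:
    """Hypothetical stand-in for the network f: linear decay dh/dt = -0.5 * h."""
    return [-0.5 * hi for hi in h]


def residual_step(h: State, f: Dynamics, t: float) -> State:
    """Discrete update in the style of equation 1: h_{t+1} = h_t + f(h_t)."""
    return [hi + di for hi, di in zip(h, f(h, t))]


def odesolve(f: Dynamics, h0: State, t0: float, t1: float, steps: int = 100) -> State:
    """Fixed-step RK4 approximation of h(t1) given h(t0), in the spirit of equation 4.

    t1 may be any real time, including one earlier than t0 (backward integration).
    """
    h, t = list(h0), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f([hi + 0.5 * dt * k for hi, k in zip(h, k1)], t + 0.5 * dt)
        k3 = f([hi + 0.5 * dt * k for hi, k in zip(h, k2)], t + 0.5 * dt)
        k4 = f([hi + dt * k for hi, k in zip(h, k3)], t + dt)
        h = [
            hi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for hi, a, b, c, d in zip(h, k1, k2, k3, k4)
        ]
        t += dt
    return h
```

Because the end time is a free argument, the same call covers a mid-year point such as `odesolve(toy_dynamics, [1.0], 0.0, 1.5)`, and swapping the two time arguments integrates backward, which is the property that interpolation and retrospective reconstruction rely on.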

1. Static Encoder Block: It consists of a multi-layer perceptron (MLP) that nonlinearly transforms the 14-dimensional baseline static feature vector $x_{\text{static}}$ into a fixed-dimensional representation $h_{\text{static}}$ (`encoder_hidden_dim`), using ReLU activation.

2. Sequence Encoder Block: It consists of a GRU that processes the longitudinal sequence of input cognitive scores $\{x_i\}_{i=1}^{N}$ and produces hidden states $\{h_i\}_{i=1}^{N}$. The final hidden state $h_N$ is concatenated with $h_{\text{static}}$ to form the joint representation $h_{\text{final}}$ of dimension 2 × `encoder_hidden_dim`, integrating both the patient’s temporal history of cognitive scores and their baseline clinical profile. We explored several schemes for aggregating the GRU hidden states, including averaging all hidden states $\{h_i\}_{i=1}^{N}$ and an attention-weighted sum of them. However, we found that the concatenation of the final state performed better, as shown in the ablation study (section [3.5](https://arxiv.org/html/2606.07798#S3.SS5)).

3. Latent Dynamics Block: It consists of an MLP (`latent_dim`) that maps $h_{\text{final}}$ to the mean and the variance of the latent distribution. The reparameterization trick is applied to form the initial latent variable $z_{\text{init}}$ given by equation [5](https://arxiv.org/html/2606.07798#S2.E5).

   $$z_{\text{init}} = \mu + \sigma \odot \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I) \tag{5}$$

4. Decoder Block: In this block, the Neural ODE framework receives $z_{\text{init}}$ as input and integrates the latent trajectory continuously from $t=0$ to all desired prediction time points, as given by equation [6](https://arxiv.org/html/2606.07798#S2.E6).

   $$z_{t+1} = \mathrm{ODESolve}(z_{\text{init}}, 0, \ldots, t+1, \theta) \tag{6}$$

   The resulting latent variables $\{z_i\}_{i=1}^{N}$ are passed through an MLP (`decoder_hidden_dim`) with Tanh activation to output the mean and variance of the predicted cognitive scores $\{\hat{x}_i\}_{i=1}^{N}$. In all experiments, we set $N=7$.

### 2.3 Training and Validation

#### 2.3.1 Loss Function

We employ the standard VAE loss function given in equation [11](https://arxiv.org/html/2606.07798#A2.E11). The function consists of a reconstruction loss ($\mathcal{L}_{rec}$), which is the mean squared error (MSE) between the actual inputs and the predicted values. It also has a regularization term ($\mathcal{L}_{reg}$) to prevent the model from predicting infinite uncertainty Kingma and Welling \[[2022a](https://arxiv.org/html/2606.07798#bib.bib35)\]. The total loss consists of the reconstruction loss ($\mathcal{L}_{rec}$), the KL divergence loss ($\mathcal{L}_{KL}$), which ensures that the latent distribution remains close to the prior, and the regularizer loss ($\mathcal{L}_{reg}$), as given by equation [7](https://arxiv.org/html/2606.07798#S2.E7):

$$\mathcal{L}_{total} = \mathcal{L}_{rec} + \beta \mathcal{L}_{KL} + \lambda \mathcal{L}_{reg} \tag{7}$$

where $\beta$ and $\lambda$ are the hyperparameters that control the weighting of the loss terms.
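
As an illustration of how equation 7 weights its three terms, the following sketch computes a per-patient loss in plain Python. The per-term formulas are assumptions rather than the authors' implementation: a variance-scaled squared error for reconstruction, the standard closed-form Gaussian KL divergence against a standard normal prior (which carries a leading factor of $-\tfrac{1}{2}$), and a log-variance penalty as the regularizer. Averaging over observed visits, the 0/1 mask convention, and the names `kl_standard_normal` and `masked_total_loss` are likewise hypothetical.

```python
import math
from typing import Sequence


def kl_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - math.exp(lv) - m * m for m, lv in zip(mu, log_var))


def masked_total_loss(
    targets: Sequence[float],       # observed scores per visit (any value where missing)
    pred_mean: Sequence[float],     # decoder mean per visit
    pred_log_var: Sequence[float],  # decoder log-variance per visit
    mask: Sequence[int],            # 1 = visit observed, 0 = visit missing
    mu: Sequence[float],            # encoder posterior mean per latent dimension
    log_var: Sequence[float],       # encoder posterior log-variance per latent dimension
    beta: float,
    lam: float,
) -> float:
    """L_total = L_rec + beta * L_KL + lam * L_reg, using observed visits only."""
    rec, reg, n_obs = 0.0, 0.0, 0
    for x, x_hat, lv, keep in zip(targets, pred_mean, pred_log_var, mask):
        if not keep:
            continue  # missing visits contribute nothing to the loss
        rec += 0.5 * (x - x_hat) ** 2 / math.exp(lv)  # variance-scaled squared error
        reg += 0.5 * lv                               # penalises large predicted variance
        n_obs += 1
    if n_obs == 0:
        raise ValueError("at least one observed visit is required")
    rec, reg = rec / n_obs, reg / n_obs
    return rec + beta * kl_standard_normal(mu, log_var) + lam * reg
```

Dividing the squared error by the predicted variance lets the model down-weight visits it is unsure about, while the log-variance term keeps it from inflating the variance without bound; the weights `beta` and `lam` correspond to $\beta$ and $\lambda$ in equation 7.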

Expanding each term, we have the function given by

$$\mathcal{L} = \frac{1}{2}\left(\frac{(x_i - \hat{x}_i)^2}{\sigma^2}\right) + \frac{\beta}{2}\left(1 + \log\sigma_j^2 - \sigma_j^2 - \mu_j^2\right) + \frac{\lambda}{2}\left(\log\sigma^2\right) \tag{8}$$

Here, $\sigma^2$ is the predictive variance from the decoder, and $\sigma_j^2$ is the posterior variance for latent dimension $j$ from the encoder. The reconstruction loss uses mean squared error.

#### 2.3.2 Experimental setup and hyperparameters

We implemented our model in PyTorch. For hyperparameter optimization, we used the Optuna framework Akiba et al. \[[2019](https://arxiv.org/html/2606.07798#bib.bib38)\] with 100 trials and selected parameters that minimized the validation loss. Table [3](https://arxiv.org/html/2606.07798#S2.T3) shows the search ranges and optimal values for each cognitive score model.

| Parameter | Search Range | CDR-SB | MMSE |
|---|---|---|---|
| `encoder_hidden_dim` | [16, 256] | 32 | 154 |
| `latent_dim` | [16, 256] | 57 | 14 |
| `decoder_hidden_dim` | [16, 256] | 67 | 60 |
| `learning_rate` | [0.0001, 0.1] | 0.0051 | 0.0067 |
| $\beta$ | [0, 1] | 0.691 | 0.202 |
| $\lambda$ | [0, 1] | 0.222 | 0.164 |

Table 3: Hyperparameter search space and optimal values for CDR-SB and MMSE models

Models were trained for 500 epochs with a batch size of 16 using the Adam optimizer. We performed 5-fold cross-validation to ensure robust performance estimates. The ODE solver used default tolerance settings, which provided a good balance between accuracy and computational efficiency.

#### 2.3.3 Training and Inference Mechanism

The ADNI dataset Mueller et al. \[[2005](https://arxiv.org/html/2606.07798#bib.bib2)\] is a prospective, well-organized cohort study in which subjects undergo regular study visits.
Many subjects, however, may have dropped out for a variety of reasons, and this is reflected in Tables [1](https://arxiv.org/html/2606.07798#S2.T1) and [2](https://arxiv.org/html/2606.07798#S2.T2), where the sample sizes become smaller with time. However, this missingness cannot be used for predictions in the experiment. Predicting these values cannot be used to assess the model’s goodness, as we do not have ground-truth values. Hence, we use only the available time points to predict, mimicking a real-life scenario of irregular visits. All participants in the dataset who met the criteria discussed in the Data section above had baseline (visit 0) and one-year follow-up (visit 1) visits. We used only the input from those two time points for training. The other visits, except 0 and 1, are used for training as targets. During testing, however, we give any number of inputs at any desired time point. For example, we train the model with input time points 0 and 1, but we test the model with input time points 2 and 5, as shown in the Results sections later. This depicts a real-life scenario in which a clinician might need to predict a patient’s disease trajectory at arbitrary time points. During training and inference, we follow the masking strategy below:

1. Training: For each patient, we created a binary mask where 1 indicates the presence of data, and 0 indicates missing visits. The missing values were filled with zeros and were not considered during the calculation of the loss. This ensured that the model learned only from actual observations and maintained a consistent input dimension. The static encoder block was fed with the features at visit 0.

2. Inference: During inference, we passed the values at the input time points (e.g., 0 and 1, or 2 and 5) and set the other values to 0. The masking was applied after we generated the outputs. The values we wanted to predict were masked to 1, and the input time points were set to 0.
   The static encoder block was fed the features of the earliest input time point (0 and 2, respectively, in the earlier example).

## 3 Results

### 3.1 Results on Dataset D1 - Forward Prediction & Retrospective Imputation

We evaluated the model performance using 5-fold cross-validation on the test set (20% of $D1$). To assess the model’s flexibility with respect to input configuration, we tested the trained model across all possible pairs of input time points. Here, we present two scenarios: forward prediction using the earliest available visits and retrospective imputation using the latest available visits. The results for all remaining input combinations are in Appendix [C](https://arxiv.org/html/2606.07798#A3). The two scenarios are:

1. Forward prediction: The model predicts cognitive scores at years 2 through 7 from baseline and first follow-up observations (visits 0 and 1). This represents the clinical scenario in which the model forecasts from a minimal early history.

2. Retrospective imputation: The model reconstructs earlier time points (visits 0 through 5) from late-trajectory observations (visits 6 and 7). This is relevant in scenarios where a patient visits late in their disease course, and the clinician needs to reconstruct the earlier history.

The results of our experiment are given in Table [4](https://arxiv.org/html/2606.07798#S3.T4) for CDR-SB and MMSE.

| Input visits | Target time point | No. of patients | CDR-SB MAE | CDR-SB RMSE | MMSE MAE | MMSE RMSE |
|---|---|---|---|---|---|---|
| 0, 1 | 2 | 266.4±2.6 | 0.8775±0.0582 | 1.4104±0.0977 | 1.7553±0.0926 | 2.4917±0.2479 |
| 0, 1 | 3 | 170.2±8.5 | 1.0519±0.1490 | 1.6605±0.2340 | 1.8831±0.0794 | 2.8474±0.1500 |
| 0, 1 | 4 | 138.0±10.1 | 1.1476±0.1590 | 1.9041±0.3843 | 1.9974±0.1711 | 3.0687±0.3372 |
| 0, 1 | 5 | 74.2±11.3 | 1.5342±0.1907 | 2.4050±0.3465 | 2.3953±0.2666 | 3.7750±0.5666 |
| 0, 1 | 6 | 65.8±3.9 | 1.5182±0.2726 | 2.4302±0.3884 | 2.3411±0.4662 | 3.6658±0.9266 |
| 0, 1 | 7 | 50.0±5.2 | 1.7380±0.1810 | 2.6138±0.2114 | 2.3993±0.2938 | 3.5895±0.7028 |
| 6, 7 | 0 | 345.4±0.5 | 0.6541±0.0282 | 0.9928±0.0651 | 1.4869±0.0191 | 1.8779±0.0294 |
| 6, 7 | 1 | 345.4±0.5 | 1.0598±0.0397 | 1.6429±0.0452 | 1.9658±0.0966 | 2.7555±0.1659 |
| 6, 7 | 2 | 266.4±2.6 | 1.3176±0.1299 | 2.0253±0.1943 | 2.2156±0.1869 | 3.2138±0.4271 |
| 6, 7 | 3 | 170.2±8.5 | 1.4281±0.1486 | 2.2117±0.3029 | 2.3075±0.0706 | 3.5007±0.2372 |
| 6, 7 | 4 | 138.0±10.1 | 1.3536±0.1317 | 2.2926±0.3255 | 2.2310±0.1523 | 3.4543±0.4783 |
| 6, 7 | 5 | 74.2±11.3 | 1.4183±0.1577 | 2.2640±0.4311 | 2.2737±0.2298 | 3.5549±0.5834 |

Table 4: Results for CDR-SB and MMSE predictions with different input configurations

For forward prediction, the model attained a competitive MAE of 0.8775±0.0582 at year 2, progressively rising to 1.7380±0.1810 at year 7 for CDR-SB. MMSE predictions exhibited a similar pattern, with an MAE of 1.7553±0.0926 at year 2 increasing to 2.3993±0.2938 at year 7. For retrospective imputation, the model achieved its best performance in baseline reconstruction, with MAEs of 0.6541±0.0282 and 1.4869±0.0191 for CDR-SB and MMSE, respectively. Interestingly, retrospective predictions were not consistently better than forward predictions at time points close to the input.
For example, at t=4, which is closer to the input window of t=6,7 than to t=0,1, retrospective MAE was 1.3536 ± 0.1317 compared to forward MAE of 1.1476 ± 0.1590 for CDR-SB. A similar pattern was observed in MMSE predictions.

### 3.2 Results for Dataset D2 - Interpolation & Extrapolation

To evaluate the model's capacity to generalize beyond its training conditions, we assessed the trained D1 model on the D2 dataset, which included semi-annual visits excluded from the training phase. The results are shown in Table [5](https://arxiv.org/html/2606.07798#S3.T5). The model was trained on annual visits from t=0 to t=7 (utilizing D1, with t=0 and 1 as inputs and t=2 to 7 as targets) and evaluated on D2, where the interpolation range comprised t=1.5, 2.5, 3.5, 4.5, 5.5, and 6.5, while the extrapolation range included t=7.5, 8, 8.5, 9, 9.5, and 10.

Input - 0,1

| Target time point | No. of patients | CDR-SB MAE | CDR-SB RMSE | MMSE MAE | MMSE RMSE |
|---|---|---|---|---|---|
| 1.5 | 327 | 0.8213 ± 0.0196 | 1.2072 ± 0.0360 | 2.4645 ± 0.1226 | 2.9777 ± 0.1224 |
| 2.5 | 21 | 0.8839 ± 0.0535 | 1.4152 ± 0.1341 | 1.5477 ± 0.1475 | 1.8872 ± 0.1926 |
| 3.5 | 9 | 1.2427 ± 0.1088 | 1.8039 ± 0.2302 | 2.4706 ± 0.4147 | 3.9910 ± 0.3789 |
| 4.5 | 24 | 0.7924 ± 0.0534 | 1.2024 ± 0.0852 | 1.5550 ± 0.2849 | 1.9713 ± 0.3049 |
| 5.5 | 76 | 1.1848 ± 0.0305 | 1.9289 ± 0.0928 | 2.0996 ± 0.1020 | 3.1698 ± 0.1100 |
| 6.5 | 75 | 1.5545 ± 0.1257 | 2.3691 ± 0.2341 | 2.1250 ± 0.1650 | 3.0381 ± 0.1569 |
| 7.5 | 84 | 1.6426 ± 0.1584 | 2.6580 ± 0.3153 | 2.3049 ± 0.0892 | 3.3058 ± 0.1660 |
| 8 | 195 | 1.7151 ± 0.0797 | 2.5508 ± 0.0857 | 2.4463 ± 0.1927 | 3.9443 ± 0.4208 |
| 8.5 | 49 | 2.0075 ± 0.3286 | 2.9461 ± 0.5548 | 2.4025 ± 0.1656 | 3.2603 ± 0.2442 |
| 9 | 133 | 2.2122 ± 0.1059 | 3.2339 ± 0.1007 | 3.1661 ± 0.2932 | 4.9525 ± 0.6427 |
| 9.5 | 40 | 1.8383 ± 0.2443 | 2.4084 ± 0.4861 | 2.5838 ± 0.8208 | 3.4534 ± 0.7474 |
| 10 | 113 | 2.1530 ± 0.2051 | 2.9800 ± 0.2474 | 2.8090 ± 0.5531 | 4.6702 ± 0.7766 |

Table 5: Interpolation and extrapolation results for CDR-SB and MMSE

The
interpolation performance was predominantly robust, especially at time points close to the inputs. The exception was MMSE at t=1.5 (MAE 2.4645 ± 0.1226), which did not perform as well as CDR-SB. Interestingly, we noticed a correlation between sample size and forecast stability. At t=3.5, with just 9 patients available, the reported variability of the CDR-SB MAE was ±0.1088, and for MMSE it was ±0.4147, both much greater than at earlier interpolation time points. This instability most likely arises from the limited sample size.

In the case of extrapolation, too, our model performed well at near time points, and its predictive performance declined steadily as the distance from the training window increased. This is expected behavior for any model operating beyond its observed range. CDR-SB MAE rose from 1.6426 ± 0.1584 at t=7.5 to 2.1530 ± 0.2051 at t=10. However, it is essential to recognize that the variance in CDR-SB scores is considerable at these time points. The standard deviation of actual CDR-SB scores at t=9 and t=10, as indicated in Table 1, is approximately 3.6 and 2.9, respectively. Consequently, the model's mean absolute error of approximately 2.1–2.2 indicates a prediction error within the data's inherent variability. Moreover, the number of patients at these remote time points is much lower (n=113 at t=10), which makes these estimates statistically less reliable. MMSE extrapolation deteriorated more markedly, with MAE reaching 3.1661 ± 0.2932 at t=9, implying that MMSE trajectories are more challenging to extrapolate over extended periods than CDR-SB trajectories.

### 3.3 Observations within confidence intervals

To evaluate the reliability of the model's uncertainty estimates, we computed the percentage of true values falling within the predicted 95% confidence intervals at each time point. Results are shown in Table [6](https://arxiv.org/html/2606.07798#S3.T6).
Input - 0,1

| Target time point | No. of patients | CDR-SB observations in CI (%) | MMSE observations in CI (%) |
|---|---|---|---|
| 1.5 | 327 | 94.50 ± 0.86 | 85.26 ± 3.47 |
| 2 | 266.4 ± 2.6 | 92.41 ± 0.59 | 88.82 ± 1.74 |
| 2.5 | 21 | 90.48 ± 3.01 | 91.43 ± 4.67 |
| 3 | 170.2 ± 8.5 | 90.61 ± 2.97 | 89.87 ± 2.52 |
| 3.5 | 9 | 88.89 ± 0.00 | 86.67 ± 4.44 |
| 4 | 138.0 ± 10.1 | 89.35 ± 1.45 | 90.39 ± 1.32 |
| 4.5 | 24 | 95.83 ± 2.64 | 95.83 ± 2.64 |
| 5 | 74.2 ± 11.3 | 86.13 ± 4.16 | 85.29 ± 3.47 |
| 5.5 | 76 | 87.37 ± 1.78 | 85.00 ± 2.14 |
| 6 | 65.8 ± 3.9 | 84.69 ± 2.15 | 82.14 ± 8.59 |
| 6.5 | 75 | 86.13 ± 1.36 | 85.07 ± 2.72 |
| 7 | 50.0 ± 5.2 | 81.48 ± 4.12 | 77.20 ± 6.63 |
| 7.5 | 84 | 87.38 ± 3.33 | 79.52 ± 2.95 |
| 8 | 195 | 85.54 ± 2.46 | 78.26 ± 5.49 |
| 8.5 | 49 | 84.49 ± 3.79 | 75.92 ± 4.36 |
| 9 | 133 | 80.60 ± 3.60 | 61.50 ± 8.64 |
| 9.5 | 40 | 91.00 ± 4.64 | 66.50 ± 16.78 |
| 10 | 113 | 82.30 ± 3.67 | 56.28 ± 17.19 |

Table 6: Percentage of observations within the confidence interval (CI) for CDR-SB and MMSE predictions

The coverage of CDR-SB was robust and consistent. Within the interpolation range (t=1.5 to t=6.5), coverage varied from 84.69% to 95.83%, with an average of about 90% across folds. During the extrapolation interval (t=7.5 to t=10), coverage varied from 80.60% to 91.00%, with an average of roughly 85%. The confidence intervals for CDR-SB are consistently well-calibrated in both interpolation and extrapolation, achieving over 80% coverage at all assessed time points.

For MMSE predictions, however, the results were different. Within the interpolation range, MMSE coverage was comparable to CDR-SB, varying from 82.14% to 95.83% and averaging about 88%. Coverage then dropped markedly from t=9 onward, decreasing to 61.50% at t=9 and 56.28% at t=10. The model's uncertainty estimates are well-calibrated for MMSE within and near the training window, but the uncertainty becomes increasingly underestimated at long extrapolation horizons.
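The coverage statistic reported above can be sketched as follows. This is a minimal illustration, assuming each prediction comes with a predictive mean and standard deviation and that the 95% interval is the Gaussian interval mean ± 1.96 · std; the paper may construct its intervals differently (for example from sampled latent trajectories), and the helper names and toy numbers here are hypothetical.

```python
import numpy as np


def gaussian_interval(mean, std, z=1.96):
    """Two-sided interval mean +/- z * std (z = 1.96 gives ~95% under a Gaussian)."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    return mean - z * std, mean + z * std


def interval_coverage(y_true, lower, upper):
    """Percentage of observed values that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return 100.0 * float(inside.mean())


# Toy example for one target time point (made-up numbers, not ADNI data).
y_true = np.array([1.0, 2.5, 4.0, 0.5])
pred_mean = np.array([1.2, 2.0, 2.0, 0.4])
pred_std = np.array([0.5, 0.5, 0.5, 0.5])

lower, upper = gaussian_interval(pred_mean, pred_std)
coverage = interval_coverage(y_true, lower, upper)
print(f"coverage: {coverage:.1f}%")
```

For a nominal 95% interval, empirical coverage close to 95% indicates good calibration, while coverage well below that level, as seen for MMSE at t=9 and t=10, indicates intervals that are too narrow.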
### 3.4 Effect of Input Configuration and Sequence Length

We also conducted a comprehensive analysis to understand how the choice of input time points influences prediction performance. First, we tested all single and paired time points as inputs to identify which observations were most informative for trajectory reconstruction. Second, we examined how prediction accuracy changed as we incrementally added more historical data points. In Appendix C, Tables [1](https://arxiv.org/html/2606.07798#A3.T1) and [2](https://arxiv.org/html/2606.07798#A3.T2) show the results of the experiment for CDR-SB predictions, and Tables [3](https://arxiv.org/html/2606.07798#A3.T3) and [4](https://arxiv.org/html/2606.07798#A3.T4) show those for MMSE predictions.

As shown in Tables [1](https://arxiv.org/html/2606.07798#A3.T1) and [3](https://arxiv.org/html/2606.07798#A3.T3), prediction accuracy improved consistently as the number of input time points increased. Adding a third observation beyond visits 0 and 1 improved MAE across all subsequent time points by approximately 30% on average for CDR-SB, demonstrating the model's effective use of additional historical context. The improvement was most visible at later time points, where the benefit of more historical information is greatest.

Tables [2](https://arxiv.org/html/2606.07798#A3.T2) and [4](https://arxiv.org/html/2606.07798#A3.T4) reveal a consistent pattern in performance with a single input time point: the best MAE for any target time point was generally achieved when the input time point immediately preceded it. For example, for CDR-SB, the best prediction at t=4 used the input at t=3 (MAE: 0.92), and the best prediction at t=3 used the input at t=2 (MAE: 0.88). Tables [5](https://arxiv.org/html/2606.07798#A3.T5) and [6](https://arxiv.org/html/2606.07798#A3.T6) show the results for two input time points.

### 3.5 Ablation Study

We conducted a systematic ablation study to analyze two aspects: encoder aggregation strategy and static feature contribution.
Results are reported in Tables [1](https://arxiv.org/html/2606.07798#A4.T1) and [2](https://arxiv.org/html/2606.07798#A4.T2) for CDR-SB and MMSE predictions, respectively.

For the encoder aggregation strategy, we tried three approaches: using the final hidden state (current model), the average of all hidden states, and an attention-weighted sum. We observed that no single approach performed best at all time points. However, on average, the final hidden state achieved a better MAE for both CDR-SB (1.4464) and MMSE (2.2653), marginally outperforming the average hidden state approach (CDR-SB: 1.4491, MMSE: 2.3957) and the attention-based approach (CDR-SB: 1.4591, MMSE: 2.4309).

The contribution of each static covariate was assessed by systematically removing it and observing the change in MAE relative to the full model (CDR-SB: 1.4464, MMSE: 2.2653). For CDR-SB predictions, removing age (MAE rising to 1.4891) or APOE4 status (1.4772) had the strongest effect, while removing MMSE as a covariate (1.4673) or BMI (1.4605) had a moderate effect. Interestingly, removing hypertension (1.4148) or gender (1.4148) slightly improved CDR-SB predictions, suggesting these features may introduce noise rather than signal for CDR-SB trajectory modeling. However, this is not conclusive and requires further analysis to establish. For MMSE predictions, age (2.3982), APOE4 status (2.3880), diagnosis stage (2.3411), CDR-SB (2.3212), and BMI (2.3186) all had a strong influence. Surprisingly, removing years of education slightly improved MMSE predictions (2.2460 vs 2.2653).

### 3.6 Case studies

We present three case studies that represent clinically distinct progression characteristics, and we evaluate our model's predictions for those cases using three different input configurations: early visits (0,1), mid-trajectory visits (2,5), and late visits (6,7). Only patients with complete data across all seven annual time points were considered.
The three patients chosen exhibit three distinct progression patterns frequently observed in Alzheimer's Disease: a patient with a sudden, nonlinear increase in CDR-SB (P1), a patient with stable, near-zero progression (P2), and a patient with monotonically rising scores (P3). Figures [2](https://arxiv.org/html/2606.07798#S3.F2), [3](https://arxiv.org/html/2606.07798#S3.F3), and [4](https://arxiv.org/html/2606.07798#S3.F4) illustrate the confidence interval, predicted score, and true score for each patient and input configuration. The model captured the progression trend well for patients P2 (stable) and P3 (monotonically increasing) at all input configurations. Confidence intervals were appropriately narrow around the input time points and widened with increasing prediction distance, reflecting well-calibrated uncertainty.

Figure 2: Trajectory of the patients with input time points - 0,1

Figure 3: Trajectory of the patients with input time points - 2,5

Figure 4: Trajectory of the patients with input time points - 6,7

For P1 (abrupt jumps), the model successfully captured the sudden score jump from t=5 to t=6 when early visits (0,1) were used as inputs. This is possibly because the model learned this sharp transition pattern during forward training. However, when mid-trajectory visits (2,5) were used, the model failed to anticipate the jump at t=6, producing a smoother trajectory that underestimates the severity of progression. With this input, it captured the trajectory at the preceding time points but failed to capture the future trajectory, and its point estimates underestimated the true scores at all time points. For the late inputs (6,7), the model could not reconstruct the preceding drop from t=6 to t=5 in terms of point values, but it captured the progression trend well.
This inability to capture abrupt changes stems from a limitation of Neural ODE-based models: the continuous latent trajectory enforced by the ODE solver favors smooth dynamics and is inherently less suited to abrupt nonlinear transitions Chen et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib49)\]. At t=5 and t=7 specifically, P1 shows a notable discrepancy between true and predicted values across input configurations, and we acknowledge these as genuine prediction failures rather than artifacts of the evaluation. Clinically, this suggests the model may underperform among rapid progressors whose trajectories involve sudden deterioration. Extending the architecture to accommodate sharper trajectory dynamics is an important direction for future work.

### 3.7 Comparison with previous works

We compared our model's results with those of previous work at different time points using relevant metrics. Tables [7](https://arxiv.org/html/2606.07798#S3.T7) and [8](https://arxiv.org/html/2606.07798#S3.T8) show the comparison for CDR-SB and MMSE scores. Our model demonstrates competitive performance across both CDR-SB and MMSE predictions and has the key advantage of using readily available clinical data rather than expensive neuroimaging and fluid modalities.

For CDR-SB predictions, our GNOVA model achieves strong results across multiple time points. Using only demographic and clinical data, our model outperforms previous methods that incorporated complex neuroimaging data at M12 (RMSE: 1.329, MAE: 0.849) and M24 (RMSE: 1.720, MAE: 1.133). Even at M18 and M36, where Lei et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib8)\] report slightly better performance, our results remain competitive. Notably, their approach required MRI data and was tested on substantially smaller cohorts (M18: N=282; M36: N=50) than our larger testing set (M18: N=327; M36: N=170).
For MMSE predictions, our model demonstrates reasonable performance across multiple time points. In terms of MAE, at M12 we obtained 1.864, close to the result of Yuan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib14)\] (MAE: 1.69). Our performance remains consistent at M24 (MAE: 1.755) and M36 (MAE: 1.883), comparing favorably with Lei et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib8)\]. Our model's predictions at M18 had a slightly higher error. However, in our framework, prediction at M18 (t=1.5) is an interpolation task, while other methods specifically trained their models for M18 prediction.

The RMSE results further support our model's effectiveness. We achieve competitive performance through M36, with modest deviation from the best-reported results: M12 (ours: 2.56, Morar et al. \[[2023](https://arxiv.org/html/2606.07798#bib.bib7)\]: 2.17), M18 (ours: 2.98, Morar et al. \[[2023](https://arxiv.org/html/2606.07798#bib.bib7)\]: 2.18), M24 (ours: 2.49, Tabarestani et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib24)\]: 2.38), and M36 (ours: 2.84, Tabarestani et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib24)\]: 2.28). Our model achieves the best performance at M30. Beyond M36, RMSE increases moderately. Methods achieving lower errors beyond M36, such as Jung et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib16)\], used less than 50% of the cohort at each time point. Notably, all of these works used expensive imaging and fluid biomarker modalities (PET, MRI, CSF), limiting their practical applicability in a low-resource setting.
CDR-SB (RMSE)

| Work | Method | Modalities | M12 | M24 |
|---|---|---|---|---|
| Liu et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib6)\] | wiseDNN | MRI, CS | 2.008 | 2.334 |
| Our model - Input (0,1) | GNOVA | Demo, CS | – | 1.349 |
| Our model - Input 0 | GNOVA | Demo, CS | 1.329 | 1.720 |

CDR-SB (MAE)

| Work | Method | Modalities | M12 | M18 | M24 | M36 |
|---|---|---|---|---|---|---|
| Lei et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib18)\] | SVR | MRI, CS | 0.965 | 0.861 | 1.396 | 0.816 |
| Lei et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib8)\] | IndRNN | MRI, CS | – | 0.69 | 1.01 | 0.72 |
| Devanarayan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib11)\] | GB | MRI, Demo, CS | 0.94 | 0.99 | 1.15 | 1.35 |
| Our model - Input (0,1) | GNOVA | Demo, CS | – | 0.821 | 0.878 | 1.052 |
| Our model - Input 0 | GNOVA | Demo, CS | 0.849 | – | 1.133 | 1.336 |

Table 7: Comparison of CDR-SB predictions with previous works at time points. CDR-SB: Clinical Dementia Rating Sum of Boxes; RMSE: Root Mean Square Error; MAE: Mean Absolute Error; MRI: Magnetic Resonance Imaging; CS: Cognitive Scores; Demo: Demographics; SVR: Support Vector Regression; IndRNN: Independently Recurrent Neural Network; GB: Gradient Boosting; GNOVA: Gated Recurrent Unit - NOVA architecture; M12/M18/M24/M36: 12/18/24/36 months from baseline.

MMSE (MAE)

| Work | Method | Modalities | M12 | M18 | M24 | M36 |
|---|---|---|---|---|---|---|
| Lei et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib18)\] | SVR | MRI, CS | 1.801 | 1.777 | 1.84 | 1.756 |
| Lei et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib8)\] | IndRNN | MRI, CS | – | 1.74 | 1.92 | 1.56 |
| Yuan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib14)\] | MFSE-DRNs | MRI, CS, Demo, Others | 1.69 | – | 1.86 | 2.03 |
| Our model - Input 0,1 | GNOVA | Demo, CS | – | 2.465 | 1.755 | 1.883 |
| Our model - Input 0 | GNOVA | Demo, CS | 1.864 | – | 2.124 | 2.191 |

MMSE (RMSE)

| Work | Method | Modalities | M12 | M18 | M24 | M30 | M36 | M42 | M48 | M54 | M60 | M72 | M84 | M96 | M108 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Liu et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib6)\] | wiseDNN | MRI, CS | 3.128 | – | 3.408 | – | – | – | – | – | – | – | – | – | – |
| Jung et al. \[[2021](https://arxiv.org/html/2606.07798#bib.bib16)\] | Deep-RNN | Volume, CS, Demo | 2.409 | – | 2.483 | – | 2.411 | – | 2.64 | – | 2.793 | 2.344 | 2.303 | 2.546 | 2.044 |
| Morar et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib23)\] | FCNN | Volume, CSF, PET, CS, Demo | 2.52 | 2.76 | 2.713 | 2.87 | 3.11 | 3.05 | 3.64 | 3.23 | | – | – | – | – |
| Yuan et al. \[[2024](https://arxiv.org/html/2606.07798#bib.bib14)\] | MFSE-DRNs | MRI, CS, Demo, Others | 2.48 | – | 2.67 | – | 3.02 | – | – | – | – | – | – | – | – |
| Tabarestani et al. \[[2020](https://arxiv.org/html/2606.07798#bib.bib24)\] | GB-MTL | Volume, PET, CS, Demo, CSF | 2.24 | – | 2.38 | – | 2.28 | – | 2.19 | – | – | – | – | – | – |
| Morar et al. \[[2023](https://arxiv.org/html/2606.07798#bib.bib7)\] | ST-LSTM | Volume, CSF, PET, CS, Demo | 2.17 | 2.18 | 2.61 | 2.52 | 2.71 | 2.67 | 3.17 | 3.01 | 2.9 | – | – | – | – |
| Our model - Input 0,1 | GNOVA | Demo, CS | – | 2.98 | 2.49 | 1.88 | 2.84 | 3.99 | 3.07 | 1.97 | 3.77 | 3.67 | 3.59 | 3.94 | 4.95 |
| Our model - Input 0 | GNOVA | Demo, CS | 2.56 | – | 3.1 | – | 3.28 | – | 3.47 | – | 4.05 | 4.01 | – | – | – |

Table 8: Comparison of MMSE predictions with previous works at time points. MMSE: Mini-Mental State Examination; MAE: Mean Absolute Error; RMSE: Root Mean Square Error; MRI: Magnetic Resonance Imaging; CS: Cognitive Scores; Demo: Demographics; sMRI: structural MRI; CSF: Cerebrospinal Fluid (Aβ, p-tau, t-tau); PET: Positron Emission Tomography (FDG, PIB, AV45); Volume: ventricles, hippocampus, fusiform gyrus, middle temporal gyrus, entorhinal cortex, and whole-brain; SVR: Support Vector Regression; IndRNN: Independently Recurrent Neural Network; MFSE-DRN: Multi-Feature Squeeze-and-Excitation Deep Residual Network; FCNN: Fully Connected Neural Network; GB-MTL: Gradient Boosting Multi-Task Learning; ST-LSTM: Single Task Long Short-Term Memory; MXX: XX months from baseline.

## 4 Discussion

Our model achieved mean absolute errors of 1.35 and 2.28 for CDR-SB and MMSE, respectively, over a ten-year trajectory, without requiring any neuroimaging or biomarker modalities. A change of 1.0–2.0 points in CDR-SB is usually considered clinically significant in patients with mild-to-moderate Alzheimer's disease.
Such a change indicates a noticeable shift in cognitive or functional abilities and often leads clinicians to reassess the patient. In this context, the model's average error (MAE) of 1.35 for CDR-SB lies within a clinically meaningful range. This is especially relevant for early predictions, where the errors are smaller and treatment decisions are more critical. In comparison, the MMSE score ranges from 0 to 30 but has known ceiling and floor effects due to factors such as education Franco-Marina et al. \[[2010](https://arxiv.org/html/2606.07798#bib.bib43)\]. Patients at early or late stages of the disease tend to have scores clustered near the maximum or minimum values, which reduces the range of variation. As a result, an MAE of 2.28 reflects not only model error but also the inherent difficulty of predicting changes within this limited scale. At later time points, the predictions are best interpreted as indicating the direction of the trajectory rather than exact score values. The associated confidence intervals provide more useful information about the expected range of outcomes.

The bidirectional prediction capability addresses a practical clinical need that is rarely addressed in existing literature. When a patient arrives with only a few irregularly spaced historical observations, a clinician must simultaneously reason about the patient's past and future. The majority of current methodologies treat forecasting and imputation as distinct tasks, necessitating a separate model for each direction. Our architecture encompasses both within a single, integrated framework. The model's ability to derive clinically plausible retrospective estimates is demonstrated by its reconstruction of baseline scores from late encounters, with an MAE of 0.6541 for CDR-SB baseline reconstruction.
An intriguing finding was that forward predictions were occasionally more precise than retrospective ones at proximal time points. This may be attributed to the fact that forward neurodegeneration patterns are more consistently represented in the training data than reverse ones, given the natural directionality of disease progression.

The model's probabilistic outputs provide a significant benefit compared to deterministic methods. In healthcare, a point prediction without an associated uncertainty estimate places the entire interpretative burden on the physician, who must decide whether or not to rely on the model's prediction. Well-calibrated confidence intervals flag the predictions for which model uncertainty is high. In our experiments, CDR-SB uncertainty estimates were consistently well-calibrated during the entire assessment period, achieving over 80% coverage of actual targets within the predicted 95% confidence intervals at all time points, including the extrapolation range. However, as we see in Table [6](https://arxiv.org/html/2606.07798#S3.T6), the MMSE coverage at extrapolation did not perform well. The MMSE uncertainty estimates deteriorated significantly from t=9 onward, with coverage decreasing to 61.50% at t=9 and 56.28% at t=10. This is a meaningful limitation, and in clinical use, the model's predictions at t=9 and t=10 for MMSE should be interpreted with awareness that the reported confidence intervals likely underestimate true uncertainty. Improving long-range uncertainty calibration for MMSE, potentially through recalibration techniques or ensemble approaches, is an important direction for future work.

The feature ablation study revealed some findings that are consistent with the established literature as well as some unexpected results.
The strong influence of age and APOE4 status on both CDR-SB and MMSE predictions aligns with their well-documented roles as the strongest risk and progression factors in AD Sando et al. \[[2008](https://arxiv.org/html/2606.07798#bib.bib44)\], Roses \[[2006](https://arxiv.org/html/2606.07798#bib.bib45)\], Kim et al. \[[2009](https://arxiv.org/html/2606.07798#bib.bib46)\]. The moderate influence of BMI is consistent with emerging evidence linking metabolic health to cognitive trajectory in AD Cho et al. \[[2022](https://arxiv.org/html/2606.07798#bib.bib47)\]. However, two unexpected findings necessitate further research. First, removing years of education marginally improved MMSE predictions, which is counterintuitive, as many studies have established a relationship between years of education and MMSE Xie et al. \[[2016](https://arxiv.org/html/2606.07798#bib.bib48)\]. Second, removing hypertension and gender information slightly improved CDR-SB predictions. This is possibly because these features contribute more strongly to disease risk than to progression rate and add little predictive information beyond what is already provided by stronger covariates such as age and APOE4.

In the case studies, we noticed that the model performed well in capturing the trend of the progression. However, for retrospective imputation, it still did not estimate the point values accurately. This is expected, as the model is trained on forward prediction. Future work will focus on methods to improve the point estimates. Also, the model shows a tendency to slightly overestimate cognitive scores. From a clinical perspective, this is preferable to underestimation. Overestimation is more likely to lead to additional checks and earlier intervention, whereas underestimation could delay necessary action.

## 5 Limitations

We acknowledge several limitations of our model.
Firstly, all experiments were conducted on the ADNI dataset, which, despite its size and longitudinal depth, is a North American prospective cohort with specific inclusion criteria that may not represent the full diversity of AD patients encountered in routine clinical practice. External validation on independent datasets, particularly those from clinical settings with genuinely irregular visit patterns, is necessary before any deployment and remains the highest priority for future work. Secondly, the static treatment of many covariates, such as diagnosis stage, hypertension status, and age, is a deliberate simplification, and it is likely to limit prediction accuracy at longer horizons, where the baseline representation becomes increasingly outdated. In future work, we plan to introduce a dynamical block to encode such features. Thirdly, the continuous latent trajectory enforced by the Neural ODE solver inherently favors smooth dynamics, making the model less suited to patients who exhibit abrupt transitions in cognitive scores, as illustrated by the P1 case study results. In future work, we can explore a second-order differential equation-based model of progression, in which both the velocity and the acceleration (trajectory and drifts) of the disease are modeled. Fourthly, a direct comparison with different architectural baselines, such as GRU-ODE without VAE and ODE-RNN, was not included in this study, primarily because these architectures are very slow. While the internal ablation study and published literature comparison provide partial evidence for the proposed design choices, a controlled component-wise ablation would more rigorously establish the individual contribution of each architectural element and is planned for future work. Finally, the model currently excludes neuroimaging and biomarker data by design. However, in practical situations, some of these modalities may be available.
The current architecture cannot readily incorporate such additional information. In the future, we plan to extend the framework to incorporate available biomarkers, while gracefully working with routine clinical data alone when they are absent, thereby broadening the model's utility across the full spectrum of resource availability in clinical settings.

## 6 Conclusions

The proposed GNOVA framework demonstrates that disease trajectory modeling in Alzheimer's disease can be achieved using only minimal routine clinical data, without any neuroimaging or biomarker infrastructure. The model integrates bidirectional prediction, continuous interpolation and extrapolation, and calibrated uncertainty quantification into a single framework, providing capabilities that are not available simultaneously in any current methodology. Its intentional reliance on low-cost, readily available data makes it especially appropriate for resource-limited healthcare environments that need decision-support tools. External validation on independent clinical datasets, comprehensive architectural ablation, and multimodal extensions remain important directions for future work. We hope this work encourages further research into practical, accessible tools for personalized dementia care.

## CRediT authorship contribution statement

Ratnadeep Das: Writing – original draft, Writing – review & editing, Methodology, Data curation, Software, Visualization, Conceptualization, Formal analysis. Atri Chatterjee: Writing – review & editing, Supervision, Conceptualization, Formal analysis. Sitikantha Roy: Writing – review & editing, Supervision, Conceptualization, Formal analysis, Project administration.

## Acknowledgments

We acknowledge the subjects who participated in the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the team who made this work possible.
The data collection and sharing were funded by the ADNI \(National Institutes of Health Grant U01 AG024904\) and DOD ADNI \(Department of Defense award number W81XWH\-12\-2\-0012\)\. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc\.; Biogen; Bristol\-Myers Squibb Company; CereSpir, Inc\.; Cogstate; Eisai Inc\.; Elan Pharmaceuticals, Inc\.; Eli Lilly and Company; EuroImmun; F\. Hoffmann\-La Roche Ltd and its affiliated company Genentech, Inc\.; Fujirebio; GE Healthcare; IXICO Ltd\.; Janssen Alzheimer Immunotherapy Research & Development, LLC\.; Johnson & Johnson Pharmaceutical Research & Development LLC\.; Lumosity; Lundbeck; Merck & Co\., Inc\.; Meso Scale Diagnostics, LLC\.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc\.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics\. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada\. Private sector contributions are facilitated by the Foundation for the National Institutes of Health \([www\.fnih\.org](https://arxiv.org/html/2606.07798v1/www.fnih.org)\)\. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California\. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California\. ## Data availability statement Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative \(ADNI\) database accessible in[adni\.loni\.usc\.edu](https://arxiv.org/html/2606.07798v1/adni.loni.usc.edu)\. 
The ADNI, launched as a public-private partnership in 2003, was led by Principal Investigator Michael W. Weiner, MD. The goal of the dataset is to test whether modalities such as magnetic resonance imaging (MRI), positron emission tomography (PET), biological markers, and different clinical and neuropsychological scores can help us in identifying and predicting the progression of Alzheimer's Disease. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: [http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf](http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf).

## Ethics statement

All human subjects involved in this dataset (clinical trial) provided written informed consent prior to their participation.

## Declaration of Generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used AI-based tools (including Google Gemini, Claude, ChatGPT, Grammarly, and QuillBot) to assist with grammar correction, linguistic refinement, and enhancement. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.

## Funding sources

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
## Appendix A Feature Description

The table below describes all the features.

| Feature | Type | Encoding | Dimensions |
| --- | --- | --- | --- |
| Age | Continuous | Scaled by maximum value | 1 |
| Gender | Categorical | One-hot (Male (1) / Female (2)) | 2 |
| Years of Education | Continuous | Scaled by maximum value | 1 |
| BMI | Continuous | Scaled by maximum value | 1 |
| Hypertension Status | Categorical | One-hot (Yes (1) / No (0)) | 2 |
| Diagnosis | Categorical | One-hot (1 / 2 / 3) | 3 |
| APOE4 Status | Categorical | One-hot (Non / Heterozygous / Homozygous) | 3 |
| MMSE† / CDR-SB‡ | Continuous | Scaled by maximum possible value | 1 |
| Total | | | 14 |

Table 1: Feature, type of data, encoding strategy, and resulting dimensionality for input vector. 1 = Cognitively Normal; 2 = Mild Cognitive Impairment; 3 = Dementia; BMI = Body Mass Index; †MMSE at baseline is included as a static covariate only in the CDR-SB prediction model. ‡CDR-SB at baseline is included as a static covariate only in the MMSE prediction model.

## Appendix B Variational Autoencoder

A variational autoencoder architecture is shown in Figure [1](https://arxiv.org/html/2606.07798#A2.F1). Here $x$ and $\hat{x}$ are the input and the reconstructed output, respectively. Given a dataset $\mathbf{X}=\{x^{(i)}\}_{i=1}^{N}$, where $N$ is the number of data points, the assumption is that it is generated by a random process $p_\theta$ involving the latent random variable $\mathbf{z}$. The probabilistic encoder and probabilistic decoder are given by the distributions $p_\theta(z|x)$ and $p_\theta(x|z)$, respectively. However, $p_\theta(z|x)$ is intractable, so we assume a surrogate $q_\phi(z|x)$, which is a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$.
Figure 1: Variational autoencoder architecture

The goal is to approximate the conditional distribution $p_\theta(z|x)$, which we do by minimizing the distance between the surrogate and the true posterior, measured by the KL divergence $D_{KL}\left(q_\phi(z|x^{(i)}) \parallel p_\theta(z|x^{(i)})\right)$. Expanding the KL divergence term, we get

$$\log p_\theta(x^{(i)}) = D_{KL}\left(q_\phi(z|x^{(i)}) \parallel p_\theta(z|x^{(i)})\right) + \mathcal{L}(\theta,\phi;x^{(i)}) \tag{9}$$

The second term on the right-hand side is the evidence lower bound (ELBO), given by:

$$\mathcal{L}(\theta,\phi;x^{(i)}) = -D_{KL}\left(q_\phi(z|x^{(i)}) \parallel p_\theta(z)\right) + \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z)\right] \tag{10}$$

Because the KL divergence in Eq. (9) is non-negative, $\mathcal{L}$ is a lower bound on $\log p_\theta(x^{(i)})$, and maximizing it brings the surrogate closer to the true posterior. $q_\phi(z|x^{(i)})$ is assumed to be a multivariate Gaussian with mean $\mu$ and standard deviation $\sigma$, and the prior $p_\theta(z)$ is assumed to be a multivariate Gaussian with zero mean and identity covariance. A detailed mathematical derivation can be found in Kingma and Welling [[2022b](https://arxiv.org/html/2606.07798#bib.bib1)] and Odaibo [[2019](https://arxiv.org/html/2606.07798#bib.bib40)].
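For readers who want the intermediate step behind Eq. (9), the following is a brief sketch of the standard textbook argument. It is not specific to this paper and uses only the symbols already defined in this appendix.

```latex
\begin{align*}
D_{KL}\left(q_\phi(z|x^{(i)}) \parallel p_\theta(z|x^{(i)})\right)
  &= \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log q_\phi(z|x^{(i)}) - \log p_\theta(z|x^{(i)})\right] \\
  &= \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log q_\phi(z|x^{(i)}) - \log p_\theta(x^{(i)}, z)\right] + \log p_\theta(x^{(i)}) \\
  &= -\mathcal{L}(\theta,\phi;x^{(i)}) + \log p_\theta(x^{(i)}), \\[4pt]
\mathcal{L}(\theta,\phi;x^{(i)})
  &= \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z) + \log p_\theta(z) - \log q_\phi(z|x^{(i)})\right] \\
  &= -D_{KL}\left(q_\phi(z|x^{(i)}) \parallel p_\theta(z)\right) + \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z)\right].
\end{align*}
```

The second line uses $p_\theta(z|x^{(i)}) = p_\theta(x^{(i)}, z) / p_\theta(x^{(i)})$ together with the fact that $\log p_\theta(x^{(i)})$ does not depend on $z$. Rearranging the third line gives Eq. (9), and the last line is Eq. (10).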
The final objective for a VAE, a sample-based estimate of the ELBO in Eq. (10) whose negative is minimized as the training loss, is given by

$$\mathcal{L}(\theta,\phi;x^{(i)}) \approx \frac{1}{2}\sum_{j=1}^{J}\left(1+\log\left(\left(\sigma_{j}^{(i)}\right)^{2}\right)-\left(\mu_{j}^{(i)}\right)^{2}-\left(\sigma_{j}^{(i)}\right)^{2}\right)+\frac{1}{L}\sum_{l=1}^{L}\log p_{\theta}(x^{(i)}|z^{(i,l)}) \tag{11}$$

where $J$ is the dimensionality of the latent space, $L$ is the number of Monte Carlo samples, $z^{(i,l)}=\mu^{(i)}+\sigma^{(i)}\odot\epsilon^{l}$, and $\epsilon^{l}\sim\mathcal{N}(0,I)$ is the noise introduced by the reparameterization trick, which addresses the challenge of backpropagating through random variables, Kingma and Welling [[2022b](https://arxiv.org/html/2606.07798#bib.bib1)]. In practice, $\mu$ and $\sigma$ are produced by a multilayer perceptron (MLP) whose weights and biases are learned during training.
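To make the arithmetic of Eq. (11) concrete, here is a minimal NumPy sketch. It is an illustration only, not the authors' implementation: the function names (`kl_term`, `reparameterize`, `gaussian_log_likelihood`, `elbo_estimate`), the unit-variance Gaussian decoder likelihood, and the toy linear `decode` callable are all assumptions made for this example.

```python
import numpy as np


def kl_term(mu, sigma):
    """First term of Eq. (11); equals -KL(q || N(0, I)) for a diagonal Gaussian q."""
    return 0.5 * np.sum(1.0 + np.log(sigma ** 2) - mu ** 2 - sigma ** 2)


def reparameterize(mu, sigma, n_samples, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I); output shape (n_samples, J)."""
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps


def gaussian_log_likelihood(x, x_hat):
    """log N(x; x_hat, I) per sample (assumption: unit-variance Gaussian decoder)."""
    d = x.shape[0]
    sq_err = np.sum((x - x_hat) ** 2, axis=-1)
    return -0.5 * sq_err - 0.5 * d * np.log(2.0 * np.pi)


def elbo_estimate(x, mu, sigma, decode, n_samples=1, rng=None):
    """Monte Carlo estimate of the ELBO in Eq. (11); the training loss is its negative."""
    rng = np.random.default_rng() if rng is None else rng
    z = reparameterize(mu, sigma, n_samples, rng)   # (L, J) latent samples
    x_hat = decode(z)                               # (L, D) reconstructions
    reconstruction = np.mean(gaussian_log_likelihood(x, x_hat))
    return kl_term(mu, sigma) + reconstruction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((2, 3))                 # hypothetical linear decoder weights
    x = np.array([0.2, -0.1, 0.4])
    mu = np.array([0.5, -0.5])
    sigma = np.array([0.8, 1.2])
    value = elbo_estimate(x, mu, sigma, lambda z: z @ W, n_samples=10, rng=rng)
    print("ELBO estimate:", value, "loss:", -value)
```

In an actual model, `mu` and `sigma` would come from the encoder network and the negative of this value would be minimized with an automatic-differentiation framework; plain NumPy is used here only to keep each term of Eq. (11) visible.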
## Appendix C Results - Effect of Input Configuration and Sequence Length

| Length | Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0.85±0.04 | 1.12±0.11 | 1.33±0.21 | 1.35±0.26 | 1.75±0.30 | 1.83±0.36 | 2.13±0.23 |
| 2 | 0,1 | - | 0.88±0.06 | 1.05±0.15 | 1.15±0.16 | 1.53±0.19 | 1.52±0.27 | 1.74±0.18 |
| 3 | 0,1,2 | - | - | 0.87±0.08 | 0.97±0.14 | 1.41±0.26 | 1.38±0.25 | 1.55±0.14 |
| 4 | 0,1,2,3 | - | - | - | 0.85±0.16 | 1.16±0.27 | 1.25±0.23 | 1.33±0.24 |
| 5 | 0,1,2,3,4 | - | - | - | - | 1.03±0.21 | 1.22±0.28 | 1.20±0.19 |
| 6 | 0,1,2,3,4,5 | - | - | - | - | - | 1.07±0.14 | 1.20±0.20 |

Table 1: CDR-SB prediction results with varying input lengths (columns 1 to 7 are time points)

| Inputs ↓ / Target → | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | - | 0.85 | 1.13 | 1.34 | 1.37 | 1.74 | 1.79 | 2.11 |
| 1 | 0.51 | - | 0.85 | 1.05 | 1.16 | 1.58 | 1.53 | 1.76 |
| 2 | 0.60 | 0.83 | - | 0.88 | 0.97 | 1.40 | 1.39 | 1.62 |
| 3 | 0.63 | 0.95 | 1.05 | - | 0.92 | 1.22 | 1.25 | 1.39 |
| 4 | 0.63 | 0.99 | 1.19 | 1.21 | - | 1.13 | 1.24 | 1.35 |
| 5 | 0.65 | 1.04 | 1.29 | 1.42 | 1.24 | - | 1.10 | 1.38 |
| 6 | 0.66 | 1.05 | 1.33 | 1.46 | 1.38 | 1.42 | - | 1.08 |
| 7 | 0.64 | 1.06 | 1.31 | 1.43 | 1.40 | 1.51 | 1.32 | - |

Table 2: CDR-SB prediction results for different input/target time point combinations

| Length | Input | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 1.86±0.06 | 2.12±0.19 | 2.19±0.14 | 2.30±0.22 | 2.70±0.27 | 2.63±0.43 | 2.72±0.21 |
| 2 | 0,1 | - | 1.76±0.09 | 1.88±0.08 | 2.00±0.17 | 2.40±0.27 | 2.42±0.58 | 2.40±0.29 |
| 3 | 0,1,2 | - | - | 1.72±0.17 | 1.85±0.03 | 2.23±0.19 | 2.23±0.31 | 2.27±0.53 |
| 4 | 0,1,2,3 | - | - | - | 1.67±0.05 | 2.17±0.13 | 2.04±0.31 | 2.12±0.34 |
| 5 | 0,1,2,3,4 | - | - | - | - | 2.11±0.31 | 2.04±0.32 | 2.12±0.30 |
| 6 | 0,1,2,3,4,5 | - | - | - | - | - | 2.24±0.38 | 2.28±0.45 |

Table 3: MMSE prediction results with varying input lengths (columns 1 to 7 are time points)

| Inputs ↓ / Target → | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | - | 1.89 | 2.08 | 2.26 | 2.41 | 2.81 | 2.63 | 2.86 |
| 1 | 1.35 | - | 1.84 | 2.07 | 2.24 | 2.70 | 2.66 | 2.84 |
| 2 | 1.48 | 1.81 | - | 1.84 | 2.10 | 2.44 | 2.42 | 2.45 |
| 3 | 1.46 | 1.90 | 2.05 | - | 1.86 | 2.38 | 2.30 | 2.54 |
| 4 | 1.46 | 1.94 | 2.15 | 2.01 | - | 2.04 | 2.08 | 2.27 |
| 5 | 1.53 | 1.99 | 2.29 | 2.38 | 2.25 | - | 2.22 | 2.35 |
| 6 | 1.49 | 1.98 | 2.24 | 2.33 | 2.33 | 2.47 | - | 2.06 |
| 7 | 1.51 | 2.01 | 2.26 | 2.39 | 2.40 | 2.65 | 2.45 | - |

Table 4: MMSE prediction results for different input/target time point combinations

| Inputs ↓ / Target → | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0, 1 | - | - | 0.85 | 1.06 | 1.15 | 1.55 | 1.50 | 1.79 |
| 0, 2 | - | 0.73 | - | 0.89 | 1.01 | 1.40 | 1.36 | 1.60 |
| 0, 3 | - | 0.78 | 0.91 | - | 0.86 | 1.18 | 1.25 | 1.47 |
| 0, 4 | - | 0.83 | 1.04 | 1.09 | - | 1.15 | 1.32 | 1.45 |
| 0, 5 | - | 0.83 | 1.09 | 1.24 | 1.14 | - | 1.13 | 1.42 |
| 0, 6 | - | 0.84 | 1.10 | 1.27 | 1.23 | 1.33 | - | 1.01 |
| 0, 7 | - | 0.83 | 1.09 | 1.25 | 1.28 | 1.44 | 1.30 | - |
| 1, 2 | 0.50 | - | - | 0.84 | 0.97 | 1.43 | 1.47 | 1.64 |
| 1, 3 | 0.52 | - | 0.77 | - | 0.81 | 1.11 | 1.17 | 1.23 |
| 1, 4 | 0.51 | - | 0.80 | 0.84 | - | 1.03 | 1.18 | 1.29 |
| 1, 5 | 0.51 | - | 0.85 | 0.97 | 0.96 | - | 1.01 | 1.20 |
| 1, 6 | 0.51 | - | 0.84 | 0.99 | 1.08 | 1.28 | - | 1.02 |
| 1, 7 | 0.51 | - | 0.85 | 0.99 | 1.07 | 1.33 | 1.16 | - |
| 2, 3 | 0.58 | 0.82 | - | - | 0.82 | 1.15 | 1.24 | 1.32 |
| 2, 4 | 0.58 | 0.81 | - | 0.79 | - | 0.96 | 1.14 | 1.19 |
| 2, 5 | 0.58 | 0.82 | - | 0.83 | 0.79 | - | 0.96 | 1.14 |
| 2, 6 | 0.58 | 0.81 | - | 0.83 | 0.90 | 1.09 | - | 1.06 |
| 2, 7 | 0.58 | 0.82 | - | 0.81 | 0.86 | 1.14 | 1.10 | - |
| 3, 4 | 0.61 | 0.95 | 1.04 | - | - | 1.00 | 1.14 | 1.20 |
| 3, 5 | 0.62 | 0.94 | 1.04 | - | 0.78 | - | 0.94 | 1.09 |
| 3, 6 | 0.62 | 0.94 | 1.04 | - | 0.89 | 1.03 | - | 0.96 |
| 3, 7 | 0.62 | 0.96 | 1.04 | - | 0.88 | 1.09 | 1.01 | - |
| 4, 5 | 0.64 | 0.99 | 1.19 | 1.15 | - | - | 1.02 | 1.16 |
| 4, 6 | 0.64 | 1.00 | 1.19 | 1.16 | - | 0.90 | - | 0.88 |
| 4, 7 | 0.65 | 1.00 | 1.19 | 1.16 | - | 1.00 | 1.01 | - |
| 5, 6 | 0.66 | 1.05 | 1.30 | 1.41 | 1.22 | - | - | 1.01 |
| 5, 7 | 0.66 | 1.04 | 1.27 | 1.35 | 1.16 | - | 0.87 | - |
| 6, 7 | 0.65 | 1.06 | 1.32 | 1.43 | 1.35 | 1.42 | - | - |

Table 5: CDR-SB prediction results for different two-input/target time point combinations

| Inputs ↓ / Target → | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0, 1 | - | - | 1.76 | 1.88 | 2.00 | 2.40 | 2.34 | 2.40 |
| 0, 2 | - | 1.67 | - | 1.81 | 2.02 | 2.39 | 2.50 | 2.45 |
| 0, 3 | - | 1.84 | 1.95 | - | 1.90 | 2.38 | 2.29 | 2.41 |
| 0, 4 | - | 1.82 | 1.98 | 1.87 | - | 2.16 | 2.01 | 2.26 |
| 0, 5 | - | 1.83 | 2.03 | 2.08 | 2.04 | - | 2.06 | 2.32 |
| 0, 6 | - | 1.85 | 2.06 | 2.07 | 2.08 | 2.24 | - | 1.97 |
| 0, 7 | - | 1.86 | 2.08 | 2.14 | 2.27 | 2.53 | 2.27 | - |
| 1, 2 | 1.31 | - | - | 1.64 | 1.79 | 2.14 | 2.21 | 2.23 |
| 1, 3 | 1.37 | - | 1.63 | - | 1.68 | 2.09 | 2.24 | 2.38 |
| 1, 4 | 1.29 | - | 1.66 | 1.65 | - | 1.82 | 2.07 | 2.21 |
| 1, 5 | 1.30 | - | 1.73 | 1.81 | 1.75 | - | 1.96 | 2.18 |
| 1, 6 | 1.33 | - | 1.76 | 1.89 | 1.95 | 2.15 | - | 2.00 |
| 1, 7 | 1.44 | - | 1.93 | 2.11 | 2.29 | 2.75 | 2.49 | - |
| 2, 3 | 1.39 | 1.76 | - | - | 1.70 | 2.11 | 2.08 | 2.23 |
| 2, 4 | 1.38 | 1.72 | - | 1.53 | - | 1.76 | 1.96 | 2.07 |
| 2, 5 | 1.40 | 1.75 | - | 1.66 | 1.79 | - | 2.02 | 2.18 |
| 2, 6 | 1.38 | 1.73 | - | 1.65 | 1.79 | 1.96 | - | 1.84 |
| 2, 7 | 1.47 | 1.81 | - | 1.83 | 2.00 | 2.29 | 2.19 | - |
| 3, 4 | 1.49 | 1.91 | 2.03 | - | - | 1.94 | 1.94 | 2.12 |
| 3, 5 | 1.42 | 1.87 | 2.02 | - | 1.73 | - | 1.99 | 2.21 |
| 3, 6 | 1.46 | 1.90 | 2.06 | - | 1.77 | 2.13 | - | 2.14 |
| 3, 7 | 1.46 | 1.89 | 2.04 | - | 1.80 | 2.24 | 2.02 | - |
| 4, 5 | 1.50 | 1.97 | 2.22 | 2.19 | - | - | 2.12 | 2.36 |
| 4, 6 | 1.47 | 1.94 | 2.13 | 2.01 | - | 1.85 | - | 1.79 |
| 4, 7 | 1.46 | 1.90 | 2.09 | 1.94 | - | 1.90 | 1.72 | - |
| 5, 6 | 1.49 | 1.98 | 2.20 | 2.31 | 2.15 | - | - | 2.03 |
| 5, 7 | 1.49 | 1.97 | 2.23 | 2.28 | 2.11 | - | 1.97 | - |
| 6, 7 | 1.49 | 1.97 | 2.22 | 2.31 | 2.23 | 2.27 | - | - |

Table 6: MMSE prediction results for different two-input/target time point combinations

## Appendix D Ablation Results Tables

The tables below present the ablation results for predictions of CDR-SB and MMSE scores.

| Model | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Our Model | 0.8775 | 1.0519 | 1.1476 | 1.5342 | 1.5182 | 1.7380 | 1.7151 | 2.2122 | 2.1530 | 1.4464 |
| Avg hidden state | 0.8519 | 1.0596 | 1.1783 | 1.5632 | 1.4931 | 1.7813 | 1.6866 | 2.2351 | 2.1921 | 1.4491 |
| Attention scores | 0.8543 | 1.0718 | 1.1744 | 1.5761 | 1.4906 | 1.7701 | 1.6968 | 2.2642 | 2.2259 | 1.4591 |
| W/O Age | 0.8725 | 1.0874 | 1.1982 | 1.6321 | 1.5827 | 1.9032 | 1.7214 | 2.2672 | 2.2550 | 1.4891 |
| W/O Education | 0.8460 | 1.0590 | 1.1527 | 1.5546 | 1.5017 | 1.7869 | 1.7059 | 2.2082 | 2.2261 | 1.4483 |
| W/O BMI | 0.8699 | 1.0726 | 1.1716 | 1.6120 | 1.5323 | 1.8166 | 1.6984 | 2.1979 | 2.2126 | 1.4605 |
| W/O Hypertension | 0.8486 | 1.0608 | 1.1538 | 1.5419 | 1.4770 | 1.7808 | 1.6526 | 2.1442 | 2.0515 | 1.4148 |
| W/O Gender | 0.8575 | 1.0647 | 1.1414 | 1.5448 | 1.4966 | 1.7705 | 1.6828 | 2.1410 | 1.9949 | 1.4160 |
| W/O MMSE | 0.8549 | 1.0671 | 1.1853 | 1.6119 | 1.5448 | 1.8638 | 1.7231 | 2.2463 | 2.1849 | 1.4673 |
| W/O Diagnosis | 0.8717 | 1.0778 | 1.1912 | 1.6208 | 1.5292 | 1.7973 | 1.7031 | 2.1819 | 2.0867 | 1.4506 |
| W/O APOE4 | 0.8575 | 1.0711 | 1.1979 | 1.5905 | 1.5692 | 1.7813 | 1.7471 | 2.3422 | 2.1448 | 1.4772 |

Table 1: Mean Absolute Error comparison for different CDR-SB model variants and ablations (columns 2 to 10 are time points)

| Model | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Our Model | 1.7553 | 1.8831 | 1.9974 | 2.3953 | 2.3411 | 2.3993 | 2.4463 | 3.1661 | 2.8090 | 2.2653 |
| Avg hidden state | 1.8035 | 1.9238 | 2.0545 | 2.5065 | 2.4540 | 2.7445 | 2.4356 | 3.3219 | 3.4993 | 2.3957 |
| Attention scores | 1.8727 | 2.0092 | 2.1837 | 2.5623 | 2.5861 | 2.5402 | 2.5364 | 3.2326 | 3.3331 | 2.4309 |
| W/O Age | 1.9668 | 2.0471 | 2.2231 | 2.6650 | 2.5498 | 2.7079 | 2.4619 | 3.0591 | 2.8691 | 2.3982 |
| W/O Education | 1.7758 | 1.8920 | 1.9951 | 2.4271 | 2.4263 | 2.5035 | 2.3318 | 2.9942 | 2.8277 | 2.2460 |
| W/O BMI | 1.9006 | 1.9517 | 2.1259 | 2.5786 | 2.5384 | 2.7408 | 2.4247 | 2.9510 | 2.6786 | 2.3186 |
| W/O Hypertension | 1.7923 | 1.9229 | 2.0394 | 2.5020 | 2.3750 | 2.4815 | 2.4687 | 3.1082 | 2.8432 | 2.2962 |
| W/O Gender | 1.7958 | 1.9107 | 1.9995 | 2.3623 | 2.3633 | 2.4375 | 2.4559 | 3.1832 | 3.0838 | 2.3083 |
| W/O CDR-SB | 1.8773 | 1.9734 | 2.1403 | 2.5476 | 2.5193 | 2.7618 | 2.4198 | 2.9528 | 2.7399 | 2.3212 |
| W/O Diagnosis | 1.7220 | 1.9053 | 1.9494 | 2.2861 | 2.3657 | 2.3734 | 2.5523 | 3.3700 | 3.3669 | 2.3411 |
| W/O APOE4 | 1.8181 | 1.9104 | 2.0613 | 2.5082 | 2.6148 | 2.8063 | 2.6160 | 3.2708 | 3.0214 | 2.3880 |

Table 2: Mean Absolute Error (MAE) comparison for different MMSE model variants and ablations (columns 2 to 10 are time points)

## References

- T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama (2019) Optuna: a next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 2623–2631. Cited by: [§2.3.2](https://arxiv.org/html/2606.07798#S2.SS3.SSS2.p1.1).
- K. Blennow (2017) A review of fluid biomarkers for alzheimer’s disease: moving from csf to blood. Neurology and Therapy 6, pp. 15–24. External Links: [Document](https://dx.doi.org/10.1007/s40120-017-0073-9), ISSN 2193-8253. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).
- R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud (2019) Neural ordinary differential equations. External Links: 1806.07366, [Link](https://arxiv.org/abs/1806.07366). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p5.1), [§2.2.3](https://arxiv.org/html/2606.07798#S2.SS2.SSS3.p1.2).
- R. T. Chen, B. Amos, and M. Nickel (2020) Learning neural event functions for ordinary differential equations. arXiv preprint arXiv:2011.03902. Cited by: [§3.6](https://arxiv.org/html/2606.07798#S3.SS6.p3.2).
- K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y.
Bengio (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. External Links: 1406.1078, [Link](https://arxiv.org/abs/1406.1078). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§2.2.1](https://arxiv.org/html/2606.07798#S2.SS2.SSS1.p1.2).
- S. H. Cho, M. Jang, H. Ju, M. J. Kang, J. M. Yun, and J. W. Yun (2022) Association of late-life body mass index with the risk of alzheimer disease: a 10-year nationwide population-based cohort study. Scientific Reports 12 (1), pp. 15298. Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p4.1).
- J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- R. Das, A. Chatterjee, and S. Roy (2025) An interpretable bayesian framework for alzheimer’s disease prediction with uncertainty quantification. Neuroscience 589, pp. 150–160. External Links: ISSN 0306-4522, [Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.neuroscience.2025.10.021), [Link](https://www.sciencedirect.com/science/article/pii/S0306452225010139). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§2.1.2](https://arxiv.org/html/2606.07798#S2.SS1.SSS2.p1.1).
- V. Devanarayan, Y. Ye, A. Charil, E. Andreozzi, P. Sachdev, D. A. Llano, L. Tian, L. Zhu, H. Hampel, L. Kramer, S. Dhadda, and M. Irizarry (2024) Predicting clinical progression trajectories of early alzheimer’s disease patients. Alzheimer’s and Dementia 20, pp. 1725–1738. External Links: [Document](https://dx.doi.org/10.1002/alz.13565), ISSN 15525279. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [Table 7](https://arxiv.org/html/2606.07798#S3.T7.3.5.1.1.1).
- S. Evans, K. McRae-McKee, C. Hadjichrysanthou, M. M. Wong, D. Ames, O. Lopez, F. de Wolf, and R. M. Anderson (2019) Alzheimer’s disease progression and risk factors: a standardized comparison between six large data sets. Alzheimer’s & Dementia: Translational Research & Clinical Interventions 5, pp. 515–523. External Links: [Document](https://dx.doi.org/10.1016/j.trci.2019.04.005), ISSN 2352-8737. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p1.1).
- M. F. Folstein, S. E. Folstein, and P. R. McHugh (1975) “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research 12, pp. 189–198. External Links: [Document](https://dx.doi.org/10.1016/0022-3956%2875%2990026-6), ISSN 00223956. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).
- F. Franco-Marina, J. J. García-González, F. Wagner-Echeagaray, J. Gallo, O. Ugalde, S. Sánchez-García, C. Espinel-Bermúdez, T. Juárez-Cedillo, M. Á. V. Rodríguez, and C. García-Peña (2010) The mini-mental state examination revisited: ceiling and floor effects after score adjustment for educational level in an aging mexican population. International Psychogeriatrics. External Links: ISSN 1041-6102, [Document](https://dx.doi.org/https%3A//doi.org/10.1017/S1041610209990822). Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p1.1).
- I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. External Links: [Link](http://www.deeplearningbook.org/). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- S. Jeong, W. Jung, J. Sohn, and H. I. Suk (2024) Deep geometric learning with monotonicity constraints for alzheimer’s disease progression. IEEE Transactions on Neural Networks and Learning Systems. External Links: [Document](https://dx.doi.org/10.1109/TNNLS.2024.3394598), ISSN 21622388. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p5.1).
- S. Jiang, Y. Xie, and G. A. Colditz (2021) Functional ensemble survival tree: dynamic prediction of alzheimer’s disease progression accommodating multiple time-varying covariates. Journal of the Royal Statistical Society. Series C: Applied Statistics 70, pp. 66–79. External Links: [Document](https://dx.doi.org/10.1111/rssc.12449), ISSN 14679876. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- W. Jung, E. Jun, H. I. Suk, and A. D. N. Initiative (2021) Deep recurrent model for individualized prediction of alzheimer’s disease progression. NeuroImage 237. External Links: [Document](https://dx.doi.org/10.1016/j.neuroimage.2021.118143), ISSN 10959572. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p5.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.4.1).
- J. Kim, J. M. Basak, and D. M. Holtzman (2009) The role of apolipoprotein e in alzheimer’s disease. Neuron 63 (3), pp. 287–303. Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p4.1).
- D. P. Kingma and M. Welling (2022a) Auto-encoding variational bayes. External Links: 1312.6114, [Link](https://arxiv.org/abs/1312.6114). Cited by: [§2.3.1](https://arxiv.org/html/2606.07798#S2.SS3.SSS1.p1.5).
- D. P. Kingma and M. Welling (2022b) Auto-encoding variational bayes. External Links: 1312.6114, [Link](https://arxiv.org/abs/1312.6114). Cited by: [Appendix B](https://arxiv.org/html/2606.07798#A2.p7.4), [Appendix B](https://arxiv.org/html/2606.07798#A2.p9.4), [§2.2.2](https://arxiv.org/html/2606.07798#S2.SS2.SSS2.p1.2).
- B. Lei, E. Liang, M. Yang, P. Yang, F. Zhou, E. L. Tan, Y. Lei, C. M. Liu, T. Wang, X. Xiao, and S. Wang (2022) Predicting clinical scores for alzheimer’s disease based on joint and deep learning. Expert Systems with Applications 187. External Links: [Document](https://dx.doi.org/10.1016/j.eswa.2021.115966), ISSN 09574174. Cited by: [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p3.1), [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p4.1), [Table 7](https://arxiv.org/html/2606.07798#S3.T7.3.4.1.1.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.8.4.1).
- B. Lei, M. Yang, P. Yang, F. Zhou, W. Hou, W. Zou, X. Li, T. Wang, X. Xiao, and S. Wang (2020) Deep and joint learning of longitudinal data for alzheimer’s disease prediction. Pattern Recognition 102. External Links: [Document](https://dx.doi.org/10.1016/j.patcog.2020.107247), ISSN 00313203. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [Table 7](https://arxiv.org/html/2606.07798#S3.T7.3.3.1.1.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.8.3.1).
- W. Liang, K. Zhang, P. Cao, X. Liu, J. Yang, and O. Zaiane (2021) Rethinking modeling alzheimer’s disease progression from a multi-task learning perspective with deep recurrent neural network. Computers in Biology and Medicine 138. External Links: [Document](https://dx.doi.org/10.1016/j.compbiomed.2021.104935), ISSN 18790534. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- M. Liu, J. Zhang, C. Lian, and D. Shen (2020) Weakly supervised deep learning for brain disease prognosis using mri and incomplete clinical scores. IEEE Transactions on Cybernetics 50, pp. 3381–3392. External Links: [Document](https://dx.doi.org/10.1109/TCYB.2019.2904186), ISSN 21682275. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [Table 7](https://arxiv.org/html/2606.07798#S3.T7.2.3.1.1.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.3.1).
- V. J. Lowe, C. T. Mester, E. S. Lundt, J. Lee, S. Ghatamaneni, A. Algeciras-Schimnich, M. R. Campbell, J. Graff-Radford, A. Nguyen, H. Min, M. L. Senjem, M. M. Machulda, C. G. Schwarz, D. W. Dickson, M. E. Murray, K. K. Kandimalla, K. Kantarci, B. Boeve, P. Vemuri, D. T. Jones, D. Knopman, C. R. Jack, R. C. Petersen, and M. M. Mielke (2024) Amyloid pet detects the deposition of brain aβ earlier than csf fluid biomarkers. Alzheimer’s & Dementia 20, pp. 8097–8112. External Links: [Document](https://dx.doi.org/10.1002/alz.14317), ISSN 1552-5260. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).
- U. Morar, H. Martin, W. Izquierdo, P. Forouzannezhad, E. Zarafshan, R. E. Curiel, M. Roselli, D. Loewenstein, R. Duara, E. Unger, and M. Adjouadi (2020) A deep-learning approach for the prediction of mini-mental state examination scores in a multimodal longitudinal study. In Proceedings - 2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020, pp. 761–766. External Links: [Document](https://dx.doi.org/10.1109/CSCI51800.2020.00144), ISBN 9781728176246. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.5.1).
- U. Morar, H. Martin, P. M. Robin, W. Izquierdo, E. Zarafshan, P. Forouzannezhad, E. Unger, M. Cabrerizo, R. E. C. Cid, M. Rosselli, A. Barreto, N. Rishe, D. E. Vaillancourt, S. T. DeKosky, D. Loewenstein, R. Duara, and M. Adjouadi (2023) Prediction of cognitive test scores from variable length multimodal data in alzheimer’s disease. Cognitive Computation 15, pp. 2062–2086. External Links: [Document](https://dx.doi.org/10.1007/s12559-023-10169-w), ISSN 18669964. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p5.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.8.1).
- J. Mouchet, K. A. Betts, M. V. Georgieva, R. Ionescu-Ittu, L. M. Butler, X. Teitsma, P. Delmar, T. Kulalert, J. Zhu, N. Lema, and U. Desai (2021) Classification, prediction, and concordance of cognitive and functional progression in patients with mild cognitive impairment in the united states: a latent class analysis. Journal of Alzheimer’s Disease 82, pp. 1667–1682. External Links: [Document](https://dx.doi.org/10.3233/JAD-210305), ISSN 13872877. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p3.1).
- S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett (2005) The alzheimer’s disease neuroimaging initiative. Neuroimaging Clinics of North America 15, pp. 869–877. External Links: [Document](https://dx.doi.org/10.1016/j.nic.2005.09.008), ISSN 10525149. Cited by: [§2.1.1](https://arxiv.org/html/2606.07798#S2.SS1.SSS1.p1.1), [§2.3.3](https://arxiv.org/html/2606.07798#S2.SS3.SSS3.p1.1).
- D. Mukherji, M. Mukherji, and N. Mukherji (2022) Early detection of alzheimer’s disease using neuropsychological tests: a predict–diagnose approach using neural networks. Brain Informatics 9. External Links: [Document](https://dx.doi.org/10.1186/s40708-022-00169-1), ISSN 21984026. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- M. Nguyen, T. He, L. An, D. C. Alexander, J. Feng, and B. T. Yeo (2020) Predicting alzheimer’s disease progression using deep recurrent neural networks. NeuroImage 222. External Links: [Document](https://dx.doi.org/10.1016/j.neuroimage.2020.117203), ISSN 10959572. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- S. Odaibo (2019) Tutorial: deriving the standard variational autoencoder (vae) loss function. arXiv preprint arXiv:1907.08956. Cited by: [Appendix B](https://arxiv.org/html/2606.07798#A2.p7.4).
- K. Peterson, O. Rudovic, R. Guerrero, and R. W. Picard (2018) Personalized gaussian processes for future prediction of alzheimer’s disease progression. External Links: 1712.00181, [Link](https://arxiv.org/abs/1712.00181). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- K. Poonam, R. Guha, and P. P. Chakrabarti (2024) Predicting alzheimer’s disease progression using a versatile sequence-length-adaptive encoder-decoder lstm architecture. IEEE Journal of Biomedical and Health Informatics 28, pp. 4184–4193. External Links: [Document](https://dx.doi.org/10.1109/JBHI.2024.3386801), ISSN 21682208. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- K. Poonam, R. Guha, and P. P. Chakrabarti (2023) Accurate prediction of alzheimer’s disease progression trajectory via a novel encoder-decoder lstm architecture. In 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 1–4. External Links: [Document](https://dx.doi.org/10.1109/EMBC40787.2023.10340517). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- C. Puri, G. Kooijman, B. Vanrumste, and S. Luca (2022) Forecasting time series in healthcare with gaussian processes and dynamic time warping based subset selection. IEEE Journal of Biomedical and Health Informatics 26, pp. 6126–6137. External Links: [Document](https://dx.doi.org/10.1109/JBHI.2022.3214343), ISSN 21682208. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1).
- A. D. Roses (2006) On the discovery of the genetic association of apolipoprotein e genotypes and common late-onset alzheimer disease. Journal of Alzheimer’s Disease 9 (s3), pp. 361–366. Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p4.1).
- S. B. Sando, S. Melquist, A. Cannon, M. L. Hutton, O. Sletvold, I. Saltvedt, L. R. White, S. Lydersen, and J. O. Aasly (2008) APOE ε4 lowers age at onset and is a high risk factor for alzheimer’s disease; a case control study from central norway. BMC neurology 8 (1), pp. 9. Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p4.1).
- T. Schaap, P. Thropp, and D. Tosun (2024) Timing of alzheimer’s disease biomarker progressions: a two-decade observational study from the alzheimer’s disease neuroimaging initiative (adni). Alzheimer’s and Dementia. External Links: [Document](https://dx.doi.org/10.1002/alz.14306), ISSN 15525279. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).
- S. Tabarestani, M. Aghili, M. Eslami, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, and M. Adjouadi (2020) A distributed multitask multimodal approach for the prediction of alzheimer’s disease in a longitudinal study. NeuroImage 206. External Links: [Document](https://dx.doi.org/10.1016/j.neuroimage.2019.116317), ISSN 10959572. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p5.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.7.1).
- R. Tzeng, Y. Yang, K. Hsu, H. Chang, and P. Chiu (2022) Sum of boxes of the clinical dementia rating scale highly predicts conversion or reversion in predementia stages. Frontiers in Aging Neuroscience 14. External Links: [Document](https://dx.doi.org/10.3389/fnagi.2022.1021792), ISSN 1663-4365. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).
- World Health Organization (2024) Dementia. Note: Accessed: 2025-05-19. External Links: [Link](https://www.who.int/news-room/fact-sheets/detail/dementia). Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p1.1).
- H. Xie, C. Zhang, Y. Wang, S. Huang, W. Cui, W. Yang, L. Koski, X. Xu, Y. Li, M. Zheng, et al. (2016) Distinct patterns of cognitive aging modified by education level and gender among adults with limited or no formal education: a normative study of the mini-mental state examination. Journal of alzheimer’s disease 49 (4), pp. 961–969. Cited by: [§4](https://arxiv.org/html/2606.07798#S4.p4.1).
- Z. Yuan, X. Li, Z. Hao, Z. Tang, X. Yao, and T. Wu (2024) Intelligent prediction of alzheimer’s disease via improved multifeature squeeze-and-excitation-dilated residual network. Scientific Reports 14. External Links: [Document](https://dx.doi.org/10.1038/s41598-024-62712-w), ISSN 20452322. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p4.1), [§3.7](https://arxiv.org/html/2606.07798#S3.SS7.p4.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.8.5.1), [Table 8](https://arxiv.org/html/2606.07798#S3.T8.9.6.1).
- J. Zhang, Y. Zhang, J. Wang, Y. Xia, J. Zhang, and L. Chen (2024) Recent advances in alzheimer’s disease: mechanisms, clinical trials and new drug development strategies. Signal Transduction and Targeted Therapy 9, pp. 211. External Links: [Document](https://dx.doi.org/10.1038/s41392-024-01911-3), ISSN 2059-3635. Cited by: [§1](https://arxiv.org/html/2606.07798#S1.p2.6).