A Longitudinal Attribute-Conditioned Neural Network for Modeling Health-State Transition Probabilities in Temporally Irregular Data: The LANTERN Framework

arXiv cs.LG 06/15/26, 04:00 AM Papers
Summary
This paper introduces LANTERN, a neural network framework for estimating health-state transition probabilities from irregular longitudinal data, with applications to long-term care insurance. It outperforms traditional methods in discrimination and calibration for severe disability and mortality prediction.
arXiv:2606.13880v1 Announce Type: new Abstract: Accurate estimation of long-term care transition probabilities is central to disability insurance pricing, reserving, and solvency assessment. Classical actuarial multi-state models commonly rely on Markov, semi-Markov, or proportional-hazard specifications, which provide a direct connection to cohort projection but may be restrictive for irregular longitudinal health data with nonlinear aging patterns and heterogeneous covariate histories. This paper develops a well-calibrated estimator of multi-state transition probabilities for irregular longitudinal health data. The model learns from individual health history, incorporates the time elapsed between observations, and conditions transition probabilities on demographic and socioeconomic attributes. It produces a valid probability distribution over the next observed health state, with four possible states: healthy, mild disability, severe disability, and death. Individual probabilities are aggregated by age group and origin state to form transition matrices compatible with actuarial cohort projection. Using longitudinal data from the Health and Retirement Study, we compare the proposed estimator with logistic regression, gradient-boosted trees, a recurrent neural network, and a last-state persistence benchmark. The evaluation considers probabilistic accuracy, endpoint discrimination and calibration for severe disability and death, risk concentration, and transition matrix error after aggregation. The proposed estimator improves severe disability discrimination relative to logistic regression and gradient-boosted tree benchmarks, maintains strong calibration, and yields the lowest transition matrix error among the evaluated models in the held-out test analysis. Results show that a structured machine learning estimator can support long-term care transition modeling when judged by calibration and projection fidelity, beyond discrimination.
# A Longitudinal Attribute-Conditioned Neural Network for Modeling Health-State Transition Probabilities in Temporally Irregular Data: The LANTERN Framework
Source: [https://arxiv.org/html/2606.13880](https://arxiv.org/html/2606.13880)
Beckett Sterner, Petar Jevtić

School of Computing and Augmented Intelligence, Arizona State University, Tempe, USA; School of Life Sciences, Arizona State University, Tempe, USA; School of Mathematical and Statistical Sciences, Arizona State University, Tempe, USA

###### Abstract

Accurate estimation of long\-term care transition probabilities is central to disability insurance pricing, reserving, and solvency assessment\. Classical actuarial multi\-state models commonly rely on Markov, semi\-Markov, or proportional\-hazard specifications, which provide a direct connection to cohort projection but may be restrictive for irregular longitudinal health data with nonlinear aging patterns and heterogeneous covariate histories\.

This paper develops a well\-calibrated estimator of multi\-state transition probabilities for irregular longitudinal health data\. The model learns from individual health history, incorporates the time elapsed between observations, and conditions transition probabilities on demographic and socioeconomic attributes\. It produces a valid probability distribution over the next observed health state, with four possible states: healthy, mild disability, severe disability, and death\. Individual probabilities are aggregated by age group and origin state to form transition matrices compatible with actuarial cohort projection\.

Using longitudinal data from the Health and Retirement Study, we compare the proposed estimator with logistic regression, gradient\-boosted trees, a recurrent neural network, and a last\-state persistence benchmark\. The evaluation considers probabilistic accuracy, endpoint discrimination and calibration for severe disability and death, risk concentration, and transition matrix error after aggregation\. The proposed estimator improves severe disability discrimination relative to logistic regression and gradient\-boosted tree benchmarks, maintains strong calibration, and yields the lowest transition matrix error among the evaluated models in the held\-out test analysis\. These results show that a structured machine learning estimator can support long\-term care transition modeling when judged by calibration and projection fidelity, beyond discrimination\.

###### keywords:

Long\-term care insurance , Multi\-state models , Transition probabilities , Irregular longitudinal data , Machine learning

Journal: Insurance: Mathematics and Economics

## 1 Introduction

Long\-term care \(LTC\) risk is becoming an increasingly important source of financial uncertainty for households, insurers, and public programs\. As populations age, more individuals are expected to spend part of later life with functional limitations that require formal or informal care\. LTC insurance products are intended to pool and pre\-fund part of this risk, but their viability depends on credible estimates of how individuals move among functional health states over time\[[34](https://arxiv.org/html/2606.13880#bib.bib38),[38](https://arxiv.org/html/2606.13880#bib.bib3),[33](https://arxiv.org/html/2606.13880#bib.bib1)\]\.

In actuarial LTC models, disability progression and mortality are commonly represented using finite\-state multi\-state models\. Individuals move between states such as healthy, mildly disabled, severely disabled, and dead, and the associated transition probabilities determine projected disability prevalence, expected benefit payments, reserves, and solvency requirements\[[16](https://arxiv.org/html/2606.13880#bib.bib37),[12](https://arxiv.org/html/2606.13880#bib.bib12),[34](https://arxiv.org/html/2606.13880#bib.bib38),[38](https://arxiv.org/html/2606.13880#bib.bib3)\]\. Because cohort projection repeatedly applies transition probabilities over future ages, small errors in these probabilities can accumulate over long horizons and materially affect valuation results\[[5](https://arxiv.org/html/2606.13880#bib.bib50),[30](https://arxiv.org/html/2606.13880#bib.bib49)\]\.

Estimating these transition probabilities is difficult in longitudinal aging data\. Functional decline may depend not only on the current disability state, but also on accumulated health history, previous disability episodes, comorbidities, demographic characteristics, and the time elapsed between observations\[[13](https://arxiv.org/html/2606.13880#bib.bib39),[39](https://arxiv.org/html/2606.13880#bib.bib2)\]\. In surveys such as the Health and Retirement Study, individuals are observed in repeated survey waves\. Because individuals may miss waves, we refer to each observed person\-wave record as a visit; visit spacing therefore varies across individuals and waves\[[3](https://arxiv.org/html/2606.13880#bib.bib27)\]\. These features complicate standard Markov, semi\-Markov, and proportional\-hazard specifications, which typically condition on the current state, duration in state, or pre\-specified covariates rather than learning a flexible representation of the full longitudinal health history\[[1](https://arxiv.org/html/2606.13880#bib.bib28),[20](https://arxiv.org/html/2606.13880#bib.bib42),[27](https://arxiv.org/html/2606.13880#bib.bib6)\]\.

In this paper, a representation refers to a learned summary of prior observed health information, such as disability history, comorbidity patterns, and time elapsed between visits, that is used to estimate future transition probabilities\.

Classical actuarial and survival models provide interpretable transition structures and a direct connection to valuation, but they can be restrictive when health trajectories are nonlinear, heterogeneous, and irregularly observed\. Recent work has begun to connect machine learning with health transition and multi\-state survival modeling\. For example, \[[43](https://arxiv.org/html/2606.13880#bib.bib9)\] combine neural networks with a generalized linear model to estimate and predict health transition intensities, allowing socioeconomic and lifestyle factors to enter through linear and nonlinear relationships\. In multi\-state survival analysis, \[[35](https://arxiv.org/html/2606.13880#bib.bib54)\] propose pseudo\-value\-based deep neural networks for subject\-specific prediction of multi\-state quantities, including transition probabilities and state occupation probabilities, in the presence of censoring\. More broadly, machine learning methods have been used for longitudinal clinical prediction tasks such as mortality, readmission, length\-of\-stay prediction, physiologic decompensation, and medical state pre\-warning\[[36](https://arxiv.org/html/2606.13880#bib.bib16),[18](https://arxiv.org/html/2606.13880#bib.bib17),[31](https://arxiv.org/html/2606.13880#bib.bib8)\]\. For LTC insurance product applications, however, a useful model must do more than rank individuals by risk\. It must produce calibrated probability vectors over the possible health states, so that individual\-level predictions can be aggregated into transition matrices for cohort projection\[[15](https://arxiv.org/html/2606.13880#bib.bib13),[40](https://arxiv.org/html/2606.13880#bib.bib23),[9](https://arxiv.org/html/2606.13880#bib.bib53)\]\.

This paper develops LANTERN \(Longitudinal Attribute\-conditioned Neural Transition Estimation Recurrent Network\), a calibrated history\-dependent estimator of transition probabilities over the next observed health state for irregular longitudinal health data\. The model learns a latent representation of individual health history, incorporates elapsed\-time information, and conditions transition risk on demographic and socioeconomic attributes\. It outputs a coherent probability distribution over four possible next observed states: healthy, mild disability, severe disability, and death\. These individual\-level probabilities can be aggregated by age group and origin state to form transition matrices compatible with discrete\-time actuarial projection\.
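The aggregation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an unweighted mean of individual probability vectors within each age group and origin state, and the function names, state ordering, and example data are all hypothetical.

```python
import numpy as np

STATES = ["H", "M", "S", "D"]  # assumed ordering: healthy, mild, severe, death


def aggregate_transition_matrix(origin_idx, probs):
    """Average individual next-state probabilities within each origin state."""
    origin_idx = np.asarray(origin_idx)
    probs = np.asarray(probs, dtype=float)
    n = len(STATES)
    P = np.zeros((n, n))
    for r in range(n - 1):                  # living origin states H, M, S
        rows = probs[origin_idx == r]
        if len(rows) == 0:
            P[r] = np.nan                   # no observations for this origin
        else:
            P[r] = rows.mean(axis=0)        # mean of probability vectors sums to 1
    P[n - 1, n - 1] = 1.0                   # death is absorbing
    return P


def aggregate_by_age_group(age_group, origin_idx, probs):
    """Build one transition matrix per age group."""
    age_group = np.asarray(age_group)
    origin_idx = np.asarray(origin_idx)
    probs = np.asarray(probs, dtype=float)
    return {
        g: aggregate_transition_matrix(origin_idx[age_group == g],
                                       probs[age_group == g])
        for g in np.unique(age_group)
    }


# Hypothetical predictions for four person-wave records.
age_group = ["65-74", "65-74", "65-74", "75-84"]
origin_idx = [0, 0, 1, 2]
probs = [
    [0.80, 0.10, 0.05, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.00, 0.10, 0.60, 0.30],
]
matrices = aggregate_by_age_group(age_group, origin_idx, probs)
```

Because each individual prediction is a probability vector, every averaged row remains a valid probability vector. Rows with no observations are marked as missing rather than filled in.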

The central idea of this work is to retain the actuarial multi\-state projection framework while replacing restrictive parametric transition probability estimation with a flexible history\-dependent estimator learned from longitudinal data\. Specifically, the contribution of the paper is threefold\. First, we formulate LTC transition estimation as a history\-dependent multi\-state probability problem under irregular observation intervals\. Second, we propose a structured neural estimator that relaxes first\-order Markov dependence on the current observed state by using learned summaries of prior health history together with the time elapsed between observations and demographic information, while still producing valid probability vectors over the possible next states\. Third, we evaluate the estimator using actuarially relevant criteria, including calibration, endpoint risk concentration, transition\-matrix error, and an illustrative cohort valuation exercise\.

Using longitudinal data from the Health and Retirement Study, we compare our model with logistic regression, gradient\-boosted trees, a recurrent neural network, and a last\-state persistence benchmark\. The empirical analysis evaluates both individual\-level probabilistic performance and aggregate transition\-matrix accuracy\. This distinction is important because an estimator useful for LTC insurance product applications must not only rank individuals by risk, but also produce calibrated transition probabilities that can be aggregated into stable transition matrices for projection\.

The remainder of the paper is organized as follows\. Section [2](https://arxiv.org/html/2606.13880#S2) reviews related work on actuarial multi\-state modeling, transition probability estimation, machine learning for longitudinal health risk, and calibration\. Section [3](https://arxiv.org/html/2606.13880#S3) presents the classical actuarial projection framework\. Section [4](https://arxiv.org/html/2606.13880#S4) introduces the transition estimation problem and the proposed methodology\. Section [5](https://arxiv.org/html/2606.13880#S5) describes the data and evaluation design\. Section [6](https://arxiv.org/html/2606.13880#S6) reports the empirical and actuarial projection results\. Section [7](https://arxiv.org/html/2606.13880#S7) concludes\.

## 2 Related Work

### 2\.1 Actuarial Multi\-State Models for Long\-Term Care

Actuarial modeling of long\-term care and disability risk is commonly based on multi\-state models in which individuals move among a finite set of functional states\. A standard specification uses states such as healthy, mild disability, severe disability, and death, with death treated as absorbing\. This structure allows for deterioration, partial recovery, and mortality, and provides a natural basis for disability transition tables, dependence probability tables, LTC insurance product valuation, and care\-annuity modeling\[[16](https://arxiv.org/html/2606.13880#bib.bib37),[12](https://arxiv.org/html/2606.13880#bib.bib12),[34](https://arxiv.org/html/2606.13880#bib.bib38),[38](https://arxiv.org/html/2606.13880#bib.bib3)\]\.

In continuous time, these models are often formulated through transition intensities between health states\. In discrete\-time actuarial applications, the transition structure is represented using age\-specific transition probability matrices that are iterated to project future state occupancy\. This projection structure links statistical transition estimation directly to expected benefit payments, reserves, and solvency assessment\. As a result, the accuracy of transition probabilities is central to LTC insurance product valuation, particularly over long projection horizons where errors may accumulate through repeated matrix multiplication\[[16](https://arxiv.org/html/2606.13880#bib.bib37),[5](https://arxiv.org/html/2606.13880#bib.bib50),[30](https://arxiv.org/html/2606.13880#bib.bib49)\]\.

Multi\-state LTC models have been used to quantify expected care time, healthy life expectancy, disability prevalence, and the financial consequences of disability progression\. Extensions of this framework have incorporated trend, parameter uncertainty, information delays, and longevity\-related risks in the analysis of disability insurance reserving, LTC financing, and related insurance products\[[26](https://arxiv.org/html/2606.13880#bib.bib46),[38](https://arxiv.org/html/2606.13880#bib.bib3),[37](https://arxiv.org/html/2606.13880#bib.bib41)\]\. When longitudinal transition histories are unavailable, transition probabilities may be inferred from repeated cross\-sectional prevalence and mortality data by imposing a Markov transition structure and estimating transition rates that are consistent with observed age\-specific prevalence and mortality patterns\[[12](https://arxiv.org/html/2606.13880#bib.bib12),[24](https://arxiv.org/html/2606.13880#bib.bib5)\]\.

The literature above establishes the actuarial importance of multi\-state transition probabilities and provides the projection framework used in LTC insurance product valuation\. However, many practical implementations rely on Markov, semi\-Markov, or low\-dimensional parametric specifications\. These assumptions may be restrictive when disability progression depends on accumulated health history, heterogeneous covariate effects, and irregular observation intervals\.

### 2\.2 Estimating Transition Probabilities

The estimation of transition probabilities is central to actuarial multi\-state models for LTC insurance because these probabilities determine projected occupancy in healthy, disabled, and death states\[[16](https://arxiv.org/html/2606.13880#bib.bib37),[34](https://arxiv.org/html/2606.13880#bib.bib38),[38](https://arxiv.org/html/2606.13880#bib.bib3)\]\. In discrete\-time valuation, transition probability matrices are iterated across future ages, so errors in estimated transition probabilities can affect projected disability prevalence, expected benefit payments, reserves, and solvency measures\[[5](https://arxiv.org/html/2606.13880#bib.bib50),[30](https://arxiv.org/html/2606.13880#bib.bib49)\]\. Related issues also arise in health economic Markov cohort models, where transition probability matrices represent movement among disease or care states and where formal guidance on transition probability estimation remains limited\[[32](https://arxiv.org/html/2606.13880#bib.bib15)\]\. This subsection reviews classical, GLM\-based, and machine\-learning approaches to transition probability estimation, with emphasis on their relevance for irregular longitudinal LTC data\.

#### Classical and Survival\-Based Estimation

Classical approaches estimate transition probabilities through parametric, non\-parametric, or semi\-parametric multi\-state models\. In survival analysis, transition dynamics are commonly represented through transition intensities, with estimation based on counting process theory, martingale methods, partial likelihood, Nelson\-Aalen\-type estimators, and the Aalen\-Johansen estimator\[[1](https://arxiv.org/html/2606.13880#bib.bib28),[20](https://arxiv.org/html/2606.13880#bib.bib42),[25](https://arxiv.org/html/2606.13880#bib.bib29),[21](https://arxiv.org/html/2606.13880#bib.bib30)\]\. These methods provide a rigorous inferential foundation and have been extended to competing risks, illness\-death models, time\-dependent covariates, semi\-Markov settings, and non\-Markov transition probability estimation\[[8](https://arxiv.org/html/2606.13880#bib.bib51),[14](https://arxiv.org/html/2606.13880#bib.bib52),[39](https://arxiv.org/html/2606.13880#bib.bib2),[29](https://arxiv.org/html/2606.13880#bib.bib14)\]\.

For Markov processes, the Aalen–Johansen estimator is widely used for estimating transition and state occupation probabilities\. However, when the Markov assumption is violated, conditioning only on the current state may not adequately capture accumulated frailty, prior disability episodes, duration in a health state, or other aspects of an individual’s health history\[[8](https://arxiv.org/html/2606.13880#bib.bib51),[14](https://arxiv.org/html/2606.13880#bib.bib52),[6](https://arxiv.org/html/2606.13880#bib.bib25)\]\. Non\-Markov, semi\-Markov, duration\-dependent, and landmarking approaches partially address this issue by incorporating elapsed time, sojourn time, or intermediate states at fixed prediction horizons\[[39](https://arxiv.org/html/2606.13880#bib.bib2)\]\. Nevertheless, these approaches typically require explicit specification of the relevant history summaries, hazard structure, and functional form\.
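For reference, the Aalen-Johansen estimator mentioned above is usually written as a finite product over the observed transition times. The notation below is an assumption chosen for illustration and is not taken from this paper: $t_k$ are the observed transition times, $d_{rs}(t_k)$ is the number of observed $r \to s$ transitions at $t_k$, and $n_r(t_k)$ is the number of individuals at risk in state $r$ just before $t_k$.

```latex
\hat{\mathbf{P}}(u,t)=\prod_{k:\,u<t_k\le t}\Bigl(\mathbf{I}+\Delta\hat{\mathbf{A}}(t_k)\Bigr),
\qquad
\Delta\hat{A}_{rs}(t_k)=\frac{d_{rs}(t_k)}{n_r(t_k)}\quad (r\neq s),
\qquad
\Delta\hat{A}_{rr}(t_k)=-\sum_{s\neq r}\Delta\hat{A}_{rs}(t_k).
```

Each increment depends only on who currently occupies state $r$, not on how they arrived there. This is the sense in which the estimator conditions on the current state alone.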

Although survival\-based methods provide principled inference for transition intensities, cumulative incidence, and state occupation probabilities, their practical use in LTC transition prediction requires explicit modeling choices about covariates, time scales, history summaries, and functional forms\. In this paper, high\-dimensional covariates refers to the many predictors available at each person\-wave observation, including ADL components, chronic condition indicators, self\-reported health, demographic variables, missingness indicators, and elapsed\-time variables\. A history summary refers to a compact representation of prior health experience, such as current state, duration in state, previous disability episodes, cumulative disease burden, or a learned recurrent memory vector\. Non\-Markov and landmark approaches address some limitations of first\-order Markov modeling by estimating transition probabilities without conditioning only on the current observed state\[[8](https://arxiv.org/html/2606.13880#bib.bib51),[14](https://arxiv.org/html/2606.13880#bib.bib52),[39](https://arxiv.org/html/2606.13880#bib.bib2),[29](https://arxiv.org/html/2606.13880#bib.bib14)\]\. In the present LTC setting, the remaining modeling problem is to learn how observed health history, covariates, and elapsed time jointly affect the next observed health state\.

#### GLM\-Based Estimation

Generalized linear models \(GLMs\) provide a natural bridge between classical multi\-state models and more flexible data\-driven approaches for estimating transition probabilities\. In actuarial and health\-state applications, GLM\-based specifications have been used to model disability and health transitions while incorporating covariates such as age, sex, duration, time trends, interaction effects, demographic information, and health\-related risk factors\[[12](https://arxiv.org/html/2606.13880#bib.bib12),[17](https://arxiv.org/html/2606.13880#bib.bib55),[27](https://arxiv.org/html/2606.13880#bib.bib6),[41](https://arxiv.org/html/2606.13880#bib.bib7),[7](https://arxiv.org/html/2606.13880#bib.bib40)\]\. These models retain an interpretable probabilistic structure and can be connected to transition probabilities either directly in discrete\-time formulations or indirectly through transition intensities and Kolmogorov forward equations in continuous\-time formulations\[[5](https://arxiv.org/html/2606.13880#bib.bib50),[32](https://arxiv.org/html/2606.13880#bib.bib15)\]\.

In discrete\-time settings, transitions from an origin state to possible destination states can be modeled using logistic, multinomial logistic, ordinal, proportional\-odds, or complementary log\-log specifications, depending on the structure of the state space and the transition outcome\[[32](https://arxiv.org/html/2606.13880#bib.bib15),[21](https://arxiv.org/html/2606.13880#bib.bib30),[25](https://arxiv.org/html/2606.13880#bib.bib29)\]\. For example, a multinomial specification can estimate origin\-state\-specific probabilities for transitions from one state to another, while ensuring that the probabilities across destination states sum to one\. This makes GLM\-based transition probabilities interpretable and suitable for assembly into transition matrices used in cohort projection\.
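A multinomial logistic specification of this kind can be sketched as a softmax over linear scores, which guarantees that the destination-state probabilities sum to one. The coefficients, covariates, and function name below are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import numpy as np


def multinomial_logit_probs(x, coef, intercept):
    """Softmax of linear scores: one probability per destination state."""
    scores = intercept + coef @ x        # shape: (n_states,)
    scores = scores - scores.max()       # stabilise the exponentials
    expd = np.exp(scores)
    return expd / expd.sum()


# Hypothetical coefficients for a single origin state and two covariates
# (age in decades above 65, indicator for a chronic condition).
coef = np.array([
    [0.0, 0.0],   # H (reference category)
    [0.4, 0.6],   # M
    [0.7, 0.9],   # S
    [0.9, 0.5],   # D
])
intercept = np.array([0.0, -2.0, -3.5, -3.0])

# One row of a transition matrix for an individual aged 80 with the condition.
probs = multinomial_logit_probs(np.array([1.5, 1.0]), coef, intercept)
```

Fitting a separate coefficient set for each origin state yields one row of the transition matrix per origin, which is the structure needed for cohort projection.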

However, GLM\-based specifications typically require the analyst to pre\-specify the relevant covariates, interaction terms, time effects, and history of health patterns\. In many actuarial applications, this leads to relatively structured covariate specifications, piecewise\-constant hazards, or regular time grids, where transitions are evaluated at fixed intervals such as annual ages, policy years, or survey waves\. These choices support interpretability and tractability but may limit flexibility\[[12](https://arxiv.org/html/2606.13880#bib.bib12),[17](https://arxiv.org/html/2606.13880#bib.bib55),[27](https://arxiv.org/html/2606.13880#bib.bib6),[7](https://arxiv.org/html/2606.13880#bib.bib40)\]\. Frailty or random\-effect formulations can partially account for unobserved heterogeneity, that is, latent individual\- or group\-level risk variation not explained by observed covariates\[[44](https://arxiv.org/html/2606.13880#bib.bib32)\]\. In LTC applications, such heterogeneity may reflect factors such as underlying health vulnerability, care access, behavioral risk, or unmeasured comorbidity burden\. However, these formulations still require distributional assumptions and are not primarily designed to learn evolving trajectory\-level memory from irregular longitudinal observations\. These restrictions can be limiting in LTC settings, where disability progression may depend on nonlinear aging patterns, accumulated functional decline, recurrent disability episodes, and irregular spacing between observations\.

#### Machine Learning\-Based Health\-State Modeling

These limitations have motivated growing interest in machine learning methods for modeling health\-state transitions and longitudinal disease progression\. A directly related actuarial contribution is \[[43](https://arxiv.org/html/2606.13880#bib.bib9)\], who combine neural networks with a generalized linear model to estimate and predict health transition intensities\. Their model incorporates socioeconomic and lifestyle factors and allows both linear and nonlinear relationships between these variables and transition intensities\. The present study differs by estimating transition probability vectors over the next observed state from irregular longitudinal survey histories and then aggregating those probabilities into actuarial transition matrices\.

Related developments in multi\-state survival analysis have also used neural networks to estimate subject\-specific multi\-state quantities\. For example, \[[35](https://arxiv.org/html/2606.13880#bib.bib54)\] propose pseudo\-value\-based deep neural networks for multi\-state survival analysis, with the objective of predicting quantities such as transition probabilities and state occupation probabilities in the presence of censoring\. Other disease progression models, including continuous\-time hidden Markov models, address irregularly observed clinical trajectories by modeling latent disease states and transitions over continuous time\[[28](https://arxiv.org/html/2606.13880#bib.bib56)\]\. These approaches connect flexible learning methods with multi\-state disease progression, but they are generally developed for survival or latent\-state disease modeling rather than for producing observed\-state transition matrices for actuarial LTC projection\.

In broader healthcare applications, machine learning and deep learning methods have been used for longitudinal clinical prediction from electronic health records and intensive\-care time series\. For example, \[[36](https://arxiv.org/html/2606.13880#bib.bib16)\] use deep learning on raw electronic health records to predict outcomes such as in\-hospital mortality, 30\-day unplanned readmission, prolonged length of stay, and discharge diagnoses\. \[[18](https://arxiv.org/html/2606.13880#bib.bib17)\] develop clinical time\-series benchmarks covering mortality prediction, physiologic decompensation, length\-of\-stay forecasting, and phenotype classification\. Medical state\-transition forecasting has also been studied in short\-term clinical monitoring settings, including early circulatory failure detection using logistic regression, AdaBoost, and XGBoost\[[31](https://arxiv.org/html/2606.13880#bib.bib8)\]\. These studies demonstrate the ability of flexible models to use complex health histories for prediction, but they do not directly address the actuarial problem of producing calibrated multi\-state transition probability vectors that can be aggregated into LTC projection matrices\.

For actuarial LTC applications, predictive flexibility alone is insufficient\. Estimated outputs must remain interpretable as transition probabilities, be calibrated over all health states, and be suitable for aggregation into age\- and origin\-state\-specific transition matrices\. The relevant methodological gap is therefore the limited integration of flexible machine learning methods with the actuarial transition probability framework used in LTC insurance product projection and valuation\.

### 2\.3 Calibration of Probabilistic Predictions

Calibration refers to the agreement between predicted probabilities and observed event frequencies\. This matters in LTC transition modeling because predicted probabilities are later aggregated into transition matrices and used in cohort projection\. If the probabilities are systematically too high or too low, projected state occupancy and valuation outputs can be biased\[[40](https://arxiv.org/html/2606.13880#bib.bib23),[42](https://arxiv.org/html/2606.13880#bib.bib21),[11](https://arxiv.org/html/2606.13880#bib.bib22)\]\.

Modern machine learning models present additional calibration challenges because high classification accuracy does not necessarily imply well\-calibrated probabilities\. \[[15](https://arxiv.org/html/2606.13880#bib.bib13)\] show that modern neural networks can improve classification accuracy while producing miscalibrated probability estimates, and they evaluate post\-processing calibration methods such as temperature scaling\. Validation should therefore include measures that assess probability accuracy rather than ranking performance alone\. Common tools include calibration curves, calibration\-in\-the\-large, calibration slope, the Brier score, Expected Calibration Error, and related proper scoring rules\[[2](https://arxiv.org/html/2606.13880#bib.bib36),[40](https://arxiv.org/html/2606.13880#bib.bib23),[15](https://arxiv.org/html/2606.13880#bib.bib13)\]\. In multi\-state settings, calibration is especially important because the model must assign reliable probabilities across several possible destination states, not only for a single binary endpoint\.
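Two of the tools listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Brier score uses the multiclass form summed over states, and the Expected Calibration Error uses equal-width bins for a single binary endpoint such as severe disability or death. The paper's exact definitions and binning may differ, and the function names are hypothetical.

```python
import numpy as np


def multiclass_brier(probs, labels):
    """Mean squared distance between probability vectors and one-hot outcomes."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


def expected_calibration_error(p_event, occurred, n_bins=10):
    """Binary ECE: bin-size-weighted |observed frequency - mean prediction|."""
    p_event = np.asarray(p_event, dtype=float)
    occurred = np.asarray(occurred, dtype=float)
    # Equal-width bins on [0, 1]; the last bin includes 1.0.
    idx = np.minimum((p_event * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(occurred[mask].mean() - p_event[mask].mean())
            ece += mask.mean() * gap      # weight by share of records in the bin
    return float(ece)
```

To evaluate one endpoint from a four-state model, the corresponding column of the predicted probability matrix would be passed as `p_event`, with an indicator of whether that state was observed next as `occurred`.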

In actuarial applications, calibration should also be assessed in relation to the downstream use of the model\. Recent actuarial work has emphasized calibration and auto\-calibration as important properties of predictive models used in insurance applications\[[9](https://arxiv.org/html/2606.13880#bib.bib53)\]\. Accordingly, an estimator for LTC transition modeling should be evaluated not only by discrimination, but also by probabilistic accuracy, calibration, and transition\-matrix fidelity\.

Taken together, the existing literature provides powerful tools for modeling disability dynamics and health\-state transitions, but important practical challenges remain for irregular longitudinal LTC data\. Classical actuarial multi\-state models support pricing and solvency analysis, but are often implemented through Markov, semi\-Markov, or parametric transition structures\. Survival\-based and non\-Markov methods provide principled inference for transition probabilities and state occupation probabilities, but their application requires explicit choices about time scales, covariate effects, and the aspects of prior history to condition on\. GLM\-based models retain interpretability, but typically require manual specification of nonlinear effects and history dependence\. Machine learning methods provide flexible prediction, but existing applications are usually not designed to produce calibrated probability distributions over possible next LTC health states that can be aggregated into actuarial transition matrices\. This gap motivates structured machine learning estimators that produce valid multi\-state transition probability vectors while allowing richer dependence on covariates, the time elapsed between observations, and prior health history\.

## 3 Classical Actuarial Multi\-State Framework

The estimated transition probabilities in this study are intended for use in standard actuarial multi\-state projection frameworks\[[16](https://arxiv.org/html/2606.13880#bib.bib37),[34](https://arxiv.org/html/2606.13880#bib.bib38)\]\. We briefly review this framework and define the transition probability notation used throughout the analysis\.

Consider a finite state space $\mathcal{J}=\{H,M,S,D\}$, where $H$, $M$, and $S$ denote healthy, mild disability, and severe disability states, respectively, and $D$ denotes death as an absorbing state\. Let $Y(a)$ represent an individual’s health state at age $a$\. In discrete\-time actuarial implementations with annual age steps, transition probability matrices $\mathbf{P}_a$ are estimated for each age $a$\. The one\-step transition probabilities are defined by Eqn\. \([1](https://arxiv.org/html/2606.13880#S3.E1)\):

$$p_{rs}(a)=\Pr\{Y(a+1)=s\mid Y(a)=r\},\qquad r,s\in\mathcal{J}, \tag{1}$$

where $r$ and $s$ denote the origin and destination states in $\mathcal{J}$, respectively\.

Figure [1](https://arxiv.org/html/2606.13880#S3.F1) illustrates the four-state LTC transition structure at age $a$.

Figure 1: Four-state LTC transition structure. States $H$, $M$, $S$, and $D$ denote healthy, mild disability, severe disability, and death, respectively. Arrows represent age-specific transition probabilities $p_{rs}(a)$; selected transitions ($p_{HM}(a)$, $p_{MH}(a)$, $p_{MS}(a)$, $p_{SD}(a)$, and $p_{DD}(a)=1$) are labeled for illustration. Death is absorbing, with $p_{DD}(a)=1$.

The corresponding transition matrix is $\mathbf{P}_{a}=(p_{rs}(a))_{r,s\in\mathcal{J}}$. Each row of $\mathbf{P}_{a}$ is a probability vector, so $p_{rs}(a)\geq 0$ and $\sum_{s\in\mathcal{J}}p_{rs}(a)=1$. Since death is absorbing, $p_{DD}(a)=1$ and $p_{Ds}(a)=0$ for $s\neq D$.

Let $\boldsymbol{\pi}_{a}$ denote the row vector of state occupancy probabilities at age $a$. Cohort projection follows \[[16](https://arxiv.org/html/2606.13880#bib.bib37),[34](https://arxiv.org/html/2606.13880#bib.bib38)\]

$$\boldsymbol{\pi}_{a+1}=\boldsymbol{\pi}_{a}\mathbf{P}_{a}.\tag{2}$$

Hence, for $a>a_{0}$, the occupancy vector at age $a$ can be written as

$$\boldsymbol{\pi}_{a}=\boldsymbol{\pi}_{a_{0}}\mathbf{P}_{a_{0}}\mathbf{P}_{a_{0}+1}\cdots\mathbf{P}_{a-1}.$$

The Expected Present Value (EPV) \[[10](https://arxiv.org/html/2606.13880#bib.bib43)\] of LTC benefits can then be expressed as

$$\mathrm{EPV}=\sum_{a=a_{0}}^{a_{\max}}\left(\frac{1}{1+R}\right)^{a-a_{0}}\boldsymbol{\pi}_{a}\mathbf{b}=\sum_{a=a_{0}}^{a_{\max}}\left(\frac{1}{1+R}\right)^{a-a_{0}}\boldsymbol{\pi}_{a_{0}}\mathbf{P}_{a_{0}}\mathbf{P}_{a_{0}+1}\cdots\mathbf{P}_{a-1}\mathbf{b},\tag{3}$$

where $\mathbf{b}$ denotes the vector of state-contingent benefits, $R$ is the annual discount rate, $a_{0}$ is the initial projection age, and $a_{\max}$ is the terminal projection age. For $a=a_{0}$, the empty product of transition matrices is interpreted as the identity matrix.
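The recursion in Eq. (2) and the discounted sum in Eq. (3) can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names `vec_mat` and `project_epv`, the transition matrix, the benefit vector, and the discount rate are all hypothetical values chosen for the example.

```python
def vec_mat(pi, P):
    """Row vector times matrix: returns pi P."""
    return [sum(pi[r] * P[r][s] for r in range(len(pi))) for s in range(len(P[0]))]


def project_epv(pi0, matrices, benefits, rate):
    """Return (EPV, occupancy path) for ages a0, a0+1, ..., a0+len(matrices).

    matrices[k] plays the role of P_{a0+k}; benefits is the vector b.
    """
    pi = list(pi0)
    path = [pi]
    # Term a = a0: empty matrix product (identity) and discount factor 1.
    epv = sum(p * b for p, b in zip(pi, benefits))
    for k, P in enumerate(matrices, start=1):
        pi = vec_mat(pi, P)  # Eq. (2)
        path.append(pi)
        discount = (1.0 / (1.0 + rate)) ** k
        epv += discount * sum(p * b for p, b in zip(pi, benefits))
    return epv, path


# Hypothetical age-constant matrix over (H, M, S, D); each row sums to 1.
P = [
    [0.90, 0.06, 0.02, 0.02],
    [0.20, 0.55, 0.18, 0.07],
    [0.03, 0.12, 0.65, 0.20],
    [0.00, 0.00, 0.00, 1.00],  # death is absorbing
]
pi0 = [1.0, 0.0, 0.0, 0.0]  # cohort starts healthy
b = [0.0, 0.5, 1.0, 0.0]    # illustrative benefit per state
epv, path = project_epv(pi0, [P] * 3, b, rate=0.03)
```

Because every row of each matrix sums to one, each projected occupancy vector also sums to one, and the mass in the death state can only grow.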

Errors in estimated transition probabilities propagate through successive projection steps because of repeated matrix multiplication. If $\widehat{\mathbf{P}}_{a}=\mathbf{P}_{a}+\Delta\mathbf{P}_{a}$, then projected state probabilities may accumulate deviations over time, potentially inducing material bias in projected disability prevalence, expected benefit costs, reserves, and related valuation quantities \[[16](https://arxiv.org/html/2606.13880#bib.bib37),[5](https://arxiv.org/html/2606.13880#bib.bib50),[30](https://arxiv.org/html/2606.13880#bib.bib49)\].
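The accumulation can be made explicit with a standard telescoping identity for matrix products. It is stated here as an illustration rather than as a result from the paper, under the assumption that the initial occupancy vector $\boldsymbol{\pi}_{a_{0}}$ is known exactly; hatted occupancy vectors denote projections computed with the estimated matrices.

```latex
\widehat{\boldsymbol{\pi}}_{a}-\boldsymbol{\pi}_{a}
  =\sum_{k=a_{0}}^{a-1}
   \widehat{\boldsymbol{\pi}}_{k}\,\Delta\mathbf{P}_{k}\,
   \mathbf{P}_{k+1}\cdots\mathbf{P}_{a-1},
\qquad
\widehat{\boldsymbol{\pi}}_{k}
  =\boldsymbol{\pi}_{a_{0}}\widehat{\mathbf{P}}_{a_{0}}\cdots\widehat{\mathbf{P}}_{k-1}.
```

Each term carries the error made at age $k$ forward through the remaining transitions, so an error at an early age affects every later occupancy vector. Since both the true and estimated matrices are row-stochastic, every row of $\Delta\mathbf{P}_{k}$ sums to zero: estimation errors reallocate probability mass among states rather than creating or removing it.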

Classical implementations assume either first-order Markov dependence on observed states, semi-Markov dependence on duration in state, or parametric covariate effects within proportional hazard structures \[[16](https://arxiv.org/html/2606.13880#bib.bib37),[27](https://arxiv.org/html/2606.13880#bib.bib6),[39](https://arxiv.org/html/2606.13880#bib.bib2)\]. These assumptions motivate the development of more flexible estimators capable of accommodating nonlinear interactions, irregular observation intervals, and history dependence while preserving the row-stochastic transition probability structure required for actuarial projection. Thus, the proposed method estimates the transition probabilities used to construct $\mathbf{P}_{a}$, while leaving the classical cohort projection framework unchanged.

## 4 Methods

### 4.1 Problem Formulation

The classical projection framework in Section [3](https://arxiv.org/html/2606.13880#S3) is written in terms of age-specific transition probabilities $p_{rs}(a)$. In the longitudinal data used here, individuals are observed at irregular ages rather than at fixed annual ages. Following the terminology introduced in Section [1](https://arxiv.org/html/2606.13880#S1), a visit denotes an observed HRS person-wave record for an individual.

We study longitudinal disability progression and mortality risk within a cohort of individuals observed over time. Let $N$ denote the number of individuals, indexed by $i\in\{1,\dots,N\}$. Each individual $i$ is observed at a sequence of visits $v=1,\dots,V_{i}$, ordered by survey wave and occurring at ages

$$a_{i}^{(1)}<a_{i}^{(2)}<\cdots<a_{i}^{(V_{i})},$$

where $V_{i}$ is the number of observed visits for individual $i$. In the implementation, visits are processed in survey-wave order, while age is used to measure irregular elapsed time between observed visits.

At each visit $v$ for individual $i$ aged $a_{i}^{(v)}$, we observe a time-varying feature vector $\mathbf{x}_{i}^{(v)}\in\mathbb{R}^{d_{x}}$ for that individual, where $d_{x}$ is the number of observed time-varying covariates. These covariates represent clinical conditions, comorbidity, ADL information, and other health-related risk factors. In addition, each individual-visit record contains a vector of demographic and socioeconomic attributes $\mathbf{c}_{i}^{(v)}=(c_{i1}^{(v)},\dots,c_{iA}^{(v)})$, where $A$ is the number of such attributes. These attributes include variables such as sex, race, Hispanic ethnicity, education, marital status, and region.

Let $Y_{i}^{(v)}\in\mathcal{J}$ denote the observed current health state of individual $i$ at visit $v$. For each transition interval $(v,v+1)$, the prediction target is the next retained observed state $Y_{i}^{(v+1)}$, including death when the subsequent retained state is assigned to $D$ using recorded age at death. Although $Y_{i}^{(v)}$ is used to define transition intervals and to aggregate predictions into origin-state-specific actuarial transition matrices, it is not supplied as a separate input feature to our model, because it is deterministically derived from ADL variables already contained in $\mathbf{x}_{i}^{(v)}$. Omitting it avoids duplicating state information while allowing the recurrent memory to learn disability history from the underlying ADL profile and related health covariates.

Since observation intervals vary across visits and individuals, the model explicitly encodes the elapsed age since the previous observed visit. Define

$$\Delta a_{i}^{(v)}=a_{i}^{(v)}-a_{i}^{(v-1)},\qquad v\geq 2,$$

with $\Delta a_{i}^{(1)}=0$. This quantity is known at visit $v$ and captures irregularity in the observed trajectory up to the prediction time. Figure [2](https://arxiv.org/html/2606.13880#S4.F2) illustrates the resulting irregular visit sequence and the elapsed age increments used by the model. The forward interval $a_{i}^{(v+1)}-a_{i}^{(v)}$ is not used as an input feature because it is not available when predicting the next observed state.
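As a small illustration of the backward increments, the sketch below computes $\Delta a_{i}^{(v)}$ for one individual from a list of interview ages. The helper name `backward_increments` and the ages are hypothetical; only the definition of the increment comes from the text.

```python
def backward_increments(ages):
    """Return [da(1), ..., da(V)] with da(1) = 0 and da(v) = a(v) - a(v-1)."""
    if any(later <= earlier for earlier, later in zip(ages, ages[1:])):
        raise ValueError("ages must be strictly increasing")
    return [0.0] + [later - earlier for earlier, later in zip(ages, ages[1:])]


ages = [61.2, 63.1, 65.4, 69.0]  # hypothetical interview ages for one person
deltas = backward_increments(ages)
```

Each entry uses only the current and previous visit ages, so it is available at prediction time, whereas the gap to the next visit is not.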

![Refer to caption](https://arxiv.org/html/2606.13880v1/x1.png)

Figure 2: Irregular observed visit times. Observed visits for individual $i$ occur at irregular ages. The elapsed-age increments shown above are the backward intervals available when processing each visit.

Define the observed time-varying history of individual $i$ up to and including visit $v$ as

$$\mathcal{H}_{i}^{(v)}=\left\{(a_{i}^{(\ell)},\mathbf{x}_{i}^{(\ell)}):1\leq\ell\leq v\right\}.$$

To reiterate, the current observed state $Y_{i}^{(v)}$ is not separately included in $\mathcal{H}_{i}^{(v)}$ because it is derived from ADL components of $\mathbf{x}_{i}^{(v)}$.

Let $\theta$ denote the collection of trainable model parameters. The proposed estimator targets the conditional next-observation transition distribution in Eqn. ([4](https://arxiv.org/html/2606.13880#S4.E4)):

$$p_{\theta,is}^{(v)}=\Pr_{\theta}\!\left(Y_{i}^{(v+1)}=s\mid\mathcal{H}_{i}^{(v)},\Delta a_{i}^{(v)},\mathbf{c}_{i}^{(v)}\right),\qquad s\in\mathcal{J}.\tag{4}$$

These probabilities are next-observation probabilities: they refer to the next retained observed state after visit $v$. The backward elapsed-age increment $\Delta a_{i}^{(v)}$ is used as an input feature to represent the recent spacing of the observed trajectory; the forward interval $a_{i}^{(v+1)}-a_{i}^{(v)}$ is not used as an input because it is not known at prediction time. For each individual-visit observation, the learned map $f_{\theta}$ estimates this distribution by taking the observed history, age increment, and individual attributes as inputs and producing the predicted probability vector

$$\widehat{\mathbf{p}}_{i}^{(v)}=f_{\theta}\!\left(\mathcal{H}_{i}^{(v)},\Delta a_{i}^{(v)},\mathbf{c}_{i}^{(v)}\right)=\left(\widehat{p}_{iH}^{(v)},\widehat{p}_{iM}^{(v)},\widehat{p}_{iS}^{(v)},\widehat{p}_{iD}^{(v)}\right),\tag{5}$$

where $\widehat{p}_{is}^{(v)}$ estimates $p_{\theta,is}^{(v)}$. The output is required to be a valid probability vector:

$$\widehat{p}_{is}^{(v)}\geq 0,\qquad\sum_{s\in\mathcal{J}}\widehat{p}_{is}^{(v)}=1.$$

Here, $\widehat{p}_{is}^{(v)}$ denotes the individual-level predicted probability that individual $i$ occupies destination state $s$ at the next observed visit after visit $v$. These individual-level probability vectors form the inputs to the aggregation step described in Section [4.3](https://arxiv.org/html/2606.13880#S4.SS3).

### 4.2 LANTERN Architecture

To model history-dependent transition probabilities under irregular observation times, we propose LANTERN, a latent-trajectory framework that integrates recurrent memory, time-aware embeddings, and adaptive demographic conditioning. The architecture is designed to estimate valid individual-level transition probability vectors $\widehat{\mathbf{p}}_{i}^{(v)}$ that can be aggregated into the actuarial transition matrices $\widehat{\mathbf{P}}_{ag}$ used in cohort projection.

#### Latent Trajectory Representation

For each individual $i$, the proposed model represents evolving health status through a latent memory vector $\mathbf{m}_{i}^{(v)}\in\mathbb{R}^{d_{h}}$, where $d_{h}$ is the latent memory dimension. This vector summarizes the individual's observed trajectory through visit $v$. The latent memory is updated sequentially across visits and serves as a compact representation of accumulated health history. Abstractly, the memory evolves according to

$$\mathbf{m}_{i}^{(v)}=\Psi_{\theta}\!\left(\mathbf{m}_{i}^{(v-1)},\mathbf{x}_{i}^{(v)},\mathbf{c}_{i}^{(v)},\Delta a_{i}^{(v)}\right),\qquad\mathbf{m}_{i}^{(0)}=\mathbf{0},\tag{6}$$

where $\Psi_{\theta}$ denotes a learnable trajectory update function. In the implementation below, this update is instantiated using a gated recurrent unit that receives the time-varying covariates, the adaptive attribute-conditioning vector, and the elapsed-time embedding.

This recursive formulation allows transition probabilities to depend on a learned summary of the longitudinal trajectory rather than only on the most recent observed health state. Figure [3](https://arxiv.org/html/2606.13880#S4.F3) illustrates this history-dependent representation. Consequently, the model relaxes the first-order Markov assumption on the observed health state process while retaining a Markov structure in latent space. The latent memory is intended to capture duration effects, recurrent disability patterns, recovery patterns, and accumulated frailty information that may not be fully represented by the current observed state alone.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x2.png)

Figure 3: History dependence in LANTERN. Unlike a first-order Markov model, which conditions on the current observed state, our model summarizes the observed trajectory through visit $v$ in latent memory before estimating the next-state transition probability vector $\widehat{\mathbf{p}}_{i}^{(v)}$ over $H, M, S, D$.

#### Time Encoding for Irregular Visit Intervals

As discussed in Section [4.1](https://arxiv.org/html/2606.13880#S4.SS1), transition probabilities depend on the elapsed time between consecutive observations. In this study, time is parameterized using individual age, which serves as the natural biological time scale for disability progression and mortality risk in long-term care insurance. Accordingly, elapsed time reflects the age increment between visits and captures both biological aging and irregular follow-up intervals.

To model nonlinear time-dependent risk, elapsed time is first transformed using the logarithmic mapping

$$\widetilde{\Delta a}_{i}^{(v)}=\log\!\left(1+\Delta a_{i}^{(v)}\right),$$

which stabilizes variation across observation intervals while preserving relative differences in elapsed age. The transformed elapsed age is then embedded using a learnable Time2Vec mapping \[[22](https://arxiv.org/html/2606.13880#bib.bib26)\],

$$\boldsymbol{\tau}_{i}^{(v)}=\left(w_{0}\widetilde{\Delta a}_{i}^{(v)}+b_{0},\ \sin(w_{1}\widetilde{\Delta a}_{i}^{(v)}+b_{1}),\ \dots,\ \sin(w_{d_{t}-1}\widetilde{\Delta a}_{i}^{(v)}+b_{d_{t}-1})\right)\in\mathbb{R}^{d_{t}},\tag{7}$$

where $d_{t}$ is the elapsed-time embedding dimension, and $\{w_{\ell},b_{\ell}\}_{\ell=0}^{d_{t}-1}$ are learnable Time2Vec parameters. The first component captures monotonic effects of elapsed time, while the remaining $d_{t}-1$ components are sinusoidal functions with learnable frequencies and phases, enabling flexible modeling of nonlinear time-dependent risk. Figure [4](https://arxiv.org/html/2606.13880#S4.F4) summarizes the elapsed-time transformation and embedding used to represent irregular visit intervals.
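The transformation and embedding in Eq. (7) can be sketched as follows. This is an illustrative forward pass only: the function name `time2vec` and the weight and bias values are hypothetical, and in the model these parameters would be learned rather than fixed.

```python
import math


def time2vec(delta_a, w, b):
    """Eq. (7): one linear component followed by sinusoidal components."""
    t = math.log1p(delta_a)  # log(1 + delta_a), the transformed elapsed age
    linear = [w[0] * t + b[0]]
    periodic = [math.sin(wl * t + bl) for wl, bl in zip(w[1:], b[1:])]
    return linear + periodic


# Hypothetical parameters for an embedding of dimension d_t = 4.
w = [0.5, 1.0, 2.0, 4.0]
b = [0.0, 0.0, 0.5, 1.0]
tau_first = time2vec(0.0, w, b)      # first visit, where the increment is 0
tau_two_years = time2vec(2.0, w, b)  # visit two years after the previous one
```

At a first visit the transformed increment is zero, so the embedding reduces to the bias terms: `b[0]` in the linear slot and `sin(b[l])` in the periodic slots.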

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/Time_embedding.jpg)

Figure 4: Elapsed-time encoding in LANTERN. The backward elapsed-age increment is transformed and embedded before entering the recurrent update and attribute-attention query.

We use a Time2Vec mapping because it provides a learnable representation of elapsed age that can capture both monotone and nonlinear time-spacing effects without imposing a fixed parametric functional form.

To further account for visit\-process dynamics, the numerical covariate vector is augmented with elapsed\-time information, a first\-visit indicator, and cumulative visit count\. This allows the model to distinguish early and later phases of an individual’s longitudinal trajectory\.

#### Adaptive Attribute Conditioning

Demographic and socioeconomic factors may influence disability progression and mortality risk, but their effects can vary across individual trajectories and stages of decline\. To capture this heterogeneity, our model incorporates individual attributes through an adaptive attention mechanism\.

Let $\mathcal{A}_{i}=\{\mathbf{e}_{ij}\in\mathbb{R}^{d_{h}}:j=1,\dots,A\}$ denote the set of embedded attribute representations for individual $i$. Each categorical attribute is embedded into a shared latent space. At visit $v$, the model constructs an attention query

$$\mathbf{q}_{i}^{(v)}=h_{\theta}\!\left(\mathbf{m}_{i}^{(v-1)},\boldsymbol{\tau}_{i}^{(v)}\right),\tag{8}$$

where $\mathbf{m}_{i}^{(v-1)}$ is the previous latent memory vector, $\boldsymbol{\tau}_{i}^{(v)}$ is the elapsed-time embedding, and $h_{\theta}$ is a learnable mapping. The attribute importance weights are computed as

$$\alpha_{ij}^{(v)}=\frac{\exp\!\left((\mathbf{q}_{i}^{(v)})^{\top}\mathbf{e}_{ij}\right)}{\sum_{j'=1}^{A}\exp\!\left((\mathbf{q}_{i}^{(v)})^{\top}\mathbf{e}_{ij'}\right)},\tag{9}$$

and the adaptive attribute-conditioning vector is

$$\mathbf{u}_{i}^{(v)}=\sum_{j=1}^{A}\alpha_{ij}^{(v)}\mathbf{e}_{ij}.\tag{10}$$

Figure [5](https://arxiv.org/html/2606.13880#S4.F5) illustrates how the query vector attends over demographic and socioeconomic attribute embeddings to produce the trajectory-dependent attribute summary.
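Eqs. (9) and (10) amount to a softmax over query-embedding dot products followed by a weighted sum. The sketch below shows a single-head version with the query taken as given, since the form of $h_{\theta}$ is not specified here; the helper names `softmax` and `attend`, the query, and the embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, embeddings):
    """Return (alpha, u): attention weights, Eq. (9), and summary vector, Eq. (10)."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    alpha = softmax(scores)
    u = [
        sum(a * emb[k] for a, emb in zip(alpha, embeddings))
        for k in range(len(query))
    ]
    return alpha, u


# Hypothetical query and three attribute embeddings in a 3-dimensional space.
q = [1.0, 0.0, -1.0]
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
alpha, u = attend(q, E)
```

The weights are non-negative and sum to one, so the summary vector is a convex combination of the attribute embeddings. Attributes whose embeddings align with the query receive more weight.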

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/AAC.jpg)

Figure 5: Adaptive attribute conditioning in LANTERN. The previous memory state $\mathbf{m}_{i}^{(v-1)}$ and elapsed-time embedding $\boldsymbol{\tau}_{i}^{(v)}$ form a query vector $\mathbf{q}_{i}^{(v)}$, which attends over demographic and socioeconomic attribute embeddings $\{\mathbf{e}_{ij}^{(v)}\}_{j=1}^{A}$ to produce the trajectory-dependent attribute summary $\mathbf{u}_{i}^{(v)}$.

This formulation defines a structured interaction between the individual's latent trajectory state and the attribute set $\mathcal{A}_{i}$. The resulting vector $\mathbf{u}_{i}^{(v)}$ allows demographic and socioeconomic information to enter the transition model in a trajectory-dependent way. By restricting the interaction to an attention-based attribute summary, the model captures heterogeneous demographic influences while avoiding an unstructured expansion of high-dimensional interactions.

#### Memory Update and Transition Modeling

At each visit $v$, the latent memory is updated using a Gated Recurrent Unit (GRU) \[[4](https://arxiv.org/html/2606.13880#bib.bib57)\]. A GRU is a gated recurrent update that controls how much prior memory is retained or overwritten when new visit information is observed. The update is

$$\mathbf{m}_{i}^{(v)}=\mathrm{GRU}\!\left(\left[\mathbf{x}_{i}^{(v)},\mathbf{u}_{i}^{(v)},\boldsymbol{\tau}_{i}^{(v)}\right],\mathbf{m}_{i}^{(v-1)}\right).\tag{11}$$

The updated latent memory $\mathbf{m}_{i}^{(v)}$ summarizes available trajectory information through visit $v$ and is used to predict the next observed state $Y_{i}^{(v+1)}$.
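To make the gating concrete, the sketch below implements one GRU step in plain Python using one common parameterization (update gate, reset gate, and a tanh candidate). It is an assumption that this matches the variant used in the paper; gate conventions differ between libraries, and the names `gru_step`, `affine`, and the `params` layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, bias):
    """Return W x + bias for a matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, bias)]


def gru_step(inp, m_prev, params):
    """One gated update; params[gate] = (W_input, U_memory, bias)."""
    def pre_activation(gate, memory):
        W, U, bias = params[gate]
        recurrent = affine(U, memory, [0.0] * len(bias))
        return [a + c for a, c in zip(affine(W, inp, bias), recurrent)]

    z = [sigmoid(t) for t in pre_activation("update", m_prev)]  # share of old memory kept
    r = [sigmoid(t) for t in pre_activation("reset", m_prev)]
    gated = [rk * mk for rk, mk in zip(r, m_prev)]
    cand = [math.tanh(t) for t in pre_activation("cand", gated)]
    return [zk * mk + (1.0 - zk) * ck for zk, mk, ck in zip(z, m_prev, cand)]


# Hypothetical sizes: input [x, u, tau] of length 3, memory of length 2.
d_in, d_h = 3, 2
zero = ([[0.0] * d_in for _ in range(d_h)], [[0.0] * d_h for _ in range(d_h)], [0.0] * d_h)
params = {"update": zero, "reset": zero, "cand": zero}
x, u, tau = [1.0], [2.0], [3.0]
m_new = gru_step(x + u + tau, [0.4, -0.2], params)  # concatenated input, as in Eq. (11)
```

With all parameters at zero, the update gate equals 0.5 and the candidate is zero, so the new memory is half the old one. A large positive update-gate bias would instead keep the previous memory almost unchanged, which is how the gate controls retention.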

Given the memory vector $\mathbf{m}_{i}^{(v)}$, our model predicts the next-visit transition distribution using a hierarchical factorization that separates mortality from transitions among living disability states. Mortality risk is modeled as a Bernoulli component with logit

$$z_{D,i}^{(v)}=\mathbf{w}_{D}^{\top}\mathbf{m}_{i}^{(v)}+b_{D}.$$

Here, $\mathbf{w}_{D}$ and $b_{D}$ are trainable parameters of the mortality output head. Hence, the predicted probability of death at the next observed visit is

$$\widehat{p}_{iD}^{(v)}=\sigma\!\left(z_{D,i}^{(v)}\right),\tag{12}$$

where $\sigma(\cdot)$ denotes the sigmoid function.

Let $\mathcal{J}_{\mathrm{alive}}=\{H,M,S\}\subset\mathcal{J}$ denote the set of non-death states. Conditional on survival, the transition distribution over $\mathcal{J}_{\mathrm{alive}}$ is modeled using a multinomial component with logit vector

$$\mathbf{z}_{\mathrm{alive},i}^{(v)}=\mathbf{W}_{\mathrm{alive}}\mathbf{m}_{i}^{(v)}+\mathbf{b}_{\mathrm{alive}}\in\mathbb{R}^{3},$$

where $\mathbf{W}_{\mathrm{alive}}$ and $\mathbf{b}_{\mathrm{alive}}$ are trainable parameters of the conditional alive-state output head. The conditional probability of alive state $s\in\mathcal{J}_{\mathrm{alive}}$ is

$$\rho_{is}^{(v)}=\frac{\exp\!\left(z_{\mathrm{alive},is}^{(v)}\right)}{\sum_{s'\in\mathcal{J}_{\mathrm{alive}}}\exp\!\left(z_{\mathrm{alive},is'}^{(v)}\right)},\qquad s\in\mathcal{J}_{\mathrm{alive}}.$$

Together with the predicted death probability $\widehat{p}_{iD}^{(v)}$, the living-state probabilities are

$$\widehat{p}_{is}^{(v)}=\left(1-\widehat{p}_{iD}^{(v)}\right)\rho_{is}^{(v)},\qquad s\in\mathcal{J}_{\mathrm{alive}}.\tag{13}$$

By construction,

$$\sum_{s\in\mathcal{J}}\widehat{p}_{is}^{(v)}=1\quad\text{and}\quad\widehat{p}_{is}^{(v)}\geq 0$$

for every individual-visit observation.
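The hierarchical head in Eqs. (12) and (13) can be sketched directly from the logits. The function name `transition_probs` and the logit values are hypothetical; the sketch starts from the logits and omits the linear maps from the memory vector.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def transition_probs(z_death, z_alive):
    """Return [p_H, p_M, p_S, p_D] from one death logit and three alive-state logits."""
    p_death = sigmoid(z_death)                  # Eq. (12)
    top = max(z_alive)
    exps = [math.exp(z - top) for z in z_alive]
    total = sum(exps)
    rho = [e / total for e in exps]             # conditional distribution given survival
    alive = [(1.0 - p_death) * r for r in rho]  # Eq. (13)
    return alive + [p_death]


# Hypothetical logits: low death risk, healthy most likely conditional on survival.
probs = transition_probs(z_death=-3.0, z_alive=[2.0, 0.5, -1.0])
```

Because the conditional alive-state probabilities sum to one, the three living-state probabilities sum to the survival probability, and the four outputs always form a valid probability vector regardless of the logit values.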

This hierarchical formulation explicitly separates mortality risk from disability severity while maintaining a coherent probability distribution over the four actuarial states\. It aligns with multi\-state disability modeling in which death is clinically and actuarially distinct from transitions among living functional states\. Death is enforced as absorbing when constructing actuarial projection matrices\.

### 4.3 Aggregation into Actuarial Transition Matrices

The output of our model is an individual-level transition probability vector $\widehat{\mathbf{p}}_{i}^{(v)}$. To use these predictions in the actuarial projection framework of Section [3](https://arxiv.org/html/2606.13880#S3), the individual-level probabilities are aggregated into age- and origin-state-specific transition matrices.

Let $\mathcal{I}_{ag,r}$ denote the set of individual-visit observations with current age in age group $ag$ and current observed state $r$:

$$\mathcal{I}_{ag,r}=\left\{(i,v):a_{i}^{(v)}\in ag,\;Y_{i}^{(v)}=r\right\}.$$

For each origin state $r\in\mathcal{J}$ and destination state $s\in\mathcal{J}$, the aggregated transition probability is the average predicted probability of destination state $s$ among all individual-visit observations in age group $ag$ with origin state $r$, given by Eqn. ([14](https://arxiv.org/html/2606.13880#S4.E14)):

$$\widehat{p}_{rs}(ag)=\frac{1}{|\mathcal{I}_{ag,r}|}\sum_{(i,v)\in\mathcal{I}_{ag,r}}\widehat{p}_{is}^{(v)}.\tag{14}$$

Here, $\widehat{p}_{rs}(ag)$ denotes the age-group and origin-state aggregated transition probability from origin state $r$ to destination state $s$. The estimated actuarial transition matrix at age group $ag$ is then

$$\widehat{\mathbf{P}}_{ag}=\left(\widehat{p}_{rs}(ag)\right)_{r,s\in\mathcal{J}}.$$

The absorbing death row is imposed in the projection matrix by setting

$$\widehat{p}_{DD}(ag)=1,\qquad\widehat{p}_{Ds}(ag)=0\quad\text{for }s\neq D.$$

Thus, $\widehat{\mathbf{P}}_{ag}$ has the same structure as the classical transition matrix $\mathbf{P}_{ag}$, but its entries are obtained by aggregating calibrated, history-dependent individual predictions rather than by estimating transition probabilities only as functions of age and origin state.

The resulting transition matrices can be inserted directly into the classical cohort recursion:

$$\widehat{\boldsymbol{\pi}}_{ag_{j+1}}=\widehat{\boldsymbol{\pi}}_{ag_{j}}\widehat{\mathbf{P}}_{ag_{j}}.\tag{15}$$

Accordingly, the proposed method modifies the transition-estimation stage while preserving the actuarial cohort projection framework.
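The averaging in Eq. (14), together with the imposed absorbing death row, can be sketched as follows. The record layout `(age_group, origin_state, prob_vector)`, the function name `aggregate`, and the example values are hypothetical. Origin states with no observations in an age group are simply left out here; the text does not say how such cells are handled.

```python
from collections import defaultdict

STATES = ("H", "M", "S", "D")


def aggregate(records):
    """Average predicted vectors by (age group, origin state), as in Eq. (14).

    records: iterable of (age_group, origin_state, prob_vector over STATES).
    Returns {age_group: {origin_state: averaged row}}.
    """
    sums = defaultdict(lambda: [0.0] * len(STATES))
    counts = defaultdict(int)
    for age_group, origin, probs in records:
        key = (age_group, origin)
        sums[key] = [acc + p for acc, p in zip(sums[key], probs)]
        counts[key] += 1
    matrices = {}
    for (age_group, origin), total in sums.items():
        matrices.setdefault(age_group, {})[origin] = [t / counts[(age_group, origin)] for t in total]
    for rows in matrices.values():
        rows["D"] = [0.0, 0.0, 0.0, 1.0]  # impose the absorbing death row
    return matrices


# Hypothetical individual-visit predictions for one age group.
records = [
    ("70-74", "H", [0.80, 0.10, 0.05, 0.05]),
    ("70-74", "H", [0.60, 0.20, 0.10, 0.10]),
    ("70-74", "S", [0.10, 0.20, 0.50, 0.20]),
]
P_hat = aggregate(records)
```

Averaging probability vectors preserves the row-sum constraint, so each aggregated row is again a valid probability vector and the resulting matrix can be used in the recursion of Eq. (15).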

## 5 Numerical Experiments

### 5.1 Data and Sample Construction

We use longitudinal data from the Health and Retirement Study \(HRS\)\[[3](https://arxiv.org/html/2606.13880#bib.bib27)\], a nationally representative biennial survey of U\.S\. adults conducted by the University of Michigan\. Our analysis is based on the RAND HRS Longitudinal File covering survey waves from 1998 through 2022, which provides harmonized health, demographic, functional status, and mortality variables across waves\. Although the HRS primarily targets adults aged 50 and older, spouses and younger household members are also interviewed; accordingly, we retain observations for individuals aged 30–100\.

Starting from the RAND respondent\-level file, we extract variables for age, functional limitations, chronic conditions, self\-reported health, cognition, depressive symptoms, body mass index, marital status, census region, mortality, and baseline demographics\. These variables are reshaped into a long person\-wave file, where each observed person\-wave corresponds to a visit in the model notation\. Age at interview is used as the primary time index, yielding irregular visit intervals consistent with the biennial HRS design\. Person\-waves with missing interview age are excluded, and the sample is restricted to ages 30–100\. Recorded age at death is used, when available, to identify death states\.

Functional status is summarized using the RAND six\-item Activities of Daily Living \(ADL\) index and the five\-item Instrumental Activities of Daily Living \(IADL\) index\. Major chronic conditions, including diabetes, cancer, lung disease, heart disease, stroke, and psychiatric conditions, are recoded as binary indicators and aggregated into a disease burden measure\. Additional time\-varying covariates include self\-reported health, marital status, census region, depressive symptoms, body mass index, and cognition\. Time\-invariant demographic variables, including sex, race, Hispanic ethnicity, and education, are merged at the individual level\. Missingness indicators are included for survey variables with incomplete responses\.

We define four functional health states: Healthy (H), Mild disability (M), Severe disability (S), and Death (D). Healthy status is assigned to alive person-waves with no ADL limitations and no IADL limitations. Mild disability is assigned to alive person-waves with either exactly one ADL limitation or at least one IADL limitation in the absence of ADL limitations. Severe disability is assigned to alive person-waves with two or more ADL limitations. Death is treated as a terminal state using recorded age at death. For each person-wave record, a death indicator is set equal to one when recorded age at death is available and the interview age is greater than or equal to the recorded age at death. Such records are assigned to state $D$. For an alive individual $i$ at visit $v$,

$$\begin{aligned}
H&:\ \mathrm{ADL}_{iv}=0,\ \mathrm{IADL}_{iv}=0,\\
M&:\ (\mathrm{ADL}_{iv}=1)\quad\text{or}\quad(\mathrm{ADL}_{iv}=0,\ \mathrm{IADL}_{iv}\geq 1),\\
S&:\ \mathrm{ADL}_{iv}\geq 2.
\end{aligned}$$

The next-state target is then constructed by shifting the current state forward within each individual. Thus, $Y_{i}^{(v+1)}=D$ indicates that the next retained person-wave state is death. Person-wave records without a subsequent observed state are excluded from the transition sample.
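The state rules and the forward shift can be written compactly. This sketch assumes ADL and IADL limitation counts are available as integers and that the death indicator has already been derived; the function names `health_state` and `next_state_pairs` are hypothetical.

```python
def health_state(adl, iadl, dead=False):
    """Map ADL/IADL limitation counts and a death indicator to H, M, S, or D."""
    if dead:
        return "D"
    if adl >= 2:
        return "S"
    if adl == 1 or iadl >= 1:  # here adl is 0 or 1, so this matches the M rule
        return "M"
    return "H"


def next_state_pairs(states):
    """Pair each visit's state with the next retained state for one individual.

    The final visit has no successor and therefore yields no transition record.
    """
    return list(zip(states[:-1], states[1:]))


# Hypothetical trajectory for one individual across four retained person-waves.
trajectory = [
    health_state(0, 0),
    health_state(0, 2),
    health_state(3, 4),
    health_state(0, 0, dead=True),
]
pairs = next_state_pairs(trajectory)
```

For this trajectory the transition records are H to M, M to S, and S to D, while the final death record contributes no outgoing transition.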

The final dataset contains 250,755 person-wave transition records from 39,037 unique individuals. Each record corresponds to an observed person-wave with a defined current functional state and a subsequent observed next-wave state. The target distribution is highly imbalanced: 189,626 transitions, or 75.6%, end in Healthy; 33,092, or 13.2%, end in Mild disability; 24,985, or 10.0%, end in Severe disability; and 3,052, or 1.2%, end in Death. This imbalance reflects the longitudinal structure of the HRS and the relative rarity of observed death transitions between adjacent interview waves. Summary statistics for the transition sample are reported in Table [1](https://arxiv.org/html/2606.13880#S5.T1), and the transition target distribution and visit-gap distribution are shown in Figures [6(a)](https://arxiv.org/html/2606.13880#S5.F6.sf1) and [6(b)](https://arxiv.org/html/2606.13880#S5.F6.sf2).

Table 1: Analytic sample and transition outcome distribution

| Sample or outcome | Count | Percent |
|---|---:|---:|
| Initial RAND HRS respondents | 45,234 | – |
| Individuals in analytic sample | 39,037 | – |
| Person-wave transition records | 250,755 | 100.0% |
| Next state: Healthy | 189,626 | 75.6% |
| Next state: Mild disability | 33,092 | 13.2% |
| Next state: Severe disability | 24,985 | 10.0% |
| Next state: Death | 3,052 | 1.2% |

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/class_dist.png)

(a) Distribution of next observed health states.

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/time_gaps.png)

(b) Distribution of elapsed time between visits by next observed health state.

Figure 6: Outcome distribution and visit timing in the longitudinal HRS transition dataset.

### 5.2 Training Setup

Our model is trained by minimizing the negative conditional log-likelihood induced by the hierarchical transition model described in Section [4.2](https://arxiv.org/html/2606.13880#S4.SS2.SSS0.Px4). For each transition interval $(i,v)$, let $y_{D,i}^{(v)}=\mathbf{1}\{Y_{i}^{(v+1)}=D\}$ denote the death indicator, and let $Y_{i}^{(v+1)}\in\mathcal{J}_{\mathrm{alive}}$ denote the alive-state label for observations with $Y_{i}^{(v+1)}\neq D$. Let $\mathcal{D}_{\mathrm{train}}$ denote the set of individual-visit transition intervals used for training. Given the model outputs, the per-sample loss combines binary cross-entropy for mortality and multiclass cross-entropy for the conditional alive-state distribution:

$$\ell_{i}^{(v)}(\theta)=\underbrace{\ell_{\mathrm{BCE}}\!\left(y_{D,i}^{(v)},z_{D,i}^{(v)}\right)}_{\text{Death loss}}+\underbrace{\mathbf{1}\{Y_{i}^{(v+1)}\neq D\}\,\ell_{\mathrm{CE}}\!\left(Y_{i}^{(v+1)},\mathbf{z}_{\mathrm{alive},i}^{(v)}\right)}_{\text{Conditional alive-state loss}}.\tag{16}$$

The overall training objective minimizes the empirical risk

$$\widehat{\theta}=\arg\min_{\theta}\frac{1}{|\mathcal{D}_{\mathrm{train}}|}\sum_{(i,v)\in\mathcal{D}_{\mathrm{train}}}\ell_{i}^{(v)}(\theta).\tag{17}$$

No class weights are applied in the primary specification. Since the objective is to estimate transition probabilities under the observed real-world distribution of LTC trajectories, the likelihood is left unweighted rather than artificially re-balancing rare and common transition outcomes. The effect of class imbalance is instead examined through rare-endpoint performance, calibration, and actuarial aggregation diagnostics.
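The per-sample loss in Eq. (16) can be sketched from the logits. The sketch below computes both cross-entropy terms in a numerically stable way; the function names `softplus` and `sample_loss` and the logit values in the usage lines are hypothetical.

```python
import math

ALIVE = ("H", "M", "S")


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sample_loss(next_state, z_death, z_alive):
    """Eq. (16): death BCE plus, for survivors, alive-state cross-entropy."""
    y_death = 1.0 if next_state == "D" else 0.0
    bce = softplus(z_death) - y_death * z_death  # binary cross-entropy on the logit
    if next_state == "D":
        return bce  # the indicator switches off the alive-state term
    top = max(z_alive)
    log_norm = top + math.log(sum(math.exp(z - top) for z in z_alive))
    cross_entropy = log_norm - z_alive[ALIVE.index(next_state)]
    return bce + cross_entropy


# Hypothetical logits: all zero, so p_D = 1/2 and each alive state has p = 1/6.
loss_healthy = sample_loss("H", 0.0, [0.0, 0.0, 0.0])
loss_death = sample_loss("D", 0.0, [0.0, 0.0, 0.0])
```

With all logits at zero, the two losses equal $\log 6$ and $\log 2$. In general, the loss equals the negative log of the predicted probability of the observed next state under Eqs. (12) and (13), which is why minimizing it corresponds to maximizing the conditional likelihood.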

For model training, we first split the data at the individual level into 70% train, 15% validation, and 15% test sets. This prevents information leakage across individuals. Within each individual, observations remain temporally ordered, and all features used to predict the next observed state $Y_{i}^{(v+1)}$ are available at or before visit $v$. We then train the model with latent dimension $d_{h}=128$, Time2Vec embedding dimension $d_{t}=8$, and 4 attention heads for attribute conditioning. Optimization uses the Adam optimizer with learning rate $3\times 10^{-3}$ and weight decay $1\times 10^{-6}$. Gradients are clipped to a maximum norm of 1.0. We train the model for up to 50 epochs with early stopping based on validation loss. All experiments used a fixed random seed of 42 for reproducibility.
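The individual-level split can be sketched as follows. The helper name `split_by_individual` and the toy visit records are hypothetical; the point is only that every visit of a given person lands in exactly one partition.

```python
import random


def split_by_individual(person_ids, fractions=(0.70, 0.15, 0.15), seed=42):
    """Assign each unique individual to exactly one of train / validation / test."""
    unique_ids = sorted(set(person_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(fractions[0] * len(unique_ids))
    n_val = round(fractions[1] * len(unique_ids))
    return {
        "train": set(unique_ids[:n_train]),
        "val": set(unique_ids[n_train:n_train + n_val]),
        "test": set(unique_ids[n_train + n_val:]),
    }


# Hypothetical visit-level records: (person_id, visit_index), 1 to 5 visits per person.
records = [(pid, v) for pid in range(100) for v in range(pid % 5 + 1)]
splits = split_by_individual([pid for pid, _ in records])

# Visit rows follow their owner, so no person contributes to two partitions.
train_rows = [r for r in records if r[0] in splits["train"]]
```

Splitting on person identifiers rather than on rows is what keeps a person's later visits out of the test set when their earlier visits were used for training.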

### 5\.3Evaluation Metrics

Model selection is performed using validation loss, and all reported metrics are computed on the held\-out test set\. Statistical uncertainty is quantified using paired individual\-level bootstrap resampling \(1,000 replicates\), preserving within\-individual temporal dependence\. Confidence intervals are reported for differences in endpoint AUROC and PR\-AUC relative to the multinomial logistic baseline\.
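A paired individual-level bootstrap of an AUROC difference can be sketched as below. Individuals are resampled with replacement and all of their visits are kept together, and both models are scored on the same resample. The row layout, the function names, and the simple percentile indexing are assumptions made for illustration.

```python
import random


def auroc(y, p):
    """Mann-Whitney AUROC using average ranks for tied scores."""
    order = sorted(range(len(p)), key=lambda k: p[k])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, label in zip(ranks, y) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_auroc_diff(rows, n_boot=1000, seed=0):
    """rows: (person_id, y, p_model, p_baseline). Returns a 95% percentile interval."""
    by_person = {}
    for pid, y, p_model, p_base in rows:
        by_person.setdefault(pid, []).append((y, p_model, p_base))
    ids = list(by_person)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample individuals; each draw brings all of that person's visits.
        sample = [r for pid in rng.choices(ids, k=len(ids)) for r in by_person[pid]]
        y = [r[0] for r in sample]
        if 0 < sum(y) < len(y):  # skip replicates with only one class
            diffs.append(auroc(y, [r[1] for r in sample]) - auroc(y, [r[2] for r in sample]))
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[min(int(0.975 * len(diffs)), len(diffs) - 1)]
    return lower, upper
```

Because both models are evaluated on identical resamples, the interval reflects the variability of the difference rather than of each AUROC separately.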

Since the objective of this study is to estimate reliable transition probabilities rather than optimize classification accuracy, the evaluation focuses on probabilistic accuracy, calibration, actuarial risk stratification, and actuarial cohort projection.

#### Multi\-State Probabilistic Accuracy

Using the predicted transition distribution $\widehat{\mathbf{p}}_{i}^{(v)}$ defined in Section [4.1](https://arxiv.org/html/2606.13880#S4.SS1), we assess overall probabilistic accuracy via the multiclass Brier score,

$$
\mathrm{Brier}_{\mathrm{MC}}=\frac{1}{|\mathcal{D}_{\mathrm{test}}|}\sum_{(i,v)\in\mathcal{D}_{\mathrm{test}}}\sum_{s\in\mathcal{J}}\left(\widehat{p}_{is}^{(v)}-\mathbf{1}\{Y_{i}^{(v+1)}=s\}\right)^{2}. \tag{18}
$$

The multi-state Brier score is a proper scoring rule that evaluates the quadratic distance between the predicted transition distribution and the observed state indicator.
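Eq. (18) can be computed directly, as in this short sketch. States are encoded as integer indices 0 to 3, which is an assumed convention.

```python
def multiclass_brier(probs, labels):
    """Mean over samples of the squared distance between p-hat and the one-hot outcome."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p[s] - (1.0 if s == y else 0.0)) ** 2 for s in range(len(p)))
    return total / len(labels)
```

Because the squared errors are summed over states rather than averaged, this score ranges from 0 (perfect) to 2 (all mass on a wrong state), and a uniform forecast over four states scores 0.75.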

We further report the Expected Calibration Error (ECE), computed by partitioning predicted probabilities into $B=10$ equal-width bins on $[0,1]$ and averaging the absolute difference between bin accuracy and mean confidence, weighted by bin frequency.
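The binning procedure can be sketched as follows. The text does not specify which probabilities are binned in the multiclass case, so the top-label convention in `top_label_pairs` is an assumption; for a binary endpoint, the same ECE function can be applied to the predicted event probability and the event indicator.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE with equal-width bins on [0, 1], weighted by bin frequency."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, hit in zip(confidences, outcomes):
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += hit
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            gap = abs(hit_sums[b] / counts[b] - conf_sums[b] / counts[b])
            ece += (counts[b] / n) * gap
    return ece


def top_label_pairs(probs, labels):
    """Assumed multiclass convention: confidence = max probability, hit = argmax is correct."""
    pairs = []
    for p, y in zip(probs, labels):
        k = max(range(len(p)), key=lambda s: p[s])
        pairs.append((p[k], 1 if k == y else 0))
    return pairs
```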

#### Binary Endpoint Evaluation

Since mortality and severe disability are financially material endpoints in the valuation of LTC insurance products, we evaluate them separately using the corresponding marginal probabilities from $\widehat{\mathbf{p}}_{i}^{(v)}$. For each endpoint $E\in\{D,S\}$, define the binary outcome as $y_{E,i}^{(v)}=\mathbf{1}\{Y_{i}^{(v+1)}=E\}$. We report AUROC, PR-AUC, binary Brier score, ECE, and calibration slope and intercept. The binary Brier score is defined as

$$
\mathrm{Brier}_{E}=\frac{1}{|\mathcal{D}_{\mathrm{test}}|}\sum_{(i,v)\in\mathcal{D}_{\mathrm{test}}}\left(\widehat{p}_{iE}^{(v)}-y_{E,i}^{(v)}\right)^{2}.
$$

Calibration is further assessed via logistic recalibration of the predicted endpoint probabilities as

$$
\mathrm{logit}\big(\Pr(y_{E}=1\mid\widehat{p}_{E})\big)=\alpha_{E}+\beta_{E}\,\mathrm{logit}(\widehat{p}_{E}),
$$

where $\widehat{p}_{E}$ denotes the predicted endpoint probability, with the individual and visit indices suppressed for notational simplicity. Ideal calibration corresponds to intercept 0 and slope 1. The calibration intercept $\alpha_{E}$ captures systematic bias, while the slope $\beta_{E}$ assesses over- or under-dispersion of predicted probabilities.
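One way to estimate $\alpha_E$ and $\beta_E$ is to fit the two-parameter logistic regression by Newton-Raphson, as sketched below in standard-library Python. The function names, the clipping constant, and the stopping rule are illustrative assumptions; a statistics package would normally be used in practice.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds with clipping so that p = 0 or 1 does not overflow."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y, p_hat, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(Pr(y=1)) = a + b * logit(p_hat)."""
    x = [logit(p) for p in p_hat]
    a, b = 0.0, 1.0  # start at ideal calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times score
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

A fitted slope below 1 indicates predictions that are too extreme, and a slope above 1 indicates predictions that are too compressed toward the mean.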

#### Risk Stratification

To assess actuarial utility, we evaluate the concentration of adverse outcomes within high-risk strata. Let $\gamma\in(0,1)$ denote a risk quantile (e.g., 5% or 10%). For endpoint $E$, the lift at level $\gamma$ is defined as

$$
\mathrm{Lift}_{E}(\gamma)=\frac{\Pr\big(y_{E}=1\mid\widehat{p}_{E}\text{ in top }\gamma\text{ quantile}\big)}{\Pr(y_{E}=1)}.
$$

We additionally report top-$\gamma$ event capture rates and decile-based observed versus predicted risk curves, which quantify the model's ability to concentrate rare but financially material adverse events within high predicted risk strata, a key requirement for underwriting and capital management. We also stratify by age to assess demographic consistency by comparing mean predicted transition probabilities with empirical transition frequencies within 10-year age bands. This comparison assesses whether predicted transition probabilities preserve age-gradient structure consistent with actuarial projection assumptions.
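Lift and top-$\gamma$ capture can be computed from the same ranking, as in this sketch. The function name and the rounding rule for the size of the top stratum are assumptions.

```python
def lift_and_capture(y, p_hat, gamma=0.10):
    """Lift and event capture rate in the top-gamma fraction ranked by predicted risk."""
    n_top = max(1, round(gamma * len(y)))
    order = sorted(range(len(y)), key=lambda k: p_hat[k], reverse=True)
    events_top = sum(y[k] for k in order[:n_top])
    base_rate = sum(y) / len(y)
    lift = (events_top / n_top) / base_rate  # event rate in the stratum relative to prevalence
    capture = events_top / sum(y)            # share of all events that fall in the stratum
    return lift, capture
```

When the stratum holds exactly a fraction $\gamma$ of the observations, the two quantities are linked by capture $=\gamma\times$ lift, so a 59.5% capture in the top decile corresponds to a lift of 5.95.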

#### Actuarial Projection

To evaluate whether the predicted transition probabilities are useful for actuarial projection, we use the aggregation procedure in Section [4.3](https://arxiv.org/html/2606.13880#S4.SS3) to construct the model-implied age-group and origin-state transition matrices $\widehat{\mathbf{P}}_{ag}$. We compare these matrices with empirical transition matrices estimated from the observed test-set transitions, where the empirical transition probability from origin state $r$ to destination state $s$ in age group $ag$ is

$$
\widetilde{p}_{rs}(ag)=\frac{\sum_{(i,v):\,G_{i}^{(v)}=ag}\mathbf{1}\{Y_{i}^{(v)}=r,\;Y_{i}^{(v+1)}=s\}}{\sum_{(i,v):\,G_{i}^{(v)}=ag}\mathbf{1}\{Y_{i}^{(v)}=r\}}. \tag{19}
$$

Let $\mathcal{K}=\{(ag,r,s):|\mathcal{I}_{ag,r}|>0,\ ag\in\mathcal{G},\ r,s\in\mathcal{J}\}$ denote the set of valid age-group, origin-state, and destination-state combinations. The transition matrix error is summarized using

$$
\mathrm{MAE}_{P}=\frac{1}{|\mathcal{K}|}\sum_{(ag,r,s)\in\mathcal{K}}\Big|\widehat{p}_{rs}(ag)-\widetilde{p}_{rs}(ag)\Big| \tag{20}
$$

and

$$
\mathrm{RMSE}_{P}=\left[\frac{1}{|\mathcal{K}|}\sum_{(ag,r,s)\in\mathcal{K}}\left(\widehat{p}_{rs}(ag)-\widetilde{p}_{rs}(ag)\right)^{2}\right]^{1/2}. \tag{21}
$$
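Eqs. (19) to (21) can be sketched as follows. The model-implied row for each (age group, origin state) cell is taken here as the mean of the individual predicted distributions in that cell, which is an assumed reading of the aggregation step; the row layout and function names are also hypothetical.

```python
import math
from collections import defaultdict

N_STATES = 4  # assumed index order: Healthy, Mild, Severe, Death


def aggregate(rows):
    """rows: (age_group, origin, destination, predicted_probs) per test transition."""
    counts = defaultdict(lambda: [0] * N_STATES)
    prob_sums = defaultdict(lambda: [0.0] * N_STATES)
    for ag, r, s, probs in rows:
        counts[(ag, r)][s] += 1
        for k in range(N_STATES):
            prob_sums[(ag, r)][k] += probs[k]
    empirical, implied = {}, {}
    for key, c in counts.items():  # only cells with at least one observed origin
        n = sum(c)
        empirical[key] = [ck / n for ck in c]             # Eq. (19)
        implied[key] = [pk / n for pk in prob_sums[key]]  # assumed: mean predicted row
    return empirical, implied


def matrix_errors(empirical, implied):
    """MAE and RMSE over all valid (age group, origin, destination) cells."""
    diffs = [implied[key][s] - empirical[key][s]
             for key in empirical for s in range(N_STATES)]
    mae = sum(abs(d) for d in diffs) / len(diffs)             # Eq. (20)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))  # Eq. (21)
    return mae, rmse
```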

#### Illustrative Valuation Example

For the illustrative valuation exercise, let $\boldsymbol{\pi}_{ag_{j}}$ denote the cohort occupancy row vector at ordered age group $ag_{j}$. Starting from a Healthy cohort at ages 60–69,

$$
\boldsymbol{\pi}_{60\text{--}69}=(1,0,0,0),
$$

state occupancy is projected recursively across ordered age groups as

$$
\boldsymbol{\pi}_{ag_{j+1}}=\boldsymbol{\pi}_{ag_{j}}\widehat{\mathbf{P}}_{ag_{j}}.
$$

Given a state-contingent benefit vector $\mathbf{b}=(0,b_{M},b_{S},0)^{\top}$, the illustrative expected present value is

$$
\mathrm{EPV}=\sum_{j=0}^{J}\bigg(\frac{1}{(1+R)^{m}}\bigg)^{j}\boldsymbol{\pi}_{ag_{j}}\mathbf{b} \tag{22}
$$

where $J$ is the final projected age-group index, $R$ is the annual discount rate, and $m$ is the approximate number of years per age-group step. In the empirical illustration, we set $b_{M}=10{,}000$, $b_{S}=50{,}000$, $R=3\%$, and $m=10$.
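The recursion and Eq. (22) can be sketched in a few lines. The function name and the list-of-lists matrix layout are assumptions, and any matrices passed in are placeholders rather than the estimated ones from the paper.

```python
def project_epv(matrices, benefits=(0.0, 10_000.0, 50_000.0, 0.0),
                rate=0.03, years_per_step=10):
    """Project a Healthy cohort through ordered age-group matrices and discount benefits.

    matrices[j] is the 4x4 row-stochastic matrix applied at age group j
    (assumed state order: Healthy, Mild, Severe, Death).
    """
    pi = [1.0, 0.0, 0.0, 0.0]                        # everyone starts Healthy
    step_discount = (1.0 + rate) ** (-years_per_step)
    occupancy = [pi]
    epv = 0.0
    for j in range(len(matrices) + 1):               # j = 0, ..., J
        epv += step_discount ** j * sum(p * b for p, b in zip(pi, benefits))
        if j < len(matrices):
            P = matrices[j]
            pi = [sum(pi[r] * P[r][s] for r in range(4)) for s in range(4)]
            occupancy.append(pi)
    return epv, occupancy
```

With the benefit vector above, the $j=0$ term is zero because the whole cohort starts in the Healthy state, so the EPV is driven entirely by projected Mild and Severe occupancy in later age groups.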

### 5\.4Baselines

We compare LANTERN with four benchmarks using the same individual-level data splits and base covariate set unless otherwise stated: Last-State Persistence (LSP), which predicts the next state as the most recent observed state; Multinomial Logistic Regression (LogReg) \[[19](https://arxiv.org/html/2606.13880#bib.bib59)\], which provides an interpretable linear baseline for next-visit transition probabilities; LightGBM \[[23](https://arxiv.org/html/2606.13880#bib.bib58)\], which captures nonlinear tabular interactions without explicit longitudinal memory; and a Gated Recurrent Unit (GRU) \[[4](https://arxiv.org/html/2606.13880#bib.bib57)\], which models sequential dependence but does not include the Time2Vec encoding or adaptive attribute conditioning used in LANTERN. Together, these baselines separate the effects of persistence, linear covariate modeling, nonlinear tabular learning, recurrent memory, and the proposed time-aware attribute conditioning.

## 6Results and Discussion

### 6\.1Predictive Performance and Statistical Robustness

We first evaluate overall probabilistic performance using the multiclass Brier score and Expected Calibration Error (ECE). LANTERN achieves a multiclass Brier score of 0.268 and an ECE of 0.0052, matching the strongest tabular baseline, LightGBM (Brier = 0.2695, ECE = 0.0052), and improving upon logistic regression, GRU, and last-state persistence.

We then examine clinically and actuarially relevant binary endpoints derived from the predicted transition probabilities: severe disability and death, with test\-set prevalences of 10\.0% and 1\.2%, respectively, as shown in Table[2](https://arxiv.org/html/2606.13880#S6.T2)\.

Table 2: Endpoint risk prediction. Severe disability prevalence is 10.0%; death prevalence is 1.2%.

For severe disability, LANTERN achieves an AUROC of 0.911 and PR-AUC of 0.617, with a Brier score of 0.0554 and ECE of 0.002453. These results exceed logistic regression (AUROC 0.904; PR-AUC 0.607) and LightGBM (AUROC 0.909; PR-AUC 0.611), and substantially outperform GRU and LSP. Precision-recall gains are particularly relevant in this moderately imbalanced setting.

For mortality prediction, our model achieves an AUROC of 0.822 and PR-AUC of 0.073, with a Brier score of 0.0115 and ECE of 0.00052. Compared with logistic regression and LightGBM, discrimination differences are modest, though point estimates favor LANTERN. Since a random ranking attains a PR-AUC roughly equal to the prevalence, the observed PR-AUC at 1.2% prevalence corresponds to approximately six-fold enrichment over random performance (0.073 / 0.012 ≈ 6.1). ROC and precision-recall curves are shown in Figure [7](https://arxiv.org/html/2606.13880#S6.F7).

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/severe_roc_overlay.png) (a) Severe ROC
![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/death_roc_overlay.png) (b) Death ROC
![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/severe_pr_overlay.png) (c) Severe PR
![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/death_pr_overlay.png) (d) Death PR

Figure 7: Discrimination performance for severe disability and death. Top row: ROC curves. Bottom row: precision-recall curves.

Paired individual-level bootstrap resampling with 1,000 replicates confirms statistically robust gains for severe disability (see Figure [8](https://arxiv.org/html/2606.13880#S6.F8)). Relative to logistic regression, LANTERN improves AUROC by +0.0066 (95% CI: 0.0049 to 0.0085) and PR-AUC by +0.0100 (95% CI: 0.0032 to 0.0165). Relative to LightGBM, the corresponding improvements are +0.0020 (95% CI: 0.00065 to 0.00351) for AUROC and +0.0062 (95% CI: 0.00027 to 0.0125) for PR-AUC.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x3.png)

Figure 8: Paired individual-level bootstrap (1,000 replicates) of AUROC and PR-AUC differences between LANTERN and baseline models. Positive values indicate superior performance of our model. Error bars represent 95% percentile confidence intervals.

For mortality, bootstrap confidence intervals include zero, indicating that discrimination differences across LANTERN, logistic regression, and LightGBM are not statistically distinguishable for this low-prevalence endpoint. Thus, LANTERN provides competitive mortality prediction, while the strongest statistically robust gains are observed for severe disability.

### 6\.2Calibration and Risk Stratification

We next assess calibration and risk concentration, since actuarial use requires reliable probabilities and effective identification of high-risk individuals. For severe disability, predicted probabilities closely align with observed event frequencies across the risk spectrum (Figure [9(a)](https://arxiv.org/html/2606.13880#S6.F9.sf1)), consistent with the low ECE of 0.0025. Logistic recalibration gives an intercept of −0.01 and slope of 0.97, indicating minimal systematic bias.

For mortality, ECE remains low (0.00052). Logistic recalibration yields an intercept of −0.34 and slope of 0.91, suggesting mild overconfidence and some underestimation of risk on the log-odds scale. This result should be interpreted in light of the low mortality prevalence in the test set. Additional mortality calibration diagnostics are provided in Supplementary Figure [S1](https://arxiv.org/html/2606.13880#Sx5.F1). Overall, probability estimates remain stable and well calibrated across endpoints.

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/calibration_overall_severe.png) (a) Calibration curve (Severe disability)
![Refer to caption](https://arxiv.org/html/2606.13880v1/x4.png) (b) Risk-band stratification (deciles)

Figure 9: Probability calibration and operational risk concentration. Left: reliability diagram for severe disability. Right: observed event rates and mean predicted probabilities across predicted-risk deciles (1 = highest risk).

Table 3: Risk stratification and operational performance. Capture@10% is the fraction of events contained in the top-risk decile. Operational precision is evaluated at fixed validation-calibrated flag rates (10% for Severe; 1% for Death).

We next assess operational risk concentration (Table [3](https://arxiv.org/html/2606.13880#S6.T3)). For severe disability, LANTERN captures 59.5% of all events in the highest-risk decile, corresponding to a lift of 5.95×. At a fixed 10% flag rate calibrated on validation data, our model achieves a precision of 0.598, exceeding logistic regression and GRU and performing comparably to LightGBM.

For death, the top decile under LANTERN captures 48.6% of events, corresponding to a lift of 4.86×. At a 1% flag rate, the proposed model achieves the highest operational precision among the evaluated models, with precision of 0.155, representing substantial enrichment relative to the baseline mortality prevalence.

Decile-based observed versus predicted event rates (Figure [9(b)](https://arxiv.org/html/2606.13880#S6.F9.sf2)) demonstrate monotonic risk stratification with close alignment between predicted and realized outcomes, supporting both discrimination and calibration within high-risk segments. These results suggest that LANTERN can concentrate a substantial share of severe disability and mortality events within limited high-risk groups, supporting its potential use in LTC insurance risk segmentation and care-management workflows. Additional analysis of risk decile transition dynamics is provided in Supplementary Figure [S3](https://arxiv.org/html/2606.13880#Sx5.F3).

### 6\.3Actuarial Projection and Illustrative Valuation

To assess whether individual\-level predictive gains translate into actuarially meaningful projection quantities, we aggregated predicted probabilities into age\-group and origin\-state\-specific transition matrices and compared them with empirical transition matrices from the held\-out test set\. Table[4](https://arxiv.org/html/2606.13880#S6.T4)reports transition\-matrix error and illustrative valuation diagnostics across models\.

Table 4: Actuarial projection and illustrative valuation diagnostics. MAE and RMSE compare model-implied age-group transition matrices with empirical transition matrices from the held-out test set. EPV is computed from an illustrative group-step cohort projection starting Healthy at ages 60-69, with benefits of 10,000 in Mild disability and 50,000 in Severe disability, discounted at 3% annually. Final S, D, and Dis. denote final projected occupancy percentages in Severe disability, Death, and any disability state, respectively, where Dis. = Mild + Severe.

LANTERN achieves the lowest transition-matrix error across all models, with MAE of 0.0198 and RMSE of 0.0356. Relative to LightGBM, the strongest tabular benchmark, this corresponds to an 11.6% reduction in MAE and a 10.3% reduction in RMSE. Larger reductions are observed relative to logistic regression and GRU. These results suggest that our model improves not only endpoint prediction but also preservation of the multi-state transition structure needed for cohort projection.

Figure [10](https://arxiv.org/html/2606.13880#S6.F10) shows the model-implied disabled occupancy profiles across age groups. LANTERN, LightGBM, logistic regression, and GRU produce broadly increasing age-gradient profiles, while the last-state persistence benchmark substantially under-projects disability occupancy. Projection accuracy is assessed using the transition-matrix error metrics in Table [4](https://arxiv.org/html/2606.13880#S6.T4), where our model achieves the lowest MAE and RMSE. Additional projection curves for Severe disability and Death are provided in Supplementary Figures [S4](https://arxiv.org/html/2606.13880#Sx5.F4) and [S5](https://arxiv.org/html/2606.13880#Sx5.F5).

![Refer to caption](https://arxiv.org/html/2606.13880v1/x5.png)

Figure 10: Projected disabled occupancy by age group across models. Disabled occupancy is defined as the projected probability of being in Mild or Severe disability.

The illustrative valuation exercise shows how differences in estimated transition matrices translate into projected disability occupancy and expected present value. Starting from a Healthy cohort at ages 60-69, LANTERN produces an EPV of 8,003 under the simplified benefit schedule. This value is not used to rank models by valuation level; rather, it illustrates the downstream effect of applying each estimated transition matrix in a common projection setup. Projection credibility is therefore assessed by agreement with empirical transition matrices, where LANTERN achieves the lowest MAE and RMSE.

The last\-state persistence benchmark performs poorly, with the highest transition\-matrix error, the lowest projected disability occupancy, and an EPV of 1,530\. This highlights the limitation of using observed state persistence alone to estimate forward transition probabilities in an aging cohort\. Overall, the projection exercise shows that the proposed estimator produces valid transition matrices that align closely with empirical transition behavior and can be used directly within a standard discrete\-time actuarial projection workflow\.

### 6\.4Ablation Analysis

We conducted ablation experiments removing attribute attention, temporal irregularity encoding, and both components simultaneously. Discrimination performance remained stable across variants (less than 0.3% absolute AUROC difference for both endpoints). However, calibration deteriorated consistently when architectural components were removed, particularly when both mechanisms were ablated, increasing multiclass ECE from 0.0052 in the full model to 0.0102. These results suggest that the proposed architecture primarily improves probabilistic stability rather than rank-order discrimination. Full results are reported in Supplementary Table [S1](https://arxiv.org/html/2606.13880#Sx5.T1).

Additional qualitative diagnostics are reported in the Supplementary Material, including risk decile stability matrices, age\-stratified prediction plots, demographic attention diagnostics and individual risk trajectories \(see Figures[S3](https://arxiv.org/html/2606.13880#Sx5.F3)\-[S7](https://arxiv.org/html/2606.13880#Sx5.F7)\)\. These analyses describe model behavior, while the main empirical evidence comes from calibration, endpoint discrimination, risk concentration, transition matrix error, and actuarial projection\.

## 7Conclusion

This study developed LANTERN, a calibrated history-dependent neural estimator of next-observation transition probabilities for irregular longitudinal health data. By combining recurrent latent trajectory representation, irregular time encoding, and adaptive demographic attribute conditioning, our model estimates transitions among Healthy, Mild disability, Severe disability, and Death states. Using longitudinal HRS data, the model achieved competitive multiclass probabilistic accuracy, low calibration error, and statistically significant improvement in severe-disability discrimination relative to logistic regression and LightGBM benchmarks.

Beyond individual-level prediction, LANTERN was evaluated in an actuarial projection setting. Its age-group transition matrices more closely matched empirical held-out transition matrices than those of the benchmark models, achieving the lowest transition-matrix MAE and RMSE. In an illustrative group-step valuation exercise, the resulting transition matrices produced actuarially interpretable projected disability and mortality occupancy patterns and corresponding EPV estimates under the stated benefit assumptions. These results suggest that calibrated longitudinal machine learning models can improve the fidelity of multi-state transition modeling while remaining compatible with discrete-time cohort projection.

The study has several limitations\. The empirical application uses ADL\-based HRS states rather than insured LTC claim states, and the valuation exercise relies on broad age groups and simplified benefit assumptions rather than a full pricing basis\. The model also estimates transition probabilities over the next observed visit rather than continuous\-time transition intensities\. Future work should extend the framework toward continuous\-time or semi\-Markov intensity estimation, incorporate parameter and process uncertainty into occupancy and EPV projections, and validate the approach using insurance portfolio or administrative claims data\.

## CRediT authorship contribution statement

Bright Kwaku Manu: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data curation, Visualization, Writing – original draft, Writing – review & editing; Beckett Sterner: Conceptualization, Methodology, Supervision, Project administration, Writing – review & editing; Petar Jevtić: Conceptualization, Methodology, Supervision, Project administration, Writing – review & editing.

## Data and Code Availability

## Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper\.

## Acknowledgments

This work was supported by National Institutes of Health (NIH) grant 5R01GM131405-02, with Petar Jevtić and Beckett Sterner serving as principal investigators.

## References

- \[1\] P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding (1993) Statistical models based on counting processes. Springer, New York.
- \[2\] G. W. Brier (1950) Verification of forecasts expressed in terms of probability. Monthly Weather Review 78 (1), pp. 1–3.
- \[3\] D. Bugliari, J. Carroll, O. Hayden, J. Hayes, M. D. Hurd, S. Lee, C. M. R. Main, C. McCullough, E. Meijer, P. Pantoja, et al. (2023) RAND HRS longitudinal file 2020 (v1) documentation. Aging.
- \[4\] K. Cho, B. Van Merriënboer, Ç. Gulçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734.
- \[5\] M. C. Christiansen (2010) Biometric worst-case scenarios for multi-state life insurance policies. Insurance: Mathematics and Economics 47 (2), pp. 190–197.
- \[6\] M. Coemans, G. Verbeke, B. Döhler, C. Süsal, and M. Naesens (2022) Bias by censoring for competing events in survival analysis. BMJ 378.
- \[7\] B. A. Curioso, G. R. Guerreiro, and M. L. Esquível (2025) Risk-adjusted estimation and graduation of transition intensities for disability and long-term care insurance: a multi-state model approach. Risks 13 (7), pp. 124.
- \[8\] G. D’Amico, M. Guillen, and R. Manca (2009) Full backward non-homogeneous semi-Markov processes for disability insurance models: a Catalunya real data application. Insurance: Mathematics and Economics 45 (2), pp. 173–179.
- \[9\] M. Denuit, A. Charpentier, and J. Trufin (2021) Autocalibration and Tweedie-dominance for insurance pricing with machine learning. Insurance: Mathematics and Economics 101, pp. 485–497.
- \[10\] D. C. Dickson, M. R. Hardy, and H. R. Waters (2020) Actuarial mathematics for life contingent risks. Cambridge University Press.
- \[11\] D. M. Eddy, W. Hollingworth, J. J. Caro, J. Tsevat, K. M. McDonald, and J. B. Wong (2012) Model transparency and validation: a report of the ISPOR-SMDM modeling good research practices task force–7. Medical Decision Making 32 (5), pp. 733–743.
- \[12\] J. H. Fong, A. W. Shao, and M. Sherris (2015) Multistate actuarial models of functional disability. North American Actuarial Journal 19 (1), pp. 41–59.
- \[13\] J. H. Fong (2019) Disability incidence and functional decline among older adults with major chronic diseases. BMC Geriatrics 19 (1), pp. 323.
- \[14\] Q. Guibert and F. Planchet (2018) Non-parametric inference of transition probabilities based on Aalen–Johansen integral estimators for acyclic multi-state models: application to LTC insurance. Insurance: Mathematics and Economics 82, pp. 21–36.
- \[15\] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger (2017) On calibration of modern neural networks. In International Conference on Machine Learning, pp. 1321–1330.
- \[16\] S. Haberman and E. Pitacco (1997) Multiple state models for insurance applications. Chapman & Hall, London.
- \[17\] K. Hanewald, H. Li, and A. W. Shao (2019) Modelling multi-state health transitions in China: a generalised linear model with time trends. Annals of Actuarial Science 13 (1), pp. 145–165.
- \[18\] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan (2019) Multitask learning and benchmarking with clinical time series data. Scientific Data 6 (1), pp. 96.
- \[19\] T. Hastie (2009) The elements of statistical learning: data mining, inference, and prediction. Springer.
- \[20\] P. Hougaard (1999) Multi-state models: a review. Lifetime Data Analysis 5 (3), pp. 239–264.
- \[21\] J. D. Kalbfleisch and R. L. Prentice (2011) The statistical analysis of failure time data. 2nd edition, Wiley, Hoboken.
- \[22\] S. M. Kazemi, R. Goel, S. Eghbali, J. Ramanan, J. Sahota, S. Thakur, S. Wu, C. Smyth, P. Poupart, and M. Brubaker (2019) Time2Vec: learning a vector representation of time. arXiv preprint arXiv:1907.05321.
- \[23\] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T. Liu (2017) LightGBM: a highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems 30.
- \[24\] S. Kessy, Y. Shen, M. Sherris, J. Temple, and J. Ziveyi (2024) Estimating transition probabilities using repeated cross-sectional data. UNSW Business School Research Paper Forthcoming.
- \[25\] J. P. Klein and M. L. Moeschberger (2003) Survival analysis: techniques for censored and truncated data. 2nd edition, Springer, New York.
- \[26\] S. Levantesi and M. Menzietti (2012) Managing longevity and disability risks in life annuities with long term care. Insurance: Mathematics and Economics 50 (3), pp. 391–401.
- \[27\] S. Levantesi and M. Menzietti (2021) Modelling health transitions in Italy: a generalized linear model with disability duration. In Mathematical and Statistical Methods for Actuarial Sciences and Finance: eMAF2020, pp. 307–313.
- \[28\] Y. Liu, S. Li, F. Li, L. Song, and J. M. Rehg (2015) Efficient learning of continuous-time hidden Markov models for disease progression. Advances in Neural Information Processing Systems 28.
- \[29\] F. Llopis-Cardona, C. Armero, and G. Sanfélix-Gimeno (2023) Estimating disease incidence rates and transition probabilities in elderly patients using multi-state models: a case study in fragility fracture using a Bayesian approach. BMC Medical Research Methodology 23 (1), pp. 40.
- \[30\] A. Maegebier (2013) Valuation and risk assessment of disability insurance using a discrete time trivariate Markov renewal reward process. Insurance: Mathematics and Economics 53 (3), pp. 802–811.
- \[31\] X. Nie and X. Zhao (2022) Forecasting medical state transition using machine learning methods. Scientific Reports 12 (1), pp. 20478.
- \[32\] E. Olariu, K. K. Cadwell, E. Hancock, D. Trueman, and H. Chevrou-Severac (2017) Current recommendations on the estimation of transition probabilities in Markov cohort models for use in health care decision-making: a targeted literature review. ClinicoEconomics and Outcomes Research, pp. 537–546.
- \[33\] K. Park and M. Sherris (2024) Design and pricing of private long-term care insurance: an Australian analysis. Available at SSRN 4920310.
- \[34\]E\. Pitacco, M\. Denuit, S\. Haberman, and A\. Olivieri\(2014\)Health insurance and reimbursement systems: an actuarial perspective\.Springer,Berlin\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p1.1),[§1](https://arxiv.org/html/2606.13880#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.13880#S2.SS1.p1.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.p1.1),[§3](https://arxiv.org/html/2606.13880#S3.p1.1),[§3](https://arxiv.org/html/2606.13880#S3.p6.4)\.
- \[35\]M\. M\. Rahman and S\. Purushotham\(2023\)Multi\-state survival analysis using pseudo value\-based deep neural networks\.InProceedings of the 2023 SIAM International Conference on Data Mining \(SDM\),pp\. 757–765\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px3.p2.1)\.
- \[36\]A\. Rajkomar, E\. Oren, K\. Chen, A\. M\. Dai, N\. Hajaj, M\. Hardt, P\. J\. Liu, X\. Liu, J\. Marcus, M\. Sun,et al\.\(2018\)Scalable and accurate deep learning with electronic health records\.NPJ digital medicine1\(1\),pp\. 18\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px3.p3.1)\.
- \[37\]O\. L\. Sandqvist\(2023\)A multistate approach to disability insurance reserving with information delays\.arXiv preprint arXiv:2312\.14324\.Cited by:[§2\.1](https://arxiv.org/html/2606.13880#S2.SS1.p3.1)\.
- \[38\]M\. Sherris and P\. Wei\(2021\)A multi\-state model of functional disability and health status in the presence of systematic trend and uncertainty\.North American Actuarial Journal25\(1\),pp\. 17–39\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p1.1),[§1](https://arxiv.org/html/2606.13880#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.13880#S2.SS1.p1.1),[§2\.1](https://arxiv.org/html/2606.13880#S2.SS1.p3.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.p1.1)\.
- \[39\]G\. Soutinho and L\. Meira\-Machado\(2020\)Estimation of the transition probabilities in multi\-state survival data: new developments and practical recommendations\.WSEAS Transac Math\. 2020; 19: 353\-366\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p3.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px1.p1.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px1.p2.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px1.p3.1),[§3](https://arxiv.org/html/2606.13880#S3.p8.1)\.
- \[40\]E\. W\. Steyerberg and Y\. Vergouwe\(2014\)Towards better clinical prediction models: seven steps for development and an abcd for validation\.European heart journal35\(29\),pp\. 1925–1931\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p5.1),[§2\.3](https://arxiv.org/html/2606.13880#S2.SS3.p1.1),[§2\.3](https://arxiv.org/html/2606.13880#S2.SS3.p2.1)\.
- \[41\]S\. Tanasia, H\. Margaretha, and D\. Krisnadi\(2024\)Generalized linear models for a personalized cancer insurance\.InAIP Conference Proceedings,Vol\.3016,pp\. 030002\.Cited by:[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px2.p1.1)\.
- \[42\]B\. Van Calster, D\. J\. McLernon, M\. Van Smeden, L\. Wynants, E\. W\. Steyerberg, and STRATOS Topic Group\(2019\)Calibration: the achilles heel of predictive analytics\.BMC Medicine17\(1\),pp\. 230\.Cited by:[§2\.3](https://arxiv.org/html/2606.13880#S2.SS3.p1.1)\.
- \[43\]Q\. Wang, K\. Hanewald, and X\. Wang\(2022\)Multistate health transition modeling using neural networks\.Journal of Risk and Insurance89\(2\),pp\. 475–504\.Cited by:[§1](https://arxiv.org/html/2606.13880#S1.p5.1),[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px3.p1.1)\.
- \[44\]A\. Wienke\(2010\)Frailty models in survival analysis\.Chapman & Hall/CRC,Boca Raton\.Cited by:[§2\.2](https://arxiv.org/html/2606.13880#S2.SS2.SSS0.Px2.p3.1)\.

## Supplementary Material

### Ablation Study Results

Table S1: LANTERN ablation results comparing the full model with variants that remove attribute attention, temporal irregularity encoding, or both components\.
### Additional Calibration Results

Reliability analysis confirms stable probability calibration across risk strata\. Predicted mortality probabilities track observed frequencies closely, with slight overconfidence in higher\-risk bins consistent with the recalibration slope below one\. Overall deviations remain small, supporting actuarial coherence under extreme class imbalance \(Figure [S1](https://arxiv.org/html/2606.13880#Sx5.F1)\)\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/calibration_overall_death.png)

Figure S1: Reliability diagram for mortality\. Observed event frequencies are plotted against mean predicted probabilities across uniform risk bins\. The dashed line denotes perfect calibration\.
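
The construction behind a reliability diagram like Figure S1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `reliability_table`, the default of ten bins, and the synthetic data are assumptions made here. Predictions are grouped into uniform-width probability bins, and each bin's mean predicted probability is compared with its observed event frequency.

```python
import numpy as np

def reliability_table(y_true, p_pred, n_bins=10):
    """Compare mean predicted probability with observed event frequency
    inside uniform-width probability bins on [0, 1]."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    bin_idx = np.digitize(p_pred, edges[1:-1])
    rows = []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():  # empty bins are skipped rather than plotted
            rows.append((b, int(in_bin.sum()),
                         float(p_pred[in_bin].mean()),
                         float(y_true[in_bin].mean())))
    return rows

# Synthetic, calibrated-by-construction rare-event data (illustration only).
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 0.3, size=5000)
y = rng.binomial(1, p)
for b, n, mean_pred, obs_freq in reliability_table(y, p):
    print(f"bin {b}: n={n:4d}  predicted={mean_pred:.3f}  observed={obs_freq:.3f}")
```

Plotting the third column against the fourth gives the diagram; points below the diagonal in the upper bins would correspond to the overconfidence pattern described above.
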
### Age\-stratified Predictions

Predicted transition risks increase monotonically with age for both severe disability and mortality, closely matching empirical frequencies across age bands\. Confidence intervals widen in older cohorts due to smaller sample sizes, but no systematic under\- or over\-estimation is observed\. These results indicate demographic coherence and stability across the age spectrum, as shown in Figure [S2](https://arxiv.org/html/2606.13880#Sx5.F2)\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/mean_sd_by_agegroup_severe.png) \(a\) Severe disability
![Refer to caption](https://arxiv.org/html/2606.13880v1/figures/mean_sd_by_agegroup_death.png) \(b\) Mortality

Figure S2: Age\-stratified observed and predicted probabilities for severe disability \(left\) and mortality \(right\)\. Points represent mean event probabilities within each age group; error bars denote ±1 standard deviation\.
### Risk Stratification Stability Analysis

Figure [S3](https://arxiv.org/html/2606.13880#Sx5.F3) presents the risk decile transition matrices under LANTERN for severe disability and mortality\. Across both endpoints, individuals in higher predicted risk deciles exhibit substantial persistence across successive visits, with mass concentrated along the diagonal, particularly in the upper strata\. These patterns indicate temporal stability of the model’s risk stratification, supporting its use in actuarial segmentation and risk monitoring applications\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x6.png) \(a\) Severe disability
![Refer to caption](https://arxiv.org/html/2606.13880v1/x7.png) \(b\) Mortality

Figure S3: Risk decile transition matrices under LANTERN\. Each panel shows the row\-normalized probability of moving from a current predicted risk decile \(rows\) to a next\-visit decile \(columns\)\. Diagonal concentration indicates temporal stability of risk stratification\.
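
A row-normalized decile transition matrix of the kind shown in Figure S3 could be computed along the following lines. This is a sketch under stated assumptions rather than the paper's procedure: decile cut points are taken from the pooled predicted risks, consecutive rows of the same individual are treated as successive visits, and the names `risk_bins` and `decile_transition_matrix` are hypothetical.

```python
import numpy as np

def risk_bins(risk, n_bins=10):
    """Map each predicted risk to a bin 0..n_bins-1 using pooled quantile cuts
    (an assumption; the cut points could also be set per visit)."""
    risk = np.asarray(risk, dtype=float)
    cuts = np.quantile(risk, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(risk, cuts)

def decile_transition_matrix(person_id, visit, risk, n_bins=10):
    """Row-normalized counts of moves from the current risk bin (rows)
    to the same person's next-visit risk bin (columns)."""
    person_id = np.asarray(person_id)
    visit = np.asarray(visit)
    bins = risk_bins(risk, n_bins)
    # lexsort uses the last key as primary: sort by person, then by visit.
    order = np.lexsort((visit, person_id))
    pid, bins = person_id[order], bins[order]
    same_person = pid[1:] == pid[:-1]
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (bins[:-1][same_person], bins[1:][same_person]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy example with two individuals and two risk bins (made-up values).
M = decile_transition_matrix(
    person_id=[1, 1, 1, 2, 2, 2],
    visit=[1, 2, 3, 1, 2, 3],
    risk=[0.10, 0.20, 0.90, 0.80, 0.70, 0.95],
    n_bins=2,
)
print(M)
```

Mass on the diagonal of the resulting matrix means an individual tends to stay in the same risk bin at the next visit, which is the persistence pattern the figure is read for.
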
### Additional Actuarial Projection Results

The main text reports projected disabled occupancy because this quantity most directly summarizes the benefit exposure of LTC insurance products across the Mild and Severe disability states\. For completeness, the figures below show the corresponding model\-implied projections for Severe disability and Death\. These curves illustrate the age\-group occupancy profiles generated by each model’s estimated transition matrices\. They should be interpreted as projection diagnostics rather than observed longitudinal cohort outcomes; model fidelity is assessed in the main text using transition\-matrix MAE and RMSE against empirical test\-set transition matrices\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x8.png)

Figure S4: Projected severe\-disability occupancy by age group across models\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x9.png)

Figure S5: Projected death occupancy by age group across models\.
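
To make the projection diagnostics concrete, the sketch below propagates a state-occupancy vector through a sequence of age-group transition matrices and computes element-wise MAE and RMSE between two matrices. It is an illustration under assumptions: the matrices are made-up numbers, one matrix is applied per age-group step, death is treated as absorbing, and the error metrics average over all matrix entries, which may differ from the exact definitions used in the main text.

```python
import numpy as np

STATES = ("healthy", "mild", "severe", "death")

def project_occupancy(initial, matrices):
    """Propagate occupancy through row-stochastic matrices, one per age group.
    Returns an array with one row per step, including the initial vector."""
    occ = [np.asarray(initial, dtype=float)]
    for P in matrices:
        P = np.asarray(P, dtype=float)
        if not np.allclose(P.sum(axis=1), 1.0):
            raise ValueError("each row of a transition matrix must sum to one")
        occ.append(occ[-1] @ P)
    return np.vstack(occ)

def matrix_errors(P_model, P_empirical):
    """Element-wise MAE and RMSE between two transition matrices."""
    diff = np.asarray(P_model, dtype=float) - np.asarray(P_empirical, dtype=float)
    return float(np.abs(diff).mean()), float(np.sqrt((diff ** 2).mean()))

# Hypothetical transition matrix (rows: origin state, columns: next state).
P_example = np.array([
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.00, 0.10, 0.60, 0.30],
    [0.00, 0.00, 0.00, 1.00],  # death is absorbing
])

occupancy = project_occupancy([1.0, 0.0, 0.0, 0.0], [P_example, P_example])
disabled = occupancy[:, 1] + occupancy[:, 2]  # mild + severe occupancy
print(dict(zip(STATES, occupancy[-1].round(3))), disabled.round(3))
```

The severe-disability and death curves correspond to the third and fourth columns of `occupancy`, while disabled occupancy sums the mild and severe columns.
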
### Demographic Attention Diagnostics

We examine the demographic attribute\-conditioning mechanism using an attention shift diagnostic\. For each endpoint, we compare mean attention weights over demographic and socioeconomic attributes between observations in the highest and lowest predicted\-risk deciles\. The plotted values represent the difference in mean attention weight, computed as top 10% minus bottom 10%\. Positive values indicate attributes receiving greater relative attention among high\-risk predictions, while negative values indicate greater relative attention among low\-risk predictions\.

These summaries are endpoint\-specific diagnostics of the demographic and socioeconomic attribute\-conditioning component\. They should not be interpreted as causal effects, marginal feature effects, or full\-model feature importance, because the attention mechanism is applied only to demographic and socioeconomic attributes rather than to the full set of time\-varying health covariates\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x10.png) \(a\) Severe disability
![Refer to caption](https://arxiv.org/html/2606.13880v1/x11.png) \(b\) Mortality

Figure S6: Demographic attention shifts comparing the highest and lowest predicted\-risk deciles\. Bars show the change in mean attention weight, computed as top 10% minus bottom 10%\. Positive values indicate demographic or socioeconomic attributes receiving greater relative attention among high\-risk predictions\.
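
The attention shift plotted in Figure S6 amounts to a difference of two group means. The following is a minimal sketch, assuming per-observation attention weights are available as an array with one column per attribute; the function name `attention_shift`, the attribute names, and the synthetic weights are hypothetical and not taken from the paper.

```python
import numpy as np

def attention_shift(attn, risk, tail=0.10):
    """Mean attention weight in the top `tail` share of predicted risk
    minus the mean in the bottom `tail` share, per attribute."""
    attn = np.asarray(attn, dtype=float)  # shape (n_obs, n_attributes)
    risk = np.asarray(risk, dtype=float)  # shape (n_obs,)
    lo_cut, hi_cut = np.quantile(risk, [tail, 1.0 - tail])
    top_mean = attn[risk >= hi_cut].mean(axis=0)
    bottom_mean = attn[risk <= lo_cut].mean(axis=0)
    return top_mean - bottom_mean

# Synthetic illustration: random weights normalized to sum to one per row.
rng = np.random.default_rng(1)
names = ["age", "sex", "education", "wealth"]  # hypothetical attribute names
logits = rng.normal(size=(1000, len(names)))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
predicted_risk = rng.uniform(size=1000)
for name, delta in zip(names, attention_shift(weights, predicted_risk)):
    print(f"{name:>10s}: {delta:+.4f}")
```

If the weights are normalized to sum to one for every observation, the shifts across attributes sum to zero, so a positive bar for one attribute necessarily comes with negative bars elsewhere. This is one more reason to read the plot as a relative diagnostic rather than as feature importance.
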
### Individual\-Level Risk Trajectory Examples

We present individual\-level examples to qualitatively illustrate how model\-predicted risks evolve across visits under heterogeneous trajectories\. Each panel plots the predicted severe\-disability and mortality risks over observed waves for one individual, together with the observed transition timing when applicable\. Horizontal dotted lines indicate endpoint\-specific top\-decile risk thresholds computed from the test set, and vertical dashed lines indicate observed transition times\. The examples include true\-positive, false\-positive, false\-negative, and stable low\-risk trajectories\. These plots are diagnostic illustrations and are not intended as formal tests of temporal smoothness or monotonic deterioration\.

![Refer to caption](https://arxiv.org/html/2606.13880v1/x12.png) \(a\) TP severe disability: high risk
![Refer to caption](https://arxiv.org/html/2606.13880v1/x13.png) \(b\) TP mortality: high risk
![Refer to caption](https://arxiv.org/html/2606.13880v1/x14.png) \(c\) TP severe disability: moderate risk
![Refer to caption](https://arxiv.org/html/2606.13880v1/x15.png) \(d\) TP mortality: moderate risk
![Refer to caption](https://arxiv.org/html/2606.13880v1/x16.png) \(e\) FP severe disability: high risk
![Refer to caption](https://arxiv.org/html/2606.13880v1/x17.png) \(f\) Stable low risk: mortality

Figure S7: Selected individual\-level risk trajectory examples\. Panels show model\-predicted probabilities of severe disability and death over observed waves for selected event, false\-positive, and stable low\-risk cases\. Horizontal dotted lines indicate endpoint\-specific top\-decile risk thresholds, and vertical dashed lines denote observed transition times when present\. These examples are qualitative diagnostics rather than formal tests of temporal smoothness\.