Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health
Summary
This paper proposes an interpretable causal-discovery-guided framework for deriving a Sleep Recovery Score (SRS) from multimodal polysomnography data, demonstrating up to 2.5× stronger alignment with perceived recovery than the traditional Apnea–Hypopnea Index (AHI), with potential applications in connected health.
# Beyond AHI: An Interpretable Causal-Discovery–Guided Framework for Sleep Recovery in Connected Health
Source: [https://arxiv.org/html/2606.18506](https://arxiv.org/html/2606.18506)
Saba A. Farahani∗, Elahe Khatibi, Manoj Vishwanath, Amir M. Rahmani, Hung Cao; University of California, Irvine, CA, USA; {fazizaba, ekhatibi, manojv, a.rahmani, hungcao}@uci.edu
###### Abstract
Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea–Hypopnea Index (AHI), provide limited insight into the multidomain physiology underlying functional recovery. We propose an interpretable, causal-discovery–guided framework for deriving a hierarchical Sleep Recovery Score (SRS) from multimodal PSG. Using two large population cohorts (MESA: $n = 1{,}540$; MrOS: $n = 825$), we apply directed acyclic graph (DAG) learning to identify candidate physiological drivers spanning respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Although derived from clinical PSG, these domains map naturally to sensing streams increasingly available in connected health technologies, including wearable ECG, oximetry, and sleep-stage estimation devices. To preserve mechanistic plausibility, we introduce a two-stage screening process that combines physiology-based constraints with constrained LLM-assisted auditing to identify and remove structural confounders and construct-overlapping variables. Across cohorts, these five domains recur as physiological correlates of recovery, and the resulting SRS shows up to 2.5× stronger alignment with perceived recovery than AHI. By linking multimodal sleep physiology to patient-centered outcomes through an interpretable, bias-aware, and domain-structured framework, this work provides a practical foundation for recovery modeling across both clinical sleep studies and emerging smart and connected health settings.
Code Availability. Code, prompt templates, and classification criteria are available at [GitHub](https://github.com/elakhatibi/SRS-causal-discovery).
Figure 1: Overview of the proposed sleep recovery framework. (A) Multimodal polysomnography (PSG) signals, including EEG, ECG, and respiratory/oximetry channels, are converted into physiological features. (B) A NOTEARS-based directed acyclic graph (DAG) model identifies candidate physiological drivers of recovery-related outcomes. (C) A two-stage screening funnel applies physiology-based constraints and constrained LLM-assisted auditing to remove confounders and construct-overlapping variables. (D) Screened mechanisms are aggregated into five physiological domains (respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation) to derive the hierarchical Sleep Recovery Score (SRS).
## I Introduction
Connected health systems increasingly incorporate multimodal physiological sensing—including wearable electrocardiography \(ECG\), pulse oximetry, and consumer sleep devices—yet translating continuous biosignals into clinically interpretable recovery metrics remains an open challenge\. A central difficulty is that distributed sensing platforms often capture only partial views of sleep physiology, making domain\-structured and interpretable recovery modeling especially important for translation beyond the sleep laboratory\. Polysomnography \(PSG\), the clinical gold standard for sleep assessment, captures rich multimodal recordings spanning electroencephalography \(EEG\), ECG, respiratory flow, and oxygen saturation\[[2](https://arxiv.org/html/2606.18506#bib.bib3)\]\. Despite this richness, clinical practice often reduces these signals to coarse summary indices such as the Apnea–Hypopnea Index \(AHI\) or sleep efficiency\. While clinically useful, these measures primarily capture isolated components of sleep disruption and may fail to reflect the integrated physiological burden underlying functional recovery\[[1](https://arxiv.org/html/2606.18506#bib.bib4),[5](https://arxiv.org/html/2606.18506#bib.bib5)\]\.
Patient\-reported outcomes \(PROs\), including daytime sleepiness and fatigue, capture the lived consequences of sleep dysfunction but do not directly reveal the physiological systems driving impaired recovery\. Prior studies have reported only weak\-to\-moderate alignment between respiratory event metrics and subjective symptom burden\[[4](https://arxiv.org/html/2606.18506#bib.bib6),[3](https://arxiv.org/html/2606.18506#bib.bib7)\], highlighting a persistent disconnect between objective sleep physiology and perceived recovery\. At the same time, recent machine learning approaches for PSG analysis have improved predictive performance, but often operate as black\-box systems optimized for association rather than mechanistic insight\[[6](https://arxiv.org/html/2606.18506#bib.bib8),[7](https://arxiv.org/html/2606.18506#bib.bib9)\]\. In sleep medicine and connected health, however, interpretability, physiological plausibility, and robustness are essential for clinical trust and responsible deployment\.
To address this gap, we propose an interpretable, causal-discovery–guided framework for deriving a hierarchical Sleep Recovery Score (SRS) from multimodal PSG. Rather than relying on heuristic aggregation or single-index severity measures, we treat recovery-related PROs as structured targets and use directed acyclic graph (DAG) modeling to identify candidate physiological drivers of recovery. These mechanisms are then refined through a two-stage screening process that combines physiology-based constraints with constrained LLM-assisted auditing, and are aggregated into five physiological domains: respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Although derived from clinical PSG, these domains align naturally with sensing streams increasingly available in connected health technologies, including wearable ECG, oximetry, and sleep-stage estimation devices. We evaluate the proposed framework in two independent population cohorts, MESA and MrOS. Across cohorts, the resulting SRS shows stronger alignment with perceived recovery than AHI, with improvements of up to 2.5× for some outcomes. By explicitly linking multimodal sleep physiology to patient-centered recovery through a transparent and domain-structured framework, this work provides a practical foundation for recovery modeling across both clinical sleep studies and emerging smart and connected health settings.
Contributions\.
1. Interpretable multidomain recovery modeling. We introduce a hierarchical Sleep Recovery Score that moves beyond single respiratory indices by linking multimodal PSG to patient-reported recovery outcomes.
2. Causal-discovery–guided mechanism identification. We apply DAG-based causal discovery to identify candidate physiological drivers of recovery, refined through physiology-based constraints and constrained LLM-assisted auditing.
3. Cross-cohort evidence supporting connected-health translation. Across MESA and MrOS, we observe consistent convergence on five physiological domains and show stronger alignment with perceived recovery than AHI, supporting interpretable recovery modeling that can inform future distributed and wearable sensing systems.
## II Related Work
Clinical sleep assessment remains dominated by summary indices such as the Apnea–Hypopnea Index \(AHI\), despite growing evidence that single respiratory metrics do not fully capture downstream physiological burden or functional recovery\[[2](https://arxiv.org/html/2606.18506#bib.bib3),[1](https://arxiv.org/html/2606.18506#bib.bib4),[5](https://arxiv.org/html/2606.18506#bib.bib5)\]\. Patient\-reported outcomes such as daytime sleepiness provide an important complementary view of sleep dysfunction, yet prior work has reported only weak\-to\-moderate alignment between subjective symptom burden and PSG\-derived respiratory indices\[[4](https://arxiv.org/html/2606.18506#bib.bib6),[3](https://arxiv.org/html/2606.18506#bib.bib7)\]\. These findings motivate multidomain recovery models that move beyond event\-count severity alone\.
Recent machine learning methods have improved sleep staging and PSG\-based prediction from multimodal physiological signals, including deep learning and foundation\-model approaches\[[6](https://arxiv.org/html/2606.18506#bib.bib8),[7](https://arxiv.org/html/2606.18506#bib.bib9)\]\. However, these systems typically optimize predictive performance rather than mechanistic interpretability\. Causal discovery methods such as NOTEARS provide a complementary approach for identifying structured dependencies in observational data\[[8](https://arxiv.org/html/2606.18506#bib.bib10)\], but in clinical settings, learned graphs still require mechanism\-aware screening to avoid confounders, proxy effects, and physiologically implausible relationships\. In contrast to prior PSG\-based predictive models, our goal is not only outcome association but extraction of recurrent, physiologically interpretable recovery domains that can support downstream connected\-health translation\. Our work builds on these directions by combining multimodal PSG, causal\-discovery–guided mechanism identification, and structured candidate screening to derive an interpretable recovery score for connected health systems\.
## III Methodology
We derive the Sleep Recovery Score by linking PSG features to recovery-related patient-reported outcomes through a structured causal-discovery pipeline. As shown in Figure [1](https://arxiv.org/html/2606.18506#S0.F1), the framework consists of five steps: (1) outcome selection, (2) physiological feature construction, (3) directed acyclic graph (DAG) estimation, (4) two-stage candidate screening, and (5) hierarchical score synthesis. We apply this framework to two population cohorts, MESA ($n = 1{,}540$) and MrOS ($n = 825$), using recovery outcomes that include daytime sleepiness, fatigue, perceived sleep quality, and perceived sleep efficiency.
### III-A Causal Graph Estimation
Let $X \in \mathbb{R}^{n \times p}$ denote the matrix of $p$ candidate physiological features measured across $n$ individuals, and let $Y_k$ denote a recovery-related PRO. Features span five domains of sleep physiology: respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Representative variables include AHI and hypopnea index for respiratory burden; SpO$_2$ and oxygen desaturation index (ODI) for hypoxic burden; wake after sleep onset (WASO) and arousal index for sleep fragmentation; N3%, REM latency, and spectral power for sleep architecture; and SDNN and RMSSD for autonomic regulation.
For each outcome $Y_k$, we form an analysis table $T_k = [X, C, Y_k]$, where $C$ contains structural covariates such as age, sex, race, and education. Continuous variables are median-imputed and standardized, while categorical covariates are one-hot encoded. NOTEARS was run with sparsity parameter $\lambda_1 = 0.02$ and edge-weight threshold $\tau = 0.01$, retaining the top-$k = 20$ candidate drivers per outcome (Fisher-Z independence test, $\alpha = 0.05$). Bootstrap stability selection used 500 resamples; only edges with selection frequency $\geq 0.6$ were carried forward to the screening stage.
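The preprocessing described above (median imputation and standardization of continuous variables, one-hot encoding of categorical covariates) can be sketched as follows. This is a minimal illustration assuming NumPy arrays and plain Python lists; the function name `preprocess_table` is hypothetical and not taken from the authors' repository, and keeping one indicator column per observed category level (no reference level dropped) is an assumption.

```python
import numpy as np


def preprocess_table(continuous, categorical):
    """Median-impute and z-score continuous columns, then one-hot encode categoricals.

    continuous: (n, q) array-like of floats, with np.nan marking missing values.
    categorical: list of covariates, each a length-n sequence of category labels.
    Returns an (n, q + total number of levels) float matrix.
    """
    X = np.asarray(continuous, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)          # column medians, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]              # median imputation
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                          # avoid dividing by zero on constant columns
    X = (X - X.mean(axis=0)) / sd              # standardize to mean 0, SD 1
    blocks = [X]
    for column in categorical:
        levels = sorted(set(column))           # one indicator per observed level (assumption)
        blocks.append(np.array([[float(v == lvl) for lvl in levels] for v in column]))
    return np.hstack(blocks)
```

For example, `preprocess_table([[1.0, np.nan], [2.0, 4.0], [3.0, 6.0]], [["F", "M", "F"]])` fills the missing entry with the column median 5.0 before standardizing, and appends two indicator columns for the levels F and M.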
We estimate a sparse DAG over the variables in $T_k$ using the linear NOTEARS formulation \[[8](https://arxiv.org/html/2606.18506#bib.bib10)\]:
$$\min_{W} \; \frac{1}{2n}\lVert X - XW \rVert_F^2 + \lambda_1 \lVert W \rVert_1 \quad \text{subject to} \quad \operatorname{tr}\!\left(e^{W \circ W}\right) - d = 0 \tag{1}$$
where $W \in \mathbb{R}^{d \times d}$ is the weighted adjacency matrix, $\lambda_1$ controls sparsity, $d$ is the number of modeled variables, $\circ$ denotes the elementwise product, and $X$ in Eq. (1) stands for the $n \times d$ standardized matrix of all variables in $T_k$. We adopt the linear formulation to prioritize interpretability and stability in moderate-sample biomedical settings, yielding an explicit sparse dependency structure that can be inspected and audited against physiology before score construction.
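To make Eq. (1) concrete, the sketch below evaluates the penalized least-squares score and the acyclicity term $h(W) = \operatorname{tr}(e^{W \circ W}) - d$ for a given $W$. It does not solve the constrained problem (NOTEARS does that with an augmented-Lagrangian scheme \[[8](https://arxiv.org/html/2606.18506#bib.bib10)\]); it only shows how each term is computed. The helper names are hypothetical, and the truncated Taylor series is a stand-in for a library matrix exponential such as `scipy.linalg.expm`.

```python
import numpy as np


def expm_taylor(A, terms=30):
    """Matrix exponential by truncated Taylor series (fine for small, small-norm A)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k                    # accumulates A^k / k!
        result = result + term
    return result


def acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, which is zero exactly when W has no directed cycle."""
    return np.trace(expm_taylor(W * W)) - W.shape[0]


def notears_score(X, W, lam1):
    """Objective of Eq. (1) without the constraint: (1/2n)||X - XW||_F^2 + lam1 * ||W||_1."""
    n = X.shape[0]
    return 0.5 / n * np.sum((X - X @ W) ** 2) + lam1 * np.abs(W).sum()
```

For a strictly upper-triangular $W$ (a DAG), $W \circ W$ is nilpotent, so the trace equals $d$ and $h(W) = 0$; for the two-node cycle with $W_{12} = W_{21} = 1$, $h(W) = 2\cosh 1 - 2 \approx 1.086$.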
Features with directed edges into $Y_k$ are treated as candidate drivers of recovery. To improve robustness, we apply bootstrap stability selection and retain only edges whose bootstrap selection frequency is at least 0.6, yielding a sparse and reliability-filtered candidate set.
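Bootstrap stability selection can be sketched as below: rows are resampled with replacement, an edge learner is refit on each resample, and an edge is kept when its selection frequency reaches the cutoff (0.6 here, the value given above). The learner is passed in as a function; `corr_edges` is a hypothetical correlation-threshold stand-in used only so the example runs, not the NOTEARS fit used in the paper.

```python
import numpy as np


def stability_selection(X, fit_edges, n_boot=500, threshold=0.6, seed=0):
    """Return {edge: frequency} for edges selected in at least `threshold` of resamples.

    fit_edges: callable mapping an (n, d) array to a set of (i, j) edges. In the paper
    this role is played by NOTEARS; here any edge learner can be plugged in.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = {}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # bootstrap: resample rows with replacement
        for edge in fit_edges(X[idx]):
            counts[edge] = counts.get(edge, 0) + 1
    return {e: c / n_boot for e, c in counts.items() if c / n_boot >= threshold}


def corr_edges(X, cutoff=0.3):
    """Hypothetical stand-in learner: edge i -> last column when |corr| > cutoff."""
    target = X.shape[1] - 1
    C = np.corrcoef(X, rowvar=False)
    return {(i, target) for i in range(target) if abs(C[i, target]) > cutoff}
```

Passing the learner as a callable keeps the resampling logic independent of the structure-learning method, so the same loop would wrap a NOTEARS solver unchanged.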
Interpretive note on causal language. Although we adopt causal-discovery terminology throughout, MESA and MrOS are observational cross-sectional cohorts. Learned DAG edges therefore reflect conditional statistical dependencies rather than established causal relationships. The two-stage screening pipeline imposes domain-motivated plausibility constraints that go beyond statistical structure, but it cannot substitute for experimental validation. We use the term "candidate driver" deliberately to signal a mechanism hypothesis rather than a causal claim, consistent with the exploratory role of causal discovery in biomedical observational settings \[[8](https://arxiv.org/html/2606.18506#bib.bib10)\].
### III-B Two-Stage Candidate Screening
Because causal discovery may recover statistically valid but physiologically implausible or clinically unhelpful relationships, we refine candidate drivers through a two\-stage screening process\.
Stage 1: Physiology-based screening. Candidate edges are filtered using established sleep mechanisms. Representative pathways include OSA → hypoxia → arousals → sleepiness, reduced N3 activity → impaired restoration, and elevated alpha/beta activity → cortical hyperarousal. Edges inconsistent with known physiology are removed.
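One way to encode Stage 1 is as a whitelist of domain-level pathways checked edge by edge. The sketch below is illustrative only: the feature names, the feature-to-domain map (built from the representative variables listed in Section III-A), and the pathway whitelist are assumptions, loosely encoding the OSA → hypoxia → arousals → sleepiness chain plus direct domain-to-outcome links.

```python
# Hypothetical feature-to-domain map built from the representative variables in Section III-A.
FEATURE_DOMAIN = {
    "ahi": "respiratory", "hypopnea_index": "respiratory",
    "odi": "hypoxic", "spo2_mean": "hypoxic",
    "waso": "fragmentation", "arousal_index": "fragmentation",
    "n3_pct": "architecture", "rem_latency": "architecture",
    "sdnn": "autonomic", "rmssd": "autonomic",
}

# Hypothetical whitelist of (source domain, target domain) pathways. It loosely encodes
# OSA -> hypoxia -> arousals -> sleepiness plus direct domain -> outcome links.
PLAUSIBLE = {
    ("respiratory", "hypoxic"), ("hypoxic", "fragmentation"),
    ("respiratory", "outcome"), ("hypoxic", "outcome"), ("fragmentation", "outcome"),
    ("architecture", "outcome"), ("autonomic", "outcome"),
}


def domain_of(node):
    """Map a node name to its domain; unknown nodes (e.g., demographics) map to None."""
    return "outcome" if node == "outcome" else FEATURE_DOMAIN.get(node)


def physiology_screen(edges):
    """Keep (source, target) edges whose domain pair appears in the whitelist."""
    return [(s, t) for s, t in edges if (domain_of(s), domain_of(t)) in PLAUSIBLE]
```

With this encoding, an edge pointing out of the outcome, or one from an unmapped variable such as age, fails the check because its domain pair is not whitelisted.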
Stage 2: Constrained LLM-assisted auditing. Remaining candidates are submitted to a structured three-way classification protocol, implemented in `validate_with_llm.py` (see Code Availability). Each candidate feature is classified as (i) a plausible mechanistic driver, (ii) a structural confounder (e.g., a demographic proxy such as race, sex, or age), or (iii) a construct-overlapping leakage variable (e.g., a subjective measure correlated by construction with the PRO target). Classification is governed by predefined physiological and methodological criteria supplied as fixed system-level instructions, which are intended to keep auditing consistent and reproducible across all outcomes. Only features classified as category (i) by both layers advance to consensus aggregation. Full prompt templates and classification criteria are included in the repository.
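The constrained part of Stage 2 can be illustrated by strict parsing of the model's reply: anything that is not one of the three allowed labels is rejected rather than interpreted. This is a hypothetical sketch, not the repository's `validate_with_llm.py`; the JSON reply format and the label strings are assumptions, and the LLM call itself is omitted.

```python
import json

# Labels for the three Stage 2 categories; the string codes are hypothetical.
ALLOWED = {"mechanistic_driver", "structural_confounder", "construct_overlap"}


def parse_audit(raw_reply):
    """Parse one reply assumed to look like {"feature": ..., "label": ...}.

    Replies that are not JSON objects, lack a feature name, or use a label outside
    ALLOWED return None, so free-text answers never slip through.
    """
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(reply, dict) or "feature" not in reply:
        return None
    label = reply.get("label")
    if not isinstance(label, str) or label not in ALLOWED:
        return None
    return reply["feature"], label


def stage2_keep(raw_replies):
    """Return the features labeled as plausible mechanistic drivers (category i)."""
    parsed = [parse_audit(r) for r in raw_replies]
    return [p[0] for p in parsed if p is not None and p[1] == "mechanistic_driver"]
```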
Across multiple recovery outcomes $\{Y_k\}_{k=1}^{K}$, the final retained mechanism set $\mathcal{M}$ is obtained by cross-outcome consensus aggregation, retaining features that recur in at least two of the $K$ outcomes. This voting step reduces outcome-specific noise and improves robustness.
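Cross-outcome consensus reduces to counting, for each feature, the number of outcomes for which it survived screening. A minimal sketch, with hypothetical feature and outcome names:

```python
from collections import Counter


def consensus(candidates_by_outcome, min_outcomes=2):
    """Keep features retained for at least `min_outcomes` distinct outcomes."""
    # set() so a feature listed twice for one outcome is counted once for it
    counts = Counter(f for feats in candidates_by_outcome.values() for f in set(feats))
    return sorted(f for f, c in counts.items() if c >= min_outcomes)
```

For example, with `{"ess": ["odi", "arousal_index", "rmssd"], "sleepiness": ["odi", "n3_pct"], "fatigue": ["arousal_index", "n3_pct", "odi"]}`, the function keeps `arousal_index`, `n3_pct`, and `odi` and drops `rmssd`, which appears for only one outcome.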
### III-C Hierarchical SRS Construction
Retained features are grouped into five physiological domains: respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. For each domain $d$, let $\mathcal{I}_d$ denote the subset of retained features assigned to that domain. We compute the domain score as
$$Z_d = \sum_{j \in \mathcal{I}_d} \beta_j \, Z(X_j), \tag{2}$$
where $Z(X_j)$ is the standardized value of feature $X_j$, and $\beta_j$ is proportional to NOTEARS edge magnitude and cross-outcome stability frequency, normalized such that $\sum_{j \in \mathcal{I}_d} |\beta_j| = 1$. Feature signs are oriented so that higher values of the final score correspond to better physiological recovery.
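A sketch of Eq. (2) for a single domain follows. The text states only that $\beta_j$ is proportional to edge magnitude and stability frequency, so taking their product is an assumption, as is encoding sign orientation as a separate ±1 vector (for example −1 for AHI or ODI, where higher values mean worse recovery).

```python
import numpy as np


def domain_score(Z_features, edge_weights, stability, orientation):
    """Eq. (2) for one domain: Z_d = sum_j beta_j * Z(X_j).

    Z_features: (n, m) standardized values of the domain's m retained features.
    edge_weights, stability: length-m NOTEARS edge weights and selection frequencies.
    orientation: length-m vector of +1/-1 so that a higher Z_d means better recovery.
    Using |weight| * stability for beta is an assumption.
    """
    raw = np.abs(np.asarray(edge_weights, dtype=float)) * np.asarray(stability, dtype=float)
    beta = np.asarray(orientation, dtype=float) * raw / raw.sum()   # sum_j |beta_j| = 1
    return np.asarray(Z_features, dtype=float) @ beta, beta
```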
The overall Sleep Recovery Score is then defined by hierarchical aggregation across domains:
$$\mathrm{SRS} = \sum_{d} \alpha_d \, Z_d, \tag{3}$$
where $\alpha_d$ denotes the domain-level weight. Domain weights are set proportional to the mean cross-outcome stability of retained features within each domain and are normalized so that $\sum_d |\alpha_d| = 1$.
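Eq. (3) can be sketched in the same style, with $\alpha_d$ computed from the mean stability of each domain's retained features and normalized to unit $\ell_1$ norm. The function name and dictionary layout are hypothetical.

```python
import numpy as np


def srs(domain_scores, mean_stability):
    """Eq. (3): SRS = sum_d alpha_d * Z_d, with alpha_d proportional to mean stability.

    domain_scores: dict domain -> length-n array of Z_d values.
    mean_stability: dict domain -> mean cross-outcome stability of its retained features.
    """
    domains = sorted(domain_scores)
    alpha = np.array([mean_stability[d] for d in domains], dtype=float)
    alpha = alpha / np.abs(alpha).sum()                          # sum_d |alpha_d| = 1
    stacked = np.column_stack([domain_scores[d] for d in domains])
    return stacked @ alpha, dict(zip(domains, alpha))
```

For instance, mean stabilities of 0.9 (hypoxic) and 0.3 (autonomic) give weights 0.75 and 0.25.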
This domain\-structured formulation also supports partial score estimation when only subsets of sensing channels are available, which is important for translation to distributed and wearable monitoring settings\.
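The text does not spell out how partial scores are formed. One simple option, assumed here and not taken from the paper, is to renormalize the domain weights $\alpha_d$ over whichever domains a device can measure, e.g., the hypoxic domain from oximetry and the autonomic domain from ECG:

```python
def partial_srs(domain_values, alpha):
    """Partial SRS for one person from whichever domains were measured.

    domain_values: dict domain -> Z_d (only the domains a device could capture).
    alpha: dict domain -> full-model weight. Renormalizing alpha over the observed
    domains is an assumption of this sketch, not a rule stated in the text.
    """
    observed = [d for d in domain_values if d in alpha]
    if not observed:
        raise ValueError("no modeled domain is available")
    total = sum(abs(alpha[d]) for d in observed)
    return sum(alpha[d] / total * domain_values[d] for d in observed)
```

A partial score built this way is not on the same scale as the full SRS, so it would need its own reference distribution before comparison across devices.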
Finally, the score is standardized within cohort:
$$\mathrm{SRS}_{\mathrm{final}}^{(c)} = \frac{\mathrm{SRS}^{(c)} - \mu_c}{\sigma_c}, \tag{4}$$
where $\mu_c$ and $\sigma_c$ denote the mean and standard deviation of SRS within cohort $c$. Because feature signs are oriented toward recovery, higher values of $\mathrm{SRS}_{\mathrm{final}}$ indicate greater multidomain physiological recovery.
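Eq. (4) standardizes within each cohort rather than over the pooled sample, which the sketch below makes explicit. Using the population standard deviation (NumPy's default `ddof=0`) is an assumption; the text does not state which estimator was used, and the function name is hypothetical.

```python
import numpy as np


def standardize_within_cohort(srs_values, cohort_labels):
    """Eq. (4): z-score SRS separately inside each cohort."""
    srs_values = np.asarray(srs_values, dtype=float)
    labels = np.asarray(cohort_labels)
    out = np.empty_like(srs_values)
    for c in np.unique(labels):
        mask = labels == c
        mu, sigma = srs_values[mask].mean(), srs_values[mask].std()  # cohort-specific stats
        out[mask] = (srs_values[mask] - mu) / sigma
    return out
```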
TABLE I: Physiological domains underlying the Sleep Recovery Score (SRS). Representative drivers retained after causal discovery, two-stage candidate screening, and cross-outcome consensus in MESA and MrOS. Domains were retained if they appeared in at least two recovery-related outcomes ($k \geq 2$) and survived physiology-based screening and constrained LLM-assisted auditing.
Figure 2: SRS aligns more strongly with perceived recovery than AHI. Spearman correlations ($\rho$) of the proposed Sleep Recovery Score (SRS) and of the Apnea–Hypopnea Index (AHI) with recovery-related outcomes in MESA and MrOS. In both cohorts, SRS shows consistently stronger associations with perceived recovery, including up to a 2.5× improvement for perceived sleep quality in MrOS.
## IV Results
### IV-A Feature Distillation and Mechanism Screening
We performed structured feature distillation to reduce the full physiological feature space to the final retained mechanisms used for SRS construction\.
In MESA, 53 physiological features were initially evaluated\. Linear NOTEARS identified a median of 44 candidate drivers across recovery outcomes\. Physiology\-based screening reduced this set to 34 biologically plausible mechanisms\. After constrained LLM\-assisted auditing and removal of structural covariates and construct\-overlapping variables, 32 consensus mechanisms were retained for hierarchical score construction\.
In MrOS, 174 physiological features were initially considered\. DAG estimation identified a median of 58 candidate drivers per outcome\. Physiology\-based screening reduced this set to 30 plausible mechanisms, and cross\-outcome consensus aggregation yielded 28 final retained mechanisms\.
Structural covariates such as age, sex, and race were occasionally identified as statistical parents of recovery outcomes, but were excluded during screening to prevent proxy effects and preserve mechanistic interpretability\. In MrOS, sleep latency and wake after sleep onset \(WASO\) were identified as construct\-overlapping leakage variables for subjective sleep quality and were removed from final aggregation\. These results show that SRS is not a heuristic combination of PSG variables, but the outcome of causal discovery, physiology\-based screening, and bias\-aware candidate filtering\.
### IV-B Cross-Cohort Mechanistic Convergence
Across both cohorts and heterogeneous recovery instruments, five physiological domains consistently emerged as recurrent physiological correlates of impaired recovery: \(i\) respiratory burden, \(ii\) hypoxic burden, \(iii\) sleep fragmentation, \(iv\) sleep architecture and EEG dynamics, and \(v\) autonomic regulation\.
Table [I](https://arxiv.org/html/2606.18506#S3.T1) summarizes representative retained drivers within each domain. These domains capture complementary aspects of impaired recovery, spanning disordered breathing, oxygenation, fragmentation, neurophysiological architecture, and autonomic regulation.
The recurrence of these five domains across MESA and MrOS supports the biological robustness of the framework and reduces the likelihood that the resulting score is driven by cohort\-specific statistical artifacts\.
TABLE II: Spearman $\rho$ with 95% CIs (Fisher $z$-transformation) for SRS and AHI across recovery outcomes (MESA: $n = 1{,}540$; MrOS: $n = 825$). $^{***}p < 0.001$, $^{**}p < 0.01$, $^{\mathrm{ns}}p > 0.05$.
### IV-C Association with Recovery Outcomes
Table [II](https://arxiv.org/html/2606.18506#S4.T2) reports Spearman $\rho$ with 95% confidence intervals and $p$-values for all outcomes (MESA: $n = 1{,}540$; MrOS: $n = 825$; CIs via Fisher $z$-transformation). SRS was statistically significant for four of five outcomes, whereas AHI did not reach significance for any of them. In MESA, SRS showed significant associations with the Epworth Sleepiness Scale (ESS; $\rho = 0.098$, 95% CI $[0.048, 0.147]$, $p < 0.001$) and daytime sleepiness ($\rho = 0.074$, $[0.024, 0.123]$, $p = 0.004$), whereas AHI was non-significant for both (ESS: $p = 0.055$; sleepiness: $p = 0.170$). In MrOS, SRS showed significant associations with perceived sleep quality ($\rho = 0.107$, $[0.039, 0.174]$, $p = 0.002$) and trouble staying asleep ($\rho = 0.111$, $[0.043, 0.178]$, $p = 0.001$), while AHI was non-significant for both ($p = 0.217$ and $p = 0.344$, respectively). For perceived sleep efficiency, neither SRS nor AHI reached significance ($p = 0.207$ and $p = 0.228$), consistent with the well-documented difficulty of predicting subjective efficiency from objective physiology alone.
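The reported intervals can be reproduced with the usual Fisher $z$ approach: transform $\rho$ with $\operatorname{atanh}$, add ±1.96 standard errors with $\mathrm{SE} = 1/\sqrt{n-3}$, and transform back with $\tanh$. The standard-library sketch below computes Spearman's $\rho$ as the Pearson correlation of average ranks and then the interval. The function names are hypothetical, and the plain $1/\sqrt{n-3}$ standard error is an assumption (Spearman-specific variance corrections exist), though it matches the intervals reported above.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                              # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average rank of the tie block
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def fisher_z_ci(rho, n, z_crit=1.96):
    """95% CI via z = atanh(rho) with standard error 1 / sqrt(n - 3)."""
    z = math.atanh(rho)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)
```

For instance, `fisher_z_ci(0.098, 1540)` gives approximately (0.048, 0.147) and `fisher_z_ci(0.107, 825)` approximately (0.039, 0.174), the ESS and perceived-sleep-quality intervals above.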
These findings suggest that restorative sleep is better characterized as a multidomain physiological construct than as a single respiratory index alone\. Rather than replacing disease\-specific metrics such as AHI, SRS complements them by capturing the broader physiological substrate of recovery\.
## V Discussion
This work reframes sleep recovery modeling from single\-index severity estimation toward structured, multidomain physiological integration\. We introduced a causal\-discovery–guided framework for constructing an interpretable and hierarchical Sleep Recovery Score \(SRS\) from polysomnography \(PSG\) and patient\-reported outcomes\. Across two independent cohorts \(MESA and MrOS\), the framework converged on a stable set of physiologically coherent domains underlying impaired recovery\.
### V-A Mechanistic Convergence Across Cohorts
Across both cohorts, the framework converged on five physiological domains consistently associated with impaired recovery: respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture and EEG dynamics, and autonomic regulation (Table [I](https://arxiv.org/html/2606.18506#S3.T1)).
The recurrence of these domains across two independent cohorts suggests that impaired recovery is associated with a stable multidomain physiological structure rather than dataset\-specific variation\. Instead of collapsing heterogeneous mechanisms into a single severity metric, the hierarchical SRS preserves domain\-level structure and enables transparent decomposition of respiratory, hypoxic, neurophysiological, fragmentation, and autonomic contributions to impaired recovery\.
### V-B Interpretability and Structural Validity
Unlike purely predictive models, the proposed framework explicitly distinguishes mechanistic physiological drivers from structural covariates and construct\-overlapping variables\. Demographic features such as age, sex, and race were occasionally identified as statistical parents during graph estimation, but were excluded from SRS construction to prevent proxy effects\. Similarly, variables overlapping with subjective constructs were removed during candidate screening\.
This structured filtering helps ensure that SRS reflects physiological mechanisms rather than demographic or measurement artifacts, strengthening interpretability and structural validity\.
### V-C Performance Relative to Single-Index Metrics
SRS consistently demonstrated stronger and statistically more reliable alignment with patient-reported outcomes than AHI across both cohorts (Table [II](https://arxiv.org/html/2606.18506#S4.T2)). Notably, AHI did not reach significance for any of the five outcomes, whereas SRS was significant for four of five, suggesting that multidomain integration provides not merely stronger but more reliable associations with perceived recovery. For perceived sleep efficiency, the non-significance of both metrics likely reflects the complex, multifactorial nature of subjective efficiency as a construct and marks a boundary of the framework's current performance.
These findings support the view that restorative sleep reflects distributed multidomain physiological burden rather than a single dominant respiratory index\. Rather than replacing disease\-specific measures such as AHI, SRS complements them by capturing the broader physiological substrate of recovery\.
### V-D Clinical and Connected Health Implications
The hierarchical structure of SRS provides more than a composite severity score; it offers a mechanism-aware framework for targeted interpretation and a natural pathway toward translation in connected health systems. Because SRS is constructed from distinct physiological domains, domain-level contributions ($Z_d$) can be inspected to identify the dominant driver of impaired recovery in an individual patient.
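Inspecting domain contributions can be sketched as picking the domain with the most negative weighted contribution $\alpha_d Z_d$. Using weighted rather than raw $Z_d$ values is an assumption, and the function name, weights, and patient values below are hypothetical.

```python
def dominant_impairment(domain_scores, alpha):
    """Return the domain with the most negative weighted contribution alpha_d * Z_d.

    Features are sign-oriented so that higher means better recovery, so the most
    negative contribution is read here as the main driver of impaired recovery.
    """
    contributions = {d: alpha[d] * z for d, z in domain_scores.items()}
    worst = min(contributions, key=contributions.get)
    return worst, contributions
```

For a hypothetical patient with domain scores respiratory −0.5, hypoxic −2.0, fragmentation 0.1, architecture 0.4, and autonomic −0.8, and weights 0.2, 0.3, 0.2, 0.1, and 0.2, the hypoxic contribution (−0.6) dominates, the kind of profile the text links to airway-focused interventions.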
For example, recovery impairment driven primarily by hypoxic burden may motivate airway\-focused interventions, whereas impairment dominated by autonomic dysregulation may motivate cardiovascular or behavioral strategies\. Traditional AHI\-based assessment cannot distinguish between these mechanistic profiles\.
Beyond laboratory polysomnography, the domain\-structured formulation enables modular scalability\. Wearable ECG, oximetry, and sleep staging devices capture subsets of these domains, allowing partial but interpretable recovery estimation in distributed sensing environments\. This makes SRS a promising foundation for smart and connected health systems\.
### V-E Limitations
Several limitations warrant consideration\. First, the linear NOTEARS formulation may not capture nonlinear physiological interactions\. Second, causal structure learning relies on assumptions of causal sufficiency and may omit latent variables\. Third, correlations with subjective outcomes were modest, reflecting the multifactorial nature of perceived recovery\. Fourth, the second\-stage auditing process relied on a constrained LLM\-based procedure rather than domain\-expert review, which may affect the reliability of candidate classification\. Finally, the analysis was cross\-sectional and does not establish temporal causality\. Future work should explore nonlinear causal discovery, expert\-reviewed or prospectively evaluated mechanism screening, longitudinal validation, and intervention\-sensitive recovery modeling\.
## VI Conclusion
We introduced a causal\-discovery–guided framework for constructing an interpretable, hierarchical Sleep Recovery Score from polysomnography and patient\-reported outcomes\. Across two independent cohorts, the framework converged on five physiologically coherent domains and showed stronger alignment with perceived recovery than AHI\. These findings support a multidomain view of sleep recovery and suggest that structured, mechanism\-aware modeling can provide a practical foundation for interpretable recovery assessment in smart and connected health systems\.
## References
- \[1\] A. Azarbarzin, S. A. Sands, K. L. Stone, L. Taranto-Montemurro, L. Messineo, P. I. Terrill, S. Ancoli-Israel, K. Ensrud, S. Purcell, D. P. White, S. Redline, and A. Wellman (2019). The hypoxic burden of sleep apnoea predicts cardiovascular disease-related mortality: the Osteoporotic Fractures in Men Study and the Sleep Heart Health Study. European Heart Journal 40(14), pp. 1149–1157. doi: [10.1093/eurheartj/ehy624](https://dx.doi.org/10.1093/eurheartj/ehy624).
- \[2\] (2012). The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications. American Academy of Sleep Medicine, Darien, IL.
- \[3\] Q. Guo, Y. Wang, et al. (2020). Weighted Epworth sleepiness scale predicted the apnea–hypopnea index better. Nature and Science of Sleep 12, pp. 685–695. doi: [10.2147/NSS.S247775](https://dx.doi.org/10.2147/NSS.S247775).
- \[4\] M. W. Johns (1991). A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep 14(6), pp. 540–545. doi: [10.1093/sleep/14.6.540](https://dx.doi.org/10.1093/sleep/14.6.540).
- \[5\] Y. Peker, H. Glantz, et al. (2025). Association of hypoxic burden with cardiovascular events and mortality in obstructive sleep apnea: comparison with apnea–hypopnea index. Chest. Online ahead of print. doi: [10.1016/j.chest.2025.01.023](https://dx.doi.org/10.1016/j.chest.2025.01.023).
- \[6\] M. Perslev, S. Darkner, L. Kempfner, M. Nikolic, P. Jennum, and C. Igel (2021). U-Sleep: resilient high-frequency sleep staging. npj Digital Medicine 4, p. 72. doi: [10.1038/s41746-021-00440-5](https://dx.doi.org/10.1038/s41746-021-00440-5).
- \[7\] R. Thapa, M. R. Kjaer, B. He, I. Covert, H. Moore, U. Hanif, G. Ganjoo, M. B. Westover, P. Jennum, A. Brink-Kjaer, E. Mignot, and J. Zou (2026). A multimodal sleep foundation model for disease prediction. Nature Medicine 32, pp. 752–762. doi: [10.1038/s41591-025-04133-4](https://dx.doi.org/10.1038/s41591-025-04133-4).
- \[8\] X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing (2018). DAGs with NO TEARS: continuous optimization for structure learning. In Advances in Neural Information Processing Systems, Vol. 31.