Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules
Summary
This paper presents a deterministic, rule-based sleep staging method that explicitly implements the American Academy of Sleep Medicine (AASM) scoring rules, providing epoch-level natural language explanations. It achieves 60.5% epoch-level agreement with a majority-vote consensus on 50 polysomnography recordings, offering transparency as a complement to opaque deep learning models.
# Staging by the Book: Automatic Sleep Stage Classification Using Scoring Rules
Source: [https://arxiv.org/html/2605.22859](https://arxiv.org/html/2605.22859)
Emil Hardarson (Department of Computer Science, Reykjavik University, Reykjavik, Iceland; Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland)
Konstantin Popov (Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland; Department of Engineering, Reykjavik University, Reykjavik, Iceland)
Sigridur Sigurdardottir
Anna Sigridur Islind (Department of Computer Science, Reykjavik University, Reykjavik, Iceland; Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland)
Erna Sif Arnardóttir (Department of Computer Science, Reykjavik University, Reykjavik, Iceland; Reykjavik University Sleep Institute, Reykjavik University, Reykjavik, Iceland; Department of Engineering, Reykjavik University, Reykjavik, Iceland)
María Óskarsdóttir (School of Mathematical Sciences, University of Southampton, Southampton, United Kingdom; Department of Computer Science, Reykjavik University, Reykjavik, Iceland)
###### Abstract
Automated sleep staging is commonly approached as a supervised machine learning problem, with deep learning methods dominating recent research. While machine learning models achieve near-human level agreement with human-scored reference sleep stages, their decisions are typically opaque and not designed to follow clinical scoring rules. We propose a transparent alternative: a deterministic, rule-based sleep staging method that explicitly operationalizes the American Academy of Sleep Medicine’s (AASM) scoring logic as executable code, coupled with epoch-level natural-language justifications derived from an explanation trace. We evaluate the approach on 50 polysomnography recordings with a 10-scorer majority-vote consensus as reference. Across all recordings, the method agreed with the majority-vote reference in 60.5% of epochs ($\kappa = 0.42$), with substantially higher agreement on a dataset used during development (77.1%, $\kappa = 0.61$). Agreement with the reference was highest for sleep stage N2 (recall 83.5%) and moderate for sleep stage R (recall 68.7%), while Wake and N1 recall were low. Despite lower agreement with the reference than contemporary deep learning models, the method provides deterministic decisions and natural language explanations aligned with AASM scoring rules, making it a complementary tool for auditing, debugging, and governing deep learning-based sleep staging.
Abbreviations: AI: Artificial Intelligence, ML: Machine Learning, PSG: Polysomnography, EEG: Electroencephalography, EOG: Electrooculography, EMG: Electromyography, AASM: American Academy of Sleep Medicine, R&K: Rechtschaffen and Kales, LAMF: Low amplitude mixed frequency, SWA: Slow wave activity, LLM: Large language model
Keywords: sleep staging, rule-based AI, explainable AI, polysomnography
## 1 Introduction
Many decisions in clinical practice are governed by written rules and guidelines\. When such decisions are automated, the goal is not only to produce an accurate output, but also to preserve the logic by which the decision is supposed to be made under the rules\[[3](https://arxiv.org/html/2605.22859#bib.bib2),[22](https://arxiv.org/html/2605.22859#bib.bib38)\]\. Automation in clinical decision\-making is increasingly performed with supervised Machine Learning \(ML\) algorithms, which learn statistical patterns that produce high agreement with human labels, without being explicitly constrained to the rules governing the decision\[[33](https://arxiv.org/html/2605.22859#bib.bib13),[20](https://arxiv.org/html/2605.22859#bib.bib4),[3](https://arxiv.org/html/2605.22859#bib.bib2)\]\.
Sleep stage scoring is an example of such a rule\-governed clinical decision\-making workflow\. It is a foundational step in sleep research and clinical practice, serving as the basis for diagnosing sleep disorders and understanding the physiology of sleep\. The gold standard for measuring sleep is a sleep study called polysomnography \(PSG\), which involves the overnight recording of multiple physiological signals, such as electroencephalography \(EEG\), electrooculography \(EOG\), and chin electromyography \(EMG\)\[[36](https://arxiv.org/html/2605.22859#bib.bib16)\]\.
Clinical sleep stage scoring guidelines, such as those published by the American Academy of Sleep Medicine \(AASM\), specify that sleep should be staged in consecutive 30\-second epochs\. Each epoch is classified as one of five sleep stages: Wake, Rapid Eye Movement \(REM\) sleep \(sleep stage R\), or Non\-REM sleep 1, 2, or 3 \(sleep stages N1, N2, or N3\)\. Sleep stage scoring is performed manually by a human sleep expert, in a procedure that typically takes between 1 and 3 hours for each full night recording\[[13](https://arxiv.org/html/2605.22859#bib.bib31),[26](https://arxiv.org/html/2605.22859#bib.bib5)\]\. Manual sleep stage scoring is subject to inter\-scorer variability, and agreement differs between sleep stages\. Multi\-scorer studies have found relatively high agreement for sleep stages Wake, N2, and R, but lower agreement for sleep stages N1 and N3\[[32](https://arxiv.org/html/2605.22859#bib.bib33),[27](https://arxiv.org/html/2605.22859#bib.bib28),[24](https://arxiv.org/html/2605.22859#bib.bib8)\]\. Complete agreement across expert scorers occurs in only a minority of epochs\[[7](https://arxiv.org/html/2605.22859#bib.bib52),[6](https://arxiv.org/html/2605.22859#bib.bib35)\]\. The growing number of sleep studies\[[9](https://arxiv.org/html/2605.22859#bib.bib20)\]and the labor\-intensive nature of manual sleep stage scoring have motivated the development of practical and reliable automatic sleep staging algorithms\[[12](https://arxiv.org/html/2605.22859#bib.bib37)\]\.
In recent years, supervised ML, and specifically Deep Learning \(DL\), has become the dominant methodology for automated sleep staging\[[2](https://arxiv.org/html/2605.22859#bib.bib1),[40](https://arxiv.org/html/2605.22859#bib.bib47),[1](https://arxiv.org/html/2605.22859#bib.bib30),[14](https://arxiv.org/html/2605.22859#bib.bib50)\]\. ML models trained on large datasets of human\-labeled sleep recordings have achieved performance comparable to human inter\-scorer agreement\[[29](https://arxiv.org/html/2605.22859#bib.bib17),[37](https://arxiv.org/html/2605.22859#bib.bib51)\]\. The ML models do this by learning to identify statistical patterns within the signals\[[35](https://arxiv.org/html/2605.22859#bib.bib25),[30](https://arxiv.org/html/2605.22859#bib.bib45),[16](https://arxiv.org/html/2605.22859#bib.bib32)\]\. However, this training procedure is typically not guaranteed to align with the logic of clinical scoring rules\[[34](https://arxiv.org/html/2605.22859#bib.bib26)\]\. The reasoning behind the models’ predictions is difficult to interpret and explain, a phenomenon termed the “black\-box” problem\[[20](https://arxiv.org/html/2605.22859#bib.bib4)\]\.
This opacity of sleep staging ML model reasoning presents a barrier to their clinical adoption\[[34](https://arxiv.org/html/2605.22859#bib.bib26)\]\. When a human sleep expert scores a sleep study, they can justify their decision by pointing to features defined in clinical guidelines, such as sleep spindles or alpha rhythm\[[17](https://arxiv.org/html/2605.22859#bib.bib18)\]\. Although ML models may learn to recognize such patterns, they do so by mapping high\-dimensional data through millions of internal parameters, rather than through explicit logical steps\.
Sleep stage scoring is, by design, a rule\-based decision process whose underlying logic is already specified\. The AASM scoring manual provides a deterministic set of rules, so an algorithm does not, in principle, need to infer this logic from training data\. Despite this, there is a notable gap in contemporary literature regarding methods that explicitly use the scoring logic described in clinical guidelines\.
Historically, automated sleep staging did not begin with ML\. Early methods attempted to translate human scoring practice into algorithms, first under the Rechtschaffen and Kales \(R&K\) criteria\[[31](https://arxiv.org/html/2605.22859#bib.bib34)\]and later under the first version of the AASM scoring manual\[[21](https://arxiv.org/html/2605.22859#bib.bib14),[18](https://arxiv.org/html/2605.22859#bib.bib6),[28](https://arxiv.org/html/2605.22859#bib.bib3)\]\. Earlier rule\-based methods, such as that of\[[25](https://arxiv.org/html/2605.22859#bib.bib21)\], have shown that sleep scoring logic could be operationalized with high agreement to human consensus\[[28](https://arxiv.org/html/2605.22859#bib.bib3),[4](https://arxiv.org/html/2605.22859#bib.bib7),[5](https://arxiv.org/html/2605.22859#bib.bib15),[8](https://arxiv.org/html/2605.22859#bib.bib19)\]\. The field of automated sleep staging later shifted towards ML and DL approaches because of their strong sleep staging accuracy\[[12](https://arxiv.org/html/2605.22859#bib.bib37),[10](https://arxiv.org/html/2605.22859#bib.bib46)\], but rule\-based approaches have not disappeared\[[15](https://arxiv.org/html/2605.22859#bib.bib22)\]\.
A practical challenge in operationalizing sleep scoring rules is that the sleep stage label depends on intermediate signal features described in the scoring manual. For example, scoring decisions may depend on whether an epoch contains sleep spindles, K-complexes, slow wave activity, rapid eye movements, alpha rhythm, low-amplitude mixed-frequency activity, or changes in chin EMG tone. In this paper, we refer to these signal segments that are relevant to sleep stage scoring decisions as micro-annotations. Earlier computer-assisted and rule-based sleep staging systems have implicitly operationalized parts of clinical scoring logic, for example through expert-defined features or rules about sleep stage transitions [[28](https://arxiv.org/html/2605.22859#bib.bib3),[4](https://arxiv.org/html/2605.22859#bib.bib7),[5](https://arxiv.org/html/2605.22859#bib.bib15),[25](https://arxiv.org/html/2605.22859#bib.bib21)]. However, most contemporary automatic sleep staging systems report sleep stage labels or probabilities, rather than exposing the rule-application process that connects signal features to scoring decisions.
Beyond assigning a sleep stage label, an automated scoring method can also support interpretation by showing how the decision was reached\. In sleep staging, this requires connecting signal features, scoring criteria, and the final stage assignment in a form that is familiar to human scorers\[[17](https://arxiv.org/html/2605.22859#bib.bib18)\]\. Few automatic sleep staging systems provide epoch\-level natural\-language justifications that trace the decision through the scoring rules\.
In this paper, we build on the rule\-based tradition of automatic sleep staging by making the operationalization explicit: we translate the logic of the AASM scoring manual into executable code and record the reasoning process behind each sleep stage assignment\. The result is a deterministic and inspectable method in which sleep stage assignments can be traced to the scoring rules and signal annotations used by the algorithm\.
In this paper, we ask: To what extent can the logic of the AASM sleep stage scoring manual be operationalized as deterministic, inspectable code, and what limitations emerge when such a method is evaluated against multi\-scorer human consensus? We address this research question by developing a sequential, logic\-driven sleep staging algorithm in Python that operationalizes the logic of the AASM scoring rules as executable code\. In addition, we implement an explanation mechanism that records the rule\-based reasoning behind each sleep stage assignment and renders this reasoning as natural\-language justifications linked to scoring rules\.
We evaluate the method using 50 full\-night type II PSG recordings scored independently by 10 human sleep experts\. The evaluation shows that the method achieves moderate agreement with the multi\-scorer consensus, with the strongest performance for sleep stages N2 and R\. A reviewer\-based discrepancy assessment suggested that reviewed disagreements were primarily associated with incomplete or inaccurate micro\-annotations, rather than the rule logic itself\. While previous rule\-based sleep staging methods have used rules as a guideline or processing step, we present what is, to our knowledge, the first method that attempts to explicitly code the clinical logic of the AASM scoring manual into an algorithm\[[39](https://arxiv.org/html/2605.22859#bib.bib27),[2](https://arxiv.org/html/2605.22859#bib.bib1)\]\.
## 2 Method
### 2.1 Algorithm
We designed the automatic sleep staging algorithm to emulate the logic of the AASM scoring manual. The method, written in Python 3.12, determines sleep stages by processing PSG data in three steps, as shown in Figure [1](https://arxiv.org/html/2605.22859#S2.F1).
[Figure 1 flowchart. Top: full-night attributes, then micro-annotate, then label definite stages, then label transition-based stages. Expanded view of definite-stage labeling: for each epoch, check in order definite N3, definite Wake, definite REM, definite N2, and definite N1, assigning the first stage whose criteria are met and otherwise labeling the epoch undefined, then go to the next epoch until the last epoch is reached.]
Figure 1: Schematic overview of the automatic sleep staging process. The top diagram shows the high-level steps from initial processing of full-night attributes and micro-annotating to the two-pass sleep staging process. Full-night attributes refer to recording-level information, such as alpha-rhythm status and chin electromyography baseline, that is needed before applying some epoch-level scoring rules. The labeling of definite sleep stages is shown in an expanded view, where each epoch is sequentially compared to criteria for each of the sleep stages. Abbreviations: N1, non-rapid eye movement sleep stage 1; N2, non-rapid eye movement sleep stage 2; N3, non-rapid eye movement sleep stage 3; REM, rapid eye movement sleep.
The process begins by loading the raw PSG data stored in European Data Format (EDF), a standard file format for physiological time-series recordings, using the MNE Python library [[23](https://arxiv.org/html/2605.22859#bib.bib49)], dividing the signals into 30-second epochs, and checking signal integrity. After this, the sleep staging process begins. First, the method establishes attributes pertaining to the full night’s recording. These full-night attributes act as global variables that govern the applicability of certain scoring rules. For example, the boolean variable `generates_alpha_rhythm` is decided via a sliding-window Welch periodogram analysis on occipital EEG channels, which estimates the spectral power of the signal over time and identifies whether alpha-band activity is present [[36](https://arxiv.org/html/2605.22859#bib.bib16),[38](https://arxiv.org/html/2605.22859#bib.bib29)]. This variable distinguishes between the use of the AASM scoring manual’s F2 rule (sleep should begin when alpha activity is attenuated and replaced by a low-amplitude mixed-frequency EEG signal in patients who generate alpha rhythm) and F3 rule (sleep starts at the earliest occurrence of background frequency slowing, vertex sharp waves, or slow eye movements). Another full-night attribute is the chin EMG baseline, which establishes global statistics for the chin EMG signal in order to define thresholds for high and low muscle tone, which are used when classifying sleep stage R.
Second, the algorithm identifies various physiological events described in the scoring rules\. These micro\-annotations are stored as tuples containing event labels, temporal boundaries, and channel derivations\. The YASA Python library\[[37](https://arxiv.org/html/2605.22859#bib.bib51)\]was used to detect sleep spindles, rapid eye movements \(REMs\), and slow wave activity \(SWA\) on frontal and central EEG channels\. Alpha rhythm and low amplitude mixed frequency \(LAMF\) segments were operationalized using spectral features estimated with Welch periodograms\[[38](https://arxiv.org/html/2605.22859#bib.bib29)\]\. Specifically, power spectral density \(PSD\) was estimated in sliding windows, and each window was characterized by the relative power in frequency bands relevant to the AASM scoring rules\[[36](https://arxiv.org/html/2605.22859#bib.bib16)\]\. Windows meeting the predefined alpha or LAMF criteria were marked as corresponding micro\-annotations\. Alpha rhythm was detected using 2\.0\-second windows advanced in 0\.5\-second steps, corresponding to 75% overlap\. A window was marked as alpha rhythm when power in the 8–12 Hz band accounted for at least 50% of total spectral power\. LAMF was defined heuristically as a contiguous low\-amplitude EEG segment lasting at least 1\.0 second, where low amplitude was defined as absolute signal amplitude below the channel\-level mean absolute amplitude minus 0\.01 standard deviations\. Candidate low\-amplitude segments were then retained as LAMF if power in the 4–7 Hz band accounted for at least 1% of total spectral power\.
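The alpha-rhythm criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `detect_alpha_windows`, the 200 Hz sampling rate, and the synthetic signal are assumptions made for illustration. Each 2.0-second window is analyzed with a single Hann-tapered periodogram, which is what a Welch estimate reduces to when the segment length equals the window length, and total power is taken over all bins from 0 Hz to the Nyquist frequency, a detail the text does not specify.

```python
import numpy as np

def detect_alpha_windows(signal, sfreq, win_s=2.0, step_s=0.5,
                         band=(8.0, 12.0), min_rel_power=0.5):
    """Return (start_s, end_s, relative_alpha_power) for alpha-dominated windows."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(win_s * sfreq))    # samples per window
    step = int(round(step_s * sfreq))  # samples per step (75% overlap by default)
    freqs = np.fft.rfftfreq(win, d=1.0 / sfreq)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    taper = np.hanning(win)
    hits = []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        seg = (seg - seg.mean()) * taper       # remove offset, then taper
        psd = np.abs(np.fft.rfft(seg)) ** 2    # unnormalized power spectrum
        total = psd.sum()
        if total == 0:
            continue                           # flat segment: no spectral content
        rel = psd[in_band].sum() / total       # scaling cancels in the ratio
        if rel >= min_rel_power:
            hits.append((start / sfreq, (start + win) / sfreq, float(rel)))
    return hits

# Synthetic demo: 10 s of 10 Hz activity followed by 10 s of 3 Hz activity.
sfreq = 200.0  # assumed sampling rate for this demo only
t = np.arange(4000) / sfreq
demo = np.where(t < 10, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 3 * t))
hits = detect_alpha_windows(demo, sfreq)
```

In this demo every window lying entirely within the first 10 seconds is marked as alpha, and no window starting at or after 10 seconds is marked. Consecutive hits could then be merged into one micro-annotation with start and end times.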
The algorithm processes the annotated epochs in two sequential passes to assign sleep stages, as shown in Figure [1](https://arxiv.org/html/2605.22859#S2.F1). The first pass evaluates epochs according to definite scoring rules in the same order of precedence as the AASM scoring rules: sleep stage N3, then Wake, R, N2, and finally N1, stopping at the first satisfied rule. Sleep stage N3 is assigned if SWA covers at least 20% of the epoch. Only if that criterion is not fulfilled does the method check for evidence of the next sleep stage. Wake is assigned if alpha rhythm or other markers such as eye blinks exceed 50% of the epoch. If not, the epoch is labeled sleep stage R if it contains LAMF and REMs without K-complexes or spindles, provided that EMG tone is low. Sleep stage N2 is assigned if a K-complex unassociated with arousal or a spindle occurs in the first half of the current epoch or the last half of the previous one. Finally, sleep stage N1 is assigned using a simplified alpha-rhythm rule. In alpha-generating recordings, the method assigns N1 when the preceding epoch was Wake, at least 50% of the current epoch consists of LAMF, and alpha rhythm is detected in either the current or preceding epoch.
Epochs that remain undefined after the first pass are re-evaluated in a second pass using transition rules, which depend on the context of adjacent epochs. For example, the method continues to score sleep stage R if the previous epoch was sleep stage R and the current epoch exhibits LAMF and low EMG without intervening arousals or spindles.
Epochs that remain undefined after both passes can either inherit the sleep stage of the preceding epoch or be assigned a sleep stage based on the YASA ML sleep stage classifier, depending on the user’s preference. We report results for the inheritance-only configuration. Algorithm [1](https://arxiv.org/html/2605.22859#alg1) shows the step-by-step procedure for translating physiological micro-annotations into sleep stage labels in broad terms.
Algorithm 1: Rule-Based Sleep Staging
Input: Epochs $\{E_t\}_{t=1}^{T}$ with micro-annotations.
Output: Sleep stage labels $y_t$ and explanations $\mathcal{E}_t$.
Initialize: Set all labels $y_t \leftarrow \text{undefined}$.
Pass 1: Definite Rules (Local Epoch Criteria)
for each epoch $E_t$ from $1$ to $T$ do
  1. Stage N3 Check: If Slow Waves $\geq 20\%$ of epoch $\implies y_t \leftarrow \text{N3}$
  2. Wake Check: If Wake-signs (Alpha, Blinks, etc.) $> 50\%$ $\implies y_t \leftarrow \text{Wake}$
  3. Stage R Check: If (LAMF + REMs + Low EMG) and No (Spindle/K-Complex) $\implies y_t \leftarrow \text{REM}$
  4. Stage N2 Check: If Spindle or K-Complex in current 1st half or previous 2nd half $\implies y_t \leftarrow \text{N2}$
  5. Stage N1 Check: If (Alpha attenuated) and (LAMF $> 50\%$) $\implies y_t \leftarrow \text{N1}$
  Log Reasoning: Update $\mathcal{E}_t$ with triggered rule and reasons for rejecting other rules.
end for
Pass 2: Transition Rules (Temporal Context)
for each epoch $E_t$ where $y_t$ is still undefined do
  REM Continuity: If $y_{t-1} = \text{stage R}$ and features remain consistent $\implies y_t \leftarrow \text{stage R}$
  N2 Continuity: If $y_{t-1} = \text{stage N2}$ and no evidence of stage change $\implies y_t \leftarrow \text{stage N2}$
  Inheritance: If still undefined, inherit stage from $y_{t-1}$.
  Log Reasoning: Document the use of preceding epoch in $\mathcal{E}_t$.
end for
return Sleep stage labels and explanations $\{(y_t, \mathcal{E}_t)\}_{t=1}^{T}$
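The two-pass control flow of Algorithm 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published code: the per-epoch feature fields (`swa`, `wake`, `lamf`, `rems`, `low_emg`, `sk_first`, `sk_second`, `alpha`, `arousal`) are hypothetical summaries of the micro-annotations, the N1 rule follows the simplified description in the text, and "no evidence of stage change" is approximated here as the absence of an arousal.

```python
# Hypothetical per-epoch feature summary; field names are illustrative only.
DEFAULTS = dict(swa=0.0, wake=0.0, lamf=0.0, rems=False, low_emg=False,
                sk_first=False, sk_second=False, alpha=False, arousal=False)
ORDER = ("N3", "Wake", "R", "N2", "N1")  # order of precedence in pass 1

def epoch(**features):
    return {**DEFAULTS, **features}

def pass_one(epochs):
    """Apply definite rules in order; stop at the first rule that is met."""
    labels, traces = [], []
    for t, e in enumerate(epochs):
        prev = epochs[t - 1] if t > 0 else DEFAULTS
        prev_label = labels[t - 1] if t > 0 else None
        met = {
            "N3": e["swa"] >= 0.20,
            "Wake": e["wake"] > 0.50,
            "R": (e["lamf"] > 0 and e["rems"] and e["low_emg"]
                  and not (e["sk_first"] or e["sk_second"])),
            "N2": e["sk_first"] or prev["sk_second"],
            "N1": (prev_label == "Wake" and e["lamf"] >= 0.50
                   and (e["alpha"] or prev["alpha"])),
        }
        label, trace = None, []
        for stage in ORDER:
            trace.append((stage, met[stage], "definite rule"))
            if met[stage]:
                label = stage
                break
        labels.append(label)
        traces.append(trace)
    return labels, traces

def pass_two(epochs, labels, traces):
    """Resolve undefined epochs from the preceding epoch's stage."""
    labels = list(labels)
    for t, e in enumerate(epochs):
        if labels[t] is not None or t == 0:
            continue  # already staged, or no preceding epoch to consult
        prev_label = labels[t - 1]
        has_sk = e["sk_first"] or e["sk_second"]
        if (prev_label == "R" and e["lamf"] > 0 and e["low_emg"]
                and not e["arousal"] and not has_sk):
            labels[t], why = "R", "R continuity"
        elif prev_label == "N2" and not e["arousal"]:
            labels[t], why = "N2", "N2 continuity"
        else:
            labels[t], why = prev_label, "inherited from previous epoch"
        traces[t].append((labels[t], True, why))
    return labels

night = [epoch(wake=0.8, alpha=True), epoch(lamf=0.6), epoch(sk_first=True),
         epoch(), epoch(swa=0.3), epoch(lamf=0.7, rems=True, low_emg=True),
         epoch(lamf=0.7, low_emg=True), epoch(arousal=True)]
first, traces = pass_one(night)
final = pass_two(night, first, traces)
# final == ["Wake", "N1", "N2", "N2", "N3", "R", "R", "R"]
```

Keeping the trace as a list of `(stage, met, note)` entries mirrors the sequential elimination order, so the same structure can feed an explanation log.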
### 2.2 Explanation generation
One of the strengths of a rule-based sleep staging algorithm is the possibility of providing transparent justifications for every staging decision. The method does this by populating an explanation log as it traverses the logical steps. The algorithm appends the result of every sleep stage evaluation to a list within the explanation log, which is implemented as a nested dictionary. Each entry in the list is a tuple containing the sleep stage being evaluated, a boolean indicating if the criteria were met, and a text string detailing the findings. This creates a complete trace of the step-by-step decision process.
The explanation log shows how the final sleep stage is reached by sequentially ruling out alternative candidate sleep stages until the final decision is reached\. Because the algorithm stops searching once a definite sleep stage is assigned, the logs naturally present a chronological account of what was tried, why certain sleep stages were rejected \(e\.g\., "slow wave activity at 15%, below the 20% threshold"\), and which rule ultimately triggered the label\.
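As a rough illustration of the data structure described above, the following sketch stores one epoch's evaluations as `(stage, met, text)` tuples inside a nested dictionary and renders them as a sequential elimination list. The epoch index `412`, the key name `"evaluations"`, and the helper names `render_static` and `final_stage` are hypothetical; the paper does not give the exact layout of the log.

```python
# Nested dictionary: epoch index -> {"evaluations": [(stage, met, text), ...]}
explanation_log = {
    412: {  # hypothetical epoch index
        "evaluations": [
            ("N3", False, "only 3.8% of the epoch consists of slow wave activity"),
            ("W", False, "0% of the epoch consists of alpha rhythm or other "
                         "findings consistent with sleep stage W"),
            ("R", False, "only 22.6% of the epoch meets the criteria for sleep stage R"),
            ("N2", True, "a spindle was found in the first half of the epoch, and "
                         "4 spindles were found in the second half of the previous epoch"),
        ]
    }
}

def render_static(log, epoch_index):
    """Turn one epoch's evaluations into human-readable elimination lines."""
    lines = []
    for stage, met, text in log[epoch_index]["evaluations"]:
        if met:
            lines.append(f"✓ The epoch is definite {stage}: {text}.")
        else:
            lines.append(f"× Not definite {stage} because {text}.")
    return lines

def final_stage(log, epoch_index):
    """The assigned stage is the first evaluation whose criteria were met."""
    for stage, met, _ in log[epoch_index]["evaluations"]:
        if met:
            return stage
    return None

lines = render_static(explanation_log, 412)
stage = final_stage(explanation_log, 412)  # "N2"
```

Because evaluation stops at the first satisfied rule, the list order doubles as the chronological order of the decision, and rejected stages always precede the accepted one.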
The explanation log is a technical data structure designed for programmatic tracking of operations\. To facilitate the user’s deeper understanding of the logical reasoning behind staging decisions, the explanation log is fed into a Large Language Model \(LLM; specifically OpenAI GPT\-4\)\. The LLM’s system prompt is constructed with the epoch’s explanation log and the Python source code used for the classification and explanation log\. This provides the LLM with the context necessary to interpret the staging decision and justify it by referencing specific events in the signals, as well as the AASM scoring rules\.
### 2.3 Dataset
We evaluated our method on the Sleep Revolution Multicenter Scoring dataset, which consists of 50 prospective full\-night type II PSG recordings collected at Reykjavik University \(February \- June 2021\) using a Nox A1 system\[[27](https://arxiv.org/html/2605.22859#bib.bib28)\]\. Each recording was independently scored by 10 human sleep experts from seven sleep centers \(April \- September 2021\) in Europe and Australia following AASM v2\.6 criteria, enabling both a majority\-vote consensus hypnogram and a per\-epoch agreement ratio \(fraction of scorers assigning the most frequently scored sleep stage\)\.
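The majority-vote consensus and per-epoch agreement ratio defined above can be computed as in the following minimal sketch. The function name `consensus` and the example votes are illustrative. The handling of ties is an assumption: the text does not state how ties were broken, and here the first-encountered stage among tied stages wins.

```python
from collections import Counter

def consensus(votes_per_epoch):
    """For each epoch, return (majority stage, fraction of scorers choosing it)."""
    result = []
    for votes in votes_per_epoch:
        # most_common(1) returns the top stage; ties go to the stage seen first
        stage, count = Counter(votes).most_common(1)[0]
        result.append((stage, count / len(votes)))
    return result

# Three illustrative epochs, each scored by 10 scorers.
votes_per_epoch = [
    ["N2"] * 7 + ["N1"] * 3,
    ["W"] * 10,
    ["N1"] * 4 + ["W"] * 3 + ["N2"] * 3,
]
reference = consensus(votes_per_epoch)
# reference == [("N2", 0.7), ("W", 1.0), ("N1", 0.4)]
```

The third epoch shows why the agreement ratio is worth keeping alongside the label: a consensus stage can be backed by well under half of the scorers.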
The cohort is adult and mixed with respect to common sleep disorders (healthy, obstructive sleep apnea, insomnia, and restless legs syndrome symptoms) (58% male, mean age 42.9 ± 13.7 years, mean BMI 27.3 ± 5.8 kg/m², and mean AHI 15.2 ± 15.6 events/h) [[27](https://arxiv.org/html/2605.22859#bib.bib28)]. One PSG was used in the development of the method for manual iterative rule implementation and parameter tuning and was subsequently excluded from the primary evaluation. The remaining 49 PSGs form the test set.
### 2.4 Evaluation
The performance of the method was evaluated using the dataset, with the majority-vote consensus hypnogram of the ten scorers as a reference. We assessed agreement both in terms of the epoch-by-epoch sleep stage classification and sleep architecture metrics. We calculated overall staging accuracy and Cohen’s kappa, as well as confusion matrices, to identify sleep-stage-specific patterns of disagreement. Before calculating agreement and sleep-architecture metrics, each recording was cropped to an analysis period. Following Nikkonen et al. [[27](https://arxiv.org/html/2605.22859#bib.bib28)], this period was defined as the interval from the first epoch to the last epoch in which at least one human scorer assigned a sleep stage other than Wake. Leading and trailing all-Wake periods were removed, while Wake epochs within this interval were retained. The same cropping procedure was applied to the development recording and to all test recordings.
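The cropping step and the two agreement metrics can be sketched as follows. This is a plain-Python illustration with hypothetical function names and toy hypnograms, not the evaluation code used in the paper. Cohen's kappa is computed as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance from the marginal stage frequencies.

```python
from collections import Counter

def crop_bounds(scorer_hypnograms, wake="W"):
    """First and last epoch index where at least one scorer assigned non-Wake."""
    n = len(scorer_hypnograms[0])
    asleep = [t for t in range(n)
              if any(h[t] != wake for h in scorer_hypnograms)]
    return (asleep[0], asleep[-1]) if asleep else None

def accuracy(ref, pred):
    return sum(r == p for r, p in zip(ref, pred)) / len(ref)

def cohens_kappa(ref, pred):
    """Chance-corrected agreement; undefined (division by zero) when p_e == 1."""
    n = len(ref)
    p_o = accuracy(ref, pred)
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    p_e = sum(ref_counts[s] * pred_counts[s]
              for s in set(ref) | set(pred)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy example: two scorers, eight epochs.
scorers = [["W", "W", "N1", "N2", "N2", "W", "R", "W"],
           ["W", "N1", "N1", "N2", "N3", "W", "R", "W"]]
first, last = crop_bounds(scorers)  # (1, 6)

ref_full = ["W", "N1", "N1", "N2", "N2", "W", "R", "W"]    # consensus reference
pred_full = ["W", "N2", "N1", "N2", "N2", "N2", "R", "W"]  # algorithm output
ref, pred = ref_full[first:last + 1], pred_full[first:last + 1]

acc = accuracy(ref, pred)        # 4 of 6 epochs agree
kappa = cohens_kappa(ref, pred)  # (24/36 - 11/36) / (25/36) = 0.52
```

Cropping before scoring matters for the metrics: the leading and trailing Wake epochs in this toy example would otherwise add two easy agreements and raise the raw accuracy.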
To complement this performance evaluation, a reviewer with sleep scoring experience conducted a reviewer\-based discrepancy assessment to characterize sources of disagreement between the method and the reference\. This assessment targeted two categories: \(1\) Epochs characterized by high human agreement where the method did not match the consensus, and \(2\) epochs with low human agreement where the method matched the sleep stage of the majority\-vote consensus\. A selection of epochs from both categories was investigated to determine whether the disagreement was associated with inaccurate or incomplete micro\-annotations, rule application, or another source\. By examining epochs with high human disagreement, our analysis sought to determine if the method’s classification could provide insight into the reasons for ambiguity in human scoring\.
We randomly sampled 20 epochs from the development PSG, the recording used during iterative rule implementation and parameter tuning. For each sampled epoch, the reviewer first assigned a sleep stage using the PSG signals and the AASM scoring manual, without knowing the sleep stage labels assigned by the human sleep experts or by the sleep staging algorithm. Next, the reviewer was shown the human sleep expert consensus sleep stage label and the agreement ratio, followed by the sleep stage label predicted by the algorithm and the algorithm’s explanation. The reviewer then answered three yes/no questions: (1) Were the micro-annotations used by the algorithm correct? (2) Given those micro-annotations, was the applied scoring rule correct? (3) Was the assignment of definite or transition sleep stage label correct? If the reviewer determined that the micro-annotations were incorrect, they could answer “not applicable” to the other two questions where they deemed it necessary.
## 3 Results
### 3.1 Output data
For each PSG recording, the method outputs a hypnogram consisting of epoch-by-epoch sleep stages, along with an epoch-by-epoch visualization of the micro-annotations. Additionally, the method produces the explanation log for every staging decision. Figure [2](https://arxiv.org/html/2605.22859#S3.F2) shows an example of an epoch that the algorithm classified as N2. The figure shows the signals used, along with the micro-annotations.

Figure 2: One epoch with micro-annotations. The displayed channels include electrooculography (EOG: E1-M2, E2-M1), electroencephalography (EEG: F4-M1, F3-M2, C4-M1, C3-M2, O2-M1, O1-M2), and chin electromyography (EMG: 1-2). Colored overlays indicate detected sleep spindles, low-amplitude mixed-frequency activity (LAMF), slow wave activity (SWA), and rapid eye movements (REMs). This epoch was classified as sleep stage N2. Abbreviations: EEG, electroencephalography; EOG, electrooculography; EMG, electromyography; LAMF, low-amplitude mixed-frequency activity; SWA, slow wave activity; REMs, rapid eye movements; N2, non-rapid eye movement sleep stage 2.
Along with inspecting the raw explanation log, the user has two ways to view explanations: static and interactive explanations. In static explanations, the log is rendered as a list of elimination rules that outline why each candidate sleep stage was not assigned until the final rule triggers the selected label, in the order in which they were applied. Figure [3](https://arxiv.org/html/2605.22859#S3.F3) shows a static explanation generated by the algorithm for the same epoch as shown in Figure [2](https://arxiv.org/html/2605.22859#S3.F2).
× Not definite N3 because only 3.8% of the epoch consists of slow wave activity.
× Not definite W because 0% of the epoch consists of alpha rhythm or other findings consistent with sleep stage W.
× Not definite R because only 22.6% of the epoch meets the criteria for sleep stage R.
✓ The epoch is definite N2. A spindle was found in the first half of the epoch, and 4 spindles were found in the second half of the previous epoch.
Figure 3: Example of a sequential elimination trace produced by the rule-based sleep staging method. The explanation rules out incompatible sleep stages in sequence before stating the final sleep stage assignment and its supporting evidence. Abbreviations: W, Wake; R, rapid eye movement sleep; N2, non-rapid eye movement sleep stage 2; N3, non-rapid eye movement sleep stage 3.
Interactive explanations can be explored in a conversation with an LLM through a chat interface. The LLM (OpenAI GPT-4) summarizes each epoch’s explanation log in concise natural language and responds to the user’s questions. Figure [4](https://arxiv.org/html/2605.22859#S3.F4) shows an example of an interactive explanation.
User: What sleep stage is this?
System: The epoch was labeled as N2. The decision was based on the rules that indicate that N2 should be scored if spindles or K-complexes are observed in the first half of the epoch or the second half of the previous epoch. In this case, several spindles were found: 1 in the first half of this epoch and 4 in the second half of the previous epoch.
User: Why not N3?
System: The epoch was not labeled as N3 because only 3.8% of the epoch consisted of slow wave activity. According to the rules, an epoch is considered N3 if 20% or more of it consists of slow wave activity.
Figure 4: Example of a natural-language explanation dialogue produced by the rule-based sleep staging method, showing both the justification for the assigned sleep stage and the rejection of an alternative candidate sleep stage. Abbreviations: N2, non-rapid eye movement sleep stage 2; N3, non-rapid eye movement sleep stage 3.
Given identical inputs, the method is deterministic: repeated runs produce identical hypnograms, micro-annotations, and explanation logs. However, the LLM responses in the interactive explanations vary across runs, because the LLM generates its output by probabilistic sampling [[19](https://arxiv.org/html/2605.22859#bib.bib48)].
The average runtime for a full\-night recording was approximately 90 minutes on an Intel Core i7\-8650U CPU, with the overwhelming majority of computation time spent in preprocessing and annotation\.
### 3\.2Comparison to human consensus
To evaluate agreement with human scoring, we compared the algorithm’s epoch\-by\-epoch sleep stage labels to the majority\-vote consensus hypnogram of the ten human scorers in the dataset\. As described in Section[2\.3](https://arxiv.org/html/2605.22859#S2.SS3), one PSG recording was used during the development of the algorithm to iterate on rule implementation and parameter tuning\. This recording was excluded from the test set, and we report results on thisdevelopment recordingseparately\.
Tables [1](https://arxiv.org/html/2605.22859#S3.T1) and [2](https://arxiv.org/html/2605.22859#S3.T2) show sleep stage confusion matrices for the development PSG, expressed as row\-normalized percentages and total counts, respectively\. Tables [3](https://arxiv.org/html/2605.22859#S3.T3) and [4](https://arxiv.org/html/2605.22859#S3.T4) show the same, but aggregated across all subjects in the test set\.
Overall agreement with majority\-vote consensus across the dataset was 60\.5% with Cohen’s κ=0\.42\. Agreement was highest for N2 \(recall 83\.5%\) and moderate for sleep stage R \(recall 68\.7%\) and N3 \(recall 50\.5%\)\. Wake and N1 recall were low \(14\.5% and 5\.9%, respectively\)\.
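For reference, the three quantities reported here \(overall agreement, Cohen’s κ, and per\-stage recall\) follow directly from a confusion matrix of epoch counts with reference stages as rows and assigned stages as columns\. The sketch below uses the standard definitions; the function name and the two\-class toy matrix are illustrative and are not the paper’s data\.

```python
def agreement_metrics(cm):
    """Compute overall agreement, Cohen's kappa and per-class recall.

    cm: square matrix of epoch counts, rows = reference stage,
    columns = stage assigned by the method.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    # Observed agreement: fraction of epochs on the diagonal.
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    # Recall per reference stage: diagonal of the row-normalized matrix.
    recall = [cm[i][i] / row_sums[i] if row_sums[i] else float("nan")
              for i in range(n)]
    return observed, kappa, recall


# Toy two-class matrix (not from the paper).
observed, kappa, recall = agreement_metrics([[8, 2], [1, 9]])
```

Per\-stage recall computed this way corresponds to the diagonal of the row\-normalized percentage tables\.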
Table 1: Confusion matrix for the single development polysomnography \(excluded from the primary evaluation\), comparing epoch\-by\-epoch predictions against the majority\-vote consensus of ten human sleep experts, expressed in percentages\. Each row is normalized to represent 100% of the epochs for that specific reference sleep stage\. The rows show the reference sleep stages, while the columns show the sleep stages assigned by the method\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; R, rapid eye movement sleep\.
Table 2: Confusion matrix for the single development polysomnography \(excluded from the primary evaluation\), expressed in total epoch counts\. The rows show the reference sleep stages, while the columns show the sleep stages assigned by the method\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; R, rapid eye movement sleep\.
Table 3: Confusion matrix for all test recordings in the dataset, comparing the algorithm’s epoch\-by\-epoch predictions against the majority\-vote consensus of ten human sleep experts, expressed in percentages\. Each row is normalized to represent 100% of the epochs for that specific reference sleep stage\. The rows show the reference sleep stages, while the columns show the sleep stages assigned by the algorithm\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; R, rapid eye movement sleep\.
Table 4: Confusion matrix for all test recordings in the dataset, comparing the algorithm’s epoch\-by\-epoch predictions against the majority\-vote consensus of ten human sleep experts, expressed in total epoch counts\. The rows show the reference sleep stages, while the columns show the sleep stages assigned by the algorithm\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; R, rapid eye movement sleep\.
Table [5](https://arxiv.org/html/2605.22859#S3.T5) shows the method’s mean absolute difference from sleep architecture metrics derived from the majority\-vote consensus hypnograms\. For comparison, the table also shows the mean absolute difference between individual human scorers and the consensus of the other nine scorers\. Results are shown separately for the test set and for the development recording\. In the test set, the method showed larger differences from the consensus\-derived metrics than individual human scorers across all evaluated sleep metrics\.
Table 5: Comparison of method and human mean absolute differences \(MAD\) from consensus\-derived sleep metrics\. Method MAD and Human MAD refer to the test set\. Dev\. Method MAD and Dev\. Human MAD refer to the single development polysomnography excluded from the primary test\-set evaluation\.
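The two MAD comparisons can be sketched as follows: the method is compared with metrics from the full consensus, while each human scorer is compared with the consensus of the remaining scorers\. Total sleep time is used here only as an illustrative metric, and the helper names, the 30\-second epoch length, and the tie\-breaking rule are assumptions of the sketch rather than details taken from the paper\.

```python
from collections import Counter
from statistics import mean

STAGE_ORDER = ["W", "N1", "N2", "N3", "R"]  # assumed labels and tie-break priority
EPOCH_MINUTES = 0.5                          # assumes standard 30-second epochs


def consensus(scorings):
    """Majority vote per epoch; ties go to the stage earliest in STAGE_ORDER."""
    result = []
    for labels in zip(*scorings):
        counts = Counter(labels)
        result.append(max(counts, key=lambda s: (counts[s], -STAGE_ORDER.index(s))))
    return result


def total_sleep_time(hypnogram):
    """Illustrative sleep metric: minutes spent in any stage other than Wake."""
    return sum(1 for stage in hypnogram if stage != "W") * EPOCH_MINUTES


def method_mad(method_hyps, scorings_per_recording, metric):
    """Mean over recordings of |metric(method) - metric(full consensus)|."""
    return mean(abs(metric(m) - metric(consensus(s)))
                for m, s in zip(method_hyps, scorings_per_recording))


def human_mad(scorings_per_recording, metric):
    """Mean over recordings and scorers of |metric(scorer) - metric(consensus of others)|."""
    diffs = []
    for scorings in scorings_per_recording:
        for i, own in enumerate(scorings):
            others = scorings[:i] + scorings[i + 1:]  # leave this scorer out
            diffs.append(abs(metric(own) - metric(consensus(others))))
    return mean(diffs)


# Toy data: one recording, three scorers, four epochs (not from the paper).
scorers = [["W", "N1", "N2", "N2"],
           ["W", "W", "N2", "N2"],
           ["W", "N1", "N2", "W"]]
method = ["W", "W", "N2", "N2"]
m_mad = method_mad([method], [scorers], total_sleep_time)
h_mad = human_mad([scorers], total_sleep_time)
```

Leaving each scorer out of their own reference avoids rewarding a scorer for agreeing with a consensus they helped to form\.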
### 3\.3 Assessment of discrepancies

Figure 5: The top panel shows the human consensus hypnogram for the polysomnography used during the development of the method, with areas of high human disagreement highlighted\. The bottom panel shows a predicted hypnogram for the same polysomnography, with disagreements with the human consensus hypnogram highlighted in red\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; REM, rapid eye movement sleep\.
Figure 6: The top panel shows the human consensus hypnogram for one of the test polysomnographies with areas of high human disagreement highlighted\. The bottom panel shows a predicted hypnogram for the same polysomnography, with disagreements with the human consensus hypnogram highlighted in red\. Abbreviations: N1, non\-rapid eye movement sleep stage 1; N2, non\-rapid eye movement sleep stage 2; N3, non\-rapid eye movement sleep stage 3; REM, rapid eye movement sleep\.
Figures [5](https://arxiv.org/html/2605.22859#S3.F5) and [6](https://arxiv.org/html/2605.22859#S3.F6) show examples of hypnograms comparing the algorithm’s sleep stage predictions to the majority\-vote consensus\. In both figures, epochs where the algorithm disagrees with the consensus sleep stage are highlighted in red\. The consensus hypnogram is highlighted in a progressively darker shade of yellow, depending on the level of disagreement among the 10 scorers\. Many epochs where the algorithm disagreed with the reference occurred in areas where human agreement was low\. This can also be seen in Figure [7](https://arxiv.org/html/2605.22859#S3.F7), which shows the distribution of human agreement ratios for epochs where the algorithm agreed with the reference, compared with epochs where it disagreed\.
The reviewer’s discrepancy assessment suggested that when the reviewer agreed with the method’s sleep stage label, the decision was usually supported by the appropriate reasoning\. In 71% of epochs where the reviewer agreed with the method’s sleep stage label, the micro\-annotations, rule application, and definite/transition assignment were all deemed correct by the reviewer\. In epochs where the reviewer disagreed with the method’s label, the micro\-annotations were judged incorrect in all cases\. These findings suggest that, among the reviewed examples, reviewer\-method disagreements were primarily associated with micro\-annotation quality, rather than the rule logic itself\.

Figure 7: Distribution of human inter\-scorer agreement for epochs where the rule\-based algorithm agreed versus disagreed with the 10\-scorer majority\-vote reference\. The horizontal axis shows the per\-epoch agreement ratio \(fraction of scorers assigning the modal sleep stage\)\. Disagreements with the reference are proportionally more common where human agreement is lower\.
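The per\-epoch agreement ratio defined in the Figure 7 caption, and its split by whether the method matched the reference, can be sketched as follows\. The function names and the toy labels are illustrative assumptions; the resulting two lists are what one would plot as the two distributions\.

```python
from collections import Counter


def agreement_ratio(labels):
    """Fraction of scorers who assigned the modal (most frequent) stage."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def split_agreement_ratios(scorings, reference, predicted):
    """Group per-epoch agreement ratios by method/reference agreement.

    scorings: one hypnogram per human scorer.
    reference: consensus hypnogram; predicted: the method's hypnogram.
    Returns (ratios where the method agreed, ratios where it disagreed).
    """
    agreed, disagreed = [], []
    for labels, ref, pred in zip(zip(*scorings), reference, predicted):
        target = agreed if pred == ref else disagreed
        target.append(agreement_ratio(labels))
    return agreed, disagreed


# Toy data: four scorers, two epochs (not from the paper).
scorings = [["N2", "N1"], ["N2", "W"], ["N2", "N1"], ["N2", "N1"]]
agreed, disagreed = split_agreement_ratios(scorings, ["N2", "N1"], ["N2", "W"])
```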
## 4 Discussion
This work proposes a rule\-based automatic sleep staging method that operationalizes the AASM scoring manual\[[36](https://arxiv.org/html/2605.22859#bib.bib16)\] as executable code, giving decisions that are fully deterministic and explainable\[[34](https://arxiv.org/html/2605.22859#bib.bib26),[20](https://arxiv.org/html/2605.22859#bib.bib4)\]\. Rather than treating sleep staging solely as a statistical prediction problem, the method codifies the clinical decision process, so that the reasoning can be inspected and rendered as static or interactive natural\-language explanations\. In addition to the sleep stage classifier, the method thus provides a tool for auditing, education, and explanation in clinical settings where alignment with standardized scoring rules is important\.
The method clearly does not outperform contemporary ML and DL methods in terms of scoring accuracy\[[2](https://arxiv.org/html/2605.22859#bib.bib1),[12](https://arxiv.org/html/2605.22859#bib.bib37),[14](https://arxiv.org/html/2605.22859#bib.bib50),[11](https://arxiv.org/html/2605.22859#bib.bib36),[30](https://arxiv.org/html/2605.22859#bib.bib45)\]\. Our results show that although recall for sleep stages N2 and R is strong, the method showed lower agreement with the reference for Wake, N1, and, to a lesser extent, N3\. However, it provides an alternative, complementary way to estimate sleep stages\. This opens up the possibility of hybrid automatic sleep staging methods\[[25](https://arxiv.org/html/2605.22859#bib.bib21),[15](https://arxiv.org/html/2605.22859#bib.bib22)\], where a deterministic and auditable method, grounded in clinical rules, runs in parallel with ML/DL sleep staging methods, as a reference or cross\-check\.
The reviewer\-based discrepancy assessment performed in this work clarifies where the current implementation differed from the majority\-vote reference and how those differences arose\. Many disagreements between the algorithm and the human consensus scoring occurred in epochs where human inter\-scorer agreement was also low, suggesting that disagreements often occur in ambiguous regions of sleep\.
The assessment also showed that when the reviewer disagreed with the algorithm’s sleep stage label, the micro\-annotations were judged to be incorrect, pointing to the micro\-annotation quality as the main technical limitation of the current implementation\.
The micro\-annotations were not independently validated against expert event\-level annotations\. In particular, alpha rhythm and LAMF were operationalized using PSD\-based heuristics rather than a separately validated detector\. Future work should therefore validate and improve the available micro\-annotations, including alpha rhythm, LAMF, REMs, spindles, SWA, chin EMG tone, and missing detectors such as K\-complex detection\. Wake classification was particularly difficult to operationalize\. This may reflect that some visual cues used by human scorers to recognize Wake epochs are less straightforward to translate into computational rules, especially in noisy epochs with movement artifacts\. The scoring manual is written to support expert human judgment but leaves ambiguity for implementation\[[34](https://arxiv.org/html/2605.22859#bib.bib26)\]\. For example, some scoring criteria depend on judgment of whether the majority of an epoch contains evidence for a sleep stage, without specifying how that evidence should be segmented or aggregated computationally\[[36](https://arxiv.org/html/2605.22859#bib.bib16)\]\. Future iterations of scoring manuals could benefit from more operationally precise phrasing, fit for algorithmic implementation\.
## 5 Conclusion
We presented a deterministic, rule\-based sleep staging method that operationalizes the AASM scoring manual as executable code and records the reasoning behind each epoch\-by\-epoch decision as an inspectable explanation trace\. The method was evaluated on 50 full\-night type II PSG recordings scored independently by 10 human sleep experts, using the majority\-vote consensus as reference\. Overall agreement with the consensus was moderate \(60\.5%, κ=0\.42\), with strong recall for sleep stages N2 and R, and lower recall for Wake, N1, and N3\. A reviewer\-based discrepancy assessment indicated that disagreements were primarily associated with the quality of micro\-annotations rather than the rule logic itself\.
Taken together, these findings suggest that even if rule\-based sleep staging is not yet competitive with DL in terms of agreement with human\-scored references, it remains valuable as a transparent complement to black\-box methods\. Its current limitations appear in the lack of reliable micro\-annotation and the ambiguity of a scoring manual written for human experts\.
## Author contributions
EH conceived the study, developed and implemented the method, and led the manuscript writing\. SS contributed to the conception and testing of the method and manuscript writing\. KP contributed to data analysis, interpretation, and manuscript writing\. ASI, ESA, and MÓ contributed to study design, acquisition of funding, interpretation of results, and manuscript writing\. All authors reviewed and approved the final manuscript\.
## Acknowledgments
We thank Kristín Anna Ólafsdóttir, Heiður Grétarsdóttir, and Gabriel Jouan for their help during the development of the method, the 10 scorers of the data set, as well as other staff and students of the Sleep Revolution project who contributed to data collection and analysis\. LLMs were used during the writing of this paper to enhance the readability of the text\.
## Financial disclosure
This project was funded by the European Union’s 2020 Research and Innovation Program under Grant 965417 and the Icelandic Research Fund under Doctoral Student Grant 2410607\-051\.
## Conflict of interest
ESA discloses lecture fees from Nox Medical, ResMed, Jazz Pharmaceuticals, Linde Healthcare, Wink Sleep, Apnimed, and Vistor\. ESA is a previous member of the Philips Sleep Medicine & Innovation Medical Advisory Board and the Lille Medical Advisory Board and is currently on the Sleep Cycle Data Monitoring Committee\. The other authors declare no potential conflict of interest\.
## References
- \[1\]M\. Alattar, A\. Govind, and S\. Mainali\(2024\-03\)Artificial Intelligence Models for the Automation of Standard Diagnostics in Sleep Medicine—A Systematic Review\.Bioengineering11\(3\),pp\. 206\(en\)\.Note:Number: 3External Links:ISSN 2306\-5354,[Link](https://www.mdpi.com/2306-5354/11/3/206),[Document](https://dx.doi.org/10.3390/bioengineering11030206)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1)\.
- \[2\]H\. Alsolai, S\. Qureshi, S\. M\. Z\. Iqbal, S\. Vanichayobon, L\. E\. Henesey, C\. Lindley, and S\. Karrila\(2022\)A Systematic Review of Literature on Automated Sleep Scoring\.IEEE Access10,pp\. 79419–79443\.Note:Conference Name: IEEE AccessExternal Links:ISSN 2169\-3536,[Link](https://ieeexplore.ieee.org/document/9841539),[Document](https://dx.doi.org/10.1109/ACCESS.2022.3194145)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p12.1),[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[3\]J\. Amann, A\. Blasimme, E\. Vayena, D\. Frey, V\. I\. Madai, and the Precise4Q consortium\(2020\-11\)Explainability for artificial intelligence in healthcare: a multidisciplinary perspective\.BMC Medical Informatics and Decision Making20\(1\),pp\. 310\.External Links:ISSN 1472\-6947,[Link](https://doi.org/10.1186/s12911-020-01332-6),[Document](https://dx.doi.org/10.1186/s12911-020-01332-6)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p1.1)\.
- \[4\]P\. Anderer, G\. Gruber, S\. Parapatics, M\. Woertz, T\. Miazhynskaia, G\. Klösch, B\. Saletu, J\. Zeitlhofer, M\. J\. Barbanoj, H\. Danker\-Hopfe, S\. Himanen, B\. Kemp, T\. Penzel, M\. Grözinger, D\. Kunz, P\. Rappelsberger, A\. Schlögl, and G\. Dorffner\(2005\)An E\-Health Solution for Automatic Sleep Classification according to Rechtschaffen and Kales: Validation Study of the Somnolyzer 24 × 7 Utilizing the Siesta Database\.Neuropsychobiology51\(3\),pp\. 115–133\(en\)\.External Links:ISSN 0302\-282X, 1423\-0224,[Link](https://www.karger.com/Article/FullText/85205),[Document](https://dx.doi.org/10.1159/000085205)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§1](https://arxiv.org/html/2605.22859#S1.p8.1)\.
- \[5\]P\. Anderer, A\. Moreau, M\. Woertz, M\. Ross, G\. Gruber, S\. Parapatics, E\. Loretz, E\. Heller, A\. Schmidt, M\. Boeck, D\. Moser, G\. Kloesch, B\. Saletu, G\. M\. Saletu\-Zyhlarz, H\. Danker\-Hopfe, J\. Zeitlhofer, and G\. Dorffner\(2010\)Computer\-assisted sleep classification according to the standard of the American Academy of Sleep Medicine: validation study of the AASM version of the Somnolyzer 24 × 7\.Neuropsychobiology62\(4\),pp\. 250–264\(eng\)\.External Links:ISSN 1423\-0224,[Document](https://dx.doi.org/10.1159/000320864)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§1](https://arxiv.org/html/2605.22859#S1.p8.1)\.
- \[6\]P\. Anderer, M\. Ross, A\. Cerny, R\. Vasko, E\. Shaw, and P\. Fonseca\(2023\)Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing\.Frontiers in Sleep2\.External Links:ISSN 2813\-2890,[Link](https://www.frontiersin.org/articles/10.3389/frsle.2023.1163477)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[7\]J\. P\. Bakker, M\. Ross, A\. Cerny, R\. Vasko, E\. Shaw, S\. Kuna, U\. J\. Magalang, N\. M\. Punjabi, and P\. Anderer\(2023\-02\)Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto\-scoring\.Sleep46\(2\),pp\. zsac154\.External Links:ISSN 0161\-8105,[Link](https://doi.org/10.1093/sleep/zsac154),[Document](https://dx.doi.org/10.1093/sleep/zsac154)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[8\]I\. Boira, V\. Esteban, J\. N\. Sancho\-Chust, E\. Pastor, P\. Fernández\-Martínez, A\. Torba, and E\. Chiner\(2025\-06\)Validation of the Somnolyzer 24×7 automatic scoring system in children with suspected obstructive sleep apnea\.Frontiers in Medicine12\(English\)\.External Links:ISSN 2296\-858X,[Link](https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2025.1617530/full),[Document](https://dx.doi.org/10.3389/fmed.2025.1617530)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1)\.
- \[9\]M\. Braun, M\. Stockhoff, M\. Tijssen, S\. Dietz\-Terjung, S\. Coughlin, and C\. Schöbel\(2024\-06\)A Systematic Review on the Technical Feasibility of Home\-Polysomnography for Diagnosis of Sleep Disorders in Adults\.Current Sleep Medicine Reports10\(2\),pp\. 276–288\(en\)\.External Links:ISSN 2198\-6401,[Link](https://doi.org/10.1007/s40675-024-00301-z),[Document](https://dx.doi.org/10.1007/s40675-024-00301-z)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[10\]O\. Faust, H\. Razaghi, R\. Barika, E\. J\. Ciaccio, and U\. R\. Acharya\(2019\-07\)A review of automated sleep stage scoring based on physiological signals for the new millennia\.Computer Methods and Programs in Biomedicine176,pp\. 81–91\(en\)\.External Links:ISSN 0169\-2607,[Link](https://www.sciencedirect.com/science/article/pii/S0169260718313865),[Document](https://dx.doi.org/10.1016/j.cmpb.2019.04.032)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1)\.
- \[11\]L\. Fiorillo, G\. Monachino, J\. van der Meer, M\. Pesce, J\. D\. Warncke, M\. H\. Schmidt, C\. L\. A\. Bassetti, A\. Tzovara, P\. Favaro, and F\. D\. Faraci\(2023\-03\)U\-Sleep’s resilience to AASM guidelines\.npj Digital Medicine6\(1\),pp\. 1–9\(en\)\.Note:Number: 1External Links:ISSN 2398\-6352,[Link](https://www.nature.com/articles/s41746-023-00784-0),[Document](https://dx.doi.org/10.1038/s41746-023-00784-0)Cited by:[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[12\]L\. Fiorillo, A\. Puiatti, M\. Papandrea, P\. Ratti, P\. Favaro, C\. Roth, P\. Bargiotas, C\. L\. Bassetti, and F\. D\. Faraci\(2019\-12\)Automated sleep scoring: A review of the latest approaches\.Sleep Medicine Reviews48,pp\. 101204\(en\)\.External Links:ISSN 1087\-0792,[Link](https://www.sciencedirect.com/science/article/pii/S1087079218301746),[Document](https://dx.doi.org/10.1016/j.smrv.2019.07.007)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1),[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[13\]J\. Fischer, Z\. Dogas, C\. L\. Bassetti, S\. Berg, L\. Grote, P\. Jennum, P\. Levy, S\. Mihaicuta, L\. Nobili, D\. Riemann, F\. J\. Puertas Cuesta, F\. Raschke, D\. J\. Skene, N\. Stanley, D\. Pevernagie, Executive Committee \(EC\) of the Assembly of the National Sleep Societies \(ANSS\), and Board of the European Sleep Research Society \(ESRS\), Regensburg, Germany\(2012\-08\)Standard procedures for adults in accredited sleep medicine centres in Europe\.Journal of Sleep Research21\(4\),pp\. 357–368\(eng\)\.External Links:ISSN 1365\-2869,[Document](https://dx.doi.org/10.1111/j.1365-2869.2011.00987.x)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[14\]M\. Gaiduk, Á\. Serrano Alarcón, R\. Seepold, and N\. Martínez Madrid\(2023\-07\)Current status and prospects of automatic sleep stages scoring: Review\.Biomedical Engineering Letters13\(3\),pp\. 247–272\.External Links:ISSN 2093\-9868,[Link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10382458/),[Document](https://dx.doi.org/10.1007/s13534-023-00299-3)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[15\]K\. M\. Gunnarsdottir, C\. Gamaldo, R\. M\. Salas, J\. B\. Ewen, R\. P\. Allen, K\. Hu, and S\. V\. Sarma\(2020\-10\)A novel sleep stage scoring system: Combining expert‐based features with the generalized linear model\.Journal of Sleep Research29\(5\),pp\. e12991\(en\)\.External Links:ISSN 0962\-1105, 1365\-2869,[Link](https://onlinelibrary.wiley.com/doi/10.1111/jsr.12991),[Document](https://dx.doi.org/10.1111/jsr.12991)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[16\]E\. Hardarson, L\. Biedebach, Ó\. B\. Ómarsson, T\. Hrólfsson, A\. S\. Islind, and M\. Óskarsdóttir\(2026\-03\)Data\-Local Autonomous LLM\-Guided Neural Architecture Search for Multiclass Multimodal Time\-Series Classification\.arXiv\.Note:arXiv:2603\.15939 \[cs\]External Links:[Link](http://arxiv.org/abs/2603.15939),[Document](https://dx.doi.org/10.48550/arXiv.2603.15939)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1)\.
- \[17\]E\. Hardarson, F\. Ivarsson, A\. S\. Islind, E\. S\. Arnardóttir, and M\. Óskarsdóttir\(2024\-12\)Human\-AI Collaboration: From Explainable AI to Co\-Creating Meaning\.ACIS 2024 Proceedings\.External Links:[Link](https://aisel.aisnet.org/acis2024/148)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p5.1),[§1](https://arxiv.org/html/2605.22859#S1.p9.1)\.
- \[18\]J\. Hasan\(1996\-07\)Past and Future of Computer\-Assisted Sleep Analysis and Drowsiness Assessment:\.Journal of Clinical Neurophysiology13\(4\),pp\. 295–313\(en\)\.External Links:ISSN 0736\-0258,[Link](http://journals.lww.com/00004691-199607000-00004),[Document](https://dx.doi.org/10.1097/00004691-199607000-00004)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1)\.
- \[19\]A\. Holtzman, J\. Buys, L\. Du, M\. Forbes, and Y\. Choi\(2020\-02\)The Curious Case of Neural Text Degeneration\.arXiv\.Note:arXiv:1904\.09751 \[cs\]External Links:[Link](http://arxiv.org/abs/1904.09751),[Document](https://dx.doi.org/10.48550/arXiv.1904.09751)Cited by:[§3\.1](https://arxiv.org/html/2605.22859#S3.SS1.p4.1)\.
- \[20\]T\. Hulsen\(2023\-09\)Explainable Artificial Intelligence \(XAI\): Concepts and Challenges in Healthcare\.AI4\(3\),pp\. 652–666\(en\)\.Note:Number: 3External Links:ISSN 2673\-2688,[Link](https://www.mdpi.com/2673-2688/4/3/34),[Document](https://dx.doi.org/10.3390/ai4030034)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p1.1),[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§4](https://arxiv.org/html/2605.22859#S4.p1.1)\.
- \[21\]C\. Iber, S\. Ancoli\-Israel, A\. Chesson, and S\. Quan\(2007\)The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology, and Technical Specifications, 1st ed\.American Academy of Sleep Medicine\.Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1)\.
- \[22\]S\. Labkoff, B\. Oladimeji, J\. Kannry, A\. Solomonides, R\. Leftwich, E\. Koski, A\. L\. Joseph, M\. Lopez\-Gonzalez, L\. A\. Fleisher, K\. Nolen, S\. Dutta, D\. R\. Levy, A\. Price, P\. J\. Barr, J\. D\. Hron, B\. Lin, G\. Srivastava, N\. Pastor, U\. S\. Luque, T\. T\. T\. Bui, R\. Singh, T\. Williams, M\. G\. Weiner, T\. Naumann, D\. F\. Sittig, G\. P\. Jackson, and Y\. Quintana\(2024\-11\)Toward a responsible future: recommendations for AI\-enabled clinical decision support\.Journal of the American Medical Informatics Association31\(11\),pp\. 2730–2739\(en\)\.External Links:ISSN 1067\-5027, 1527\-974X,[Link](https://academic.oup.com/jamia/article/31/11/2730/7776823),[Document](https://dx.doi.org/10.1093/jamia/ocae209)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p1.1)\.
- \[23\]E\. Larson, A\. Gramfort, D\. A\. Engemann, J\. Leppakangas, C\. Brodbeck, M\. Jas, T\. L\. Brooks, J\. Sassenhagen, D\. McCloy, M\. Luessi, J\. King, R\. Höchenberger, C\. Brunner, R\. Goj, G\. Favelier, M\. van Vliet, M\. Wronkiewicz, S\. Appelhoff, A\. Rockhill, C\. Holdgraf, M\. Scheltienne, J\. Massich, Y\. Bekhti, A\. Leggitt, A\. Dykstra, R\. Trachel, R\. Luke, L\. De Santis, A\. Panda, M\. Magnuski, B\. Westner, D\. G\. Wakeman, D\. Strohmeier, H\. Bharadwaj, T\. Linzen, A\. Barachant, E\. Ruzich, S\. Huberty, C\. J\. Bailey, A\. Li, C\. Moutard, L\. Bloy, F\. Raimondo, J\. Nurminen, M\. Billinger, J\. Montoya, M\. Woodman, I\. Lee, M\. Schulz, N\. Foti, C\. Nangini, J\. C\. García Alanis, T\. S\. Binns, D\. P\. Orfanos, O\. Hauk, R\. Maddox, R\. LaPlante, A\. Drew, C\. Dinh, G\. Dumas, Martin, J\. Benerradi, T\. Hartmann, E\. Ort, P\. Pasler, S\. Repplinger, A\. Rudiuk, A\. Radanovic, B\. Buran, J\. Woessner, M\. Massias, M\. Hämäläinen, P\. Sripad, V\. Chirkov, C\. Mullins, F\. Raimundo, G\. Belonosov, M\. Kaneda, P\. Alday, R\. Pari, S\. Kornblith, Y\. Halchenko, Y\. Luo, J\. Kasper, K\. Doelling, M\. Jensen, S\. Ruuskanen, S\. Kern, T\. Gahlot, A\. Nunes, D\. Gütlin, E\. Heinila, K\. Armeni, kjs, A\. Weinstein, C\. Lamus, C\. M\. Galván, C\. Moënne\-Loccoz, D\. Altukhov, E\. Peterson, J\. Hanna, J\. Houck, N\. Klein, P\. Roujansky, R\. Luke, A\. Rantala, B\. Maess, C\. Forster, C\. O’Reilly, D\. Welke, H\. Kolkhorst, H\. Banville, J\. Zhang, K\. Maksymenko, M\. Clarke, M\. Anelli, M\. Straube, N\. Chapochnikov, P\. Bannier, S\. Choudhary, V\. Férat, C\. Kim, F\. Klotzsche, F\. Wong, I\. Kojcic, J\. D\. Nielsen, K\. Lankinen, K\. Tabavi, L\. Thibault, M\. Gerster, N\. Alibou, N\. Gayraud, N\. Ward, Q\. Chu, S\. Herbst, T\. Ma, A\. Quinn, A\. Gauthier, B\. Pinsard, E\. Stephen, E\. Hornberger, E\. Hathaway, E\. Kalenkovich, F\. Mamashli, G\. O’Neill, G\. Marinato, H\. Anevar, H\. Abdelhedi, J\. Sosulski, J\. Stout, J\. Calder\-Travis, J\. D\. 
Zhu, L\. Eisenman, L\. Esch, M\. Dovgialo, N\. Barascud, N\. Legrand, N\. Kapralov, P\. J\. Molfese, R\. Falach, S\. Deslauriers\-Gauthier, S\. Cotroneo, S\. Matindi, S\. Bierer, T\. Papadopoulo, T\. S\. Binns, T\. Stenner, V\. Peterson, Z\. Baratz, A\. Tonin, A\. Kovrig, A\. Pascarella, A\. Karekal, B\. Aristimunha, C\. de la Torre, C\. Gohil, C\. Zhao, D\. Krzemiński, D\. Makowski, E\. Mikulan, F\. Hofer, H\. Ritz, J\. Schiratti, J\. Evans, J\. Herforth, J\. Veillette, J\. Drew, J\. Teves, K\. Mathewson, L\. Gwilliams, L\. Lementec, L\. Varghese, L\. Hamilton, L\. Gemein, L\. Hecker, Lx37, M\. van Es, M\. Boggess, M\. Eberlein, M\. Žák, M\. Sherif, N\. Kozhemiako, N\. Srinivasan, N\. Wilming, O\. Kozynets, P\. Ablin, P\. Das, Q\. Bertrand, R\. Shoorangiz, R\. Scholz, R\. Hübner, S\. Sommariva, S\. Er, S\. Khan, S\. Datta, T\. Donoghue, T\. Jochmann, T\. Merk, T\. Flak, T\. Dupré la Tour, T\. NessAiver, akshay0724, sviter, A\. Earle\-Richardson, A\. Hindle, A\. Koutsou, A\. Fecker, A\. Wagner, A\. Ciok, A\. Lepauvre, A\. Kiefer, A\. Gilbert, A\. Pradhan, A\. Padee, A\. Dubarry, A\. Collas, A\. N\. Waniek, A\. Singhal, A\. Rokem, A\. Pelzer, A\. Hurst, B\. J\. Jin, B\. Beasley, B\. Nicenboim, C\. Clauss, C\. Mista, C\. Kechris, C\. Li, C\. Braboszcz, D\. C\. Schad, D\. Hasegan, D\. Tse, D\. E\. Sleiter, D\. Haslacher, D\. Sabbagh, D\. Kostas, D\. Petkova, D\. Issagaliyeva, D\. Das, D\. Wetzel, E\. Eich, E\. DuPre, E\. Lau, E\. Olivetti, E\. Zhang, E\. Ferdman, E\. Çelik, E\. Varano, E\. Altamiranda, E\. Brayet, E\. de Montalivet, E\. Goldstein, F\. Negahbani, F\. Zamberlan, F\. Pop, F\. D\. Weber, G\. Tan, G\. Brookshire, Giulio, G\. Reina, H\. Maymandi, H\. A\. Arzoo, H\. Sonntag, H\. Ye, H\. Shin, H\. O\. Elmas, I\. AZZ, I\. Machairas, I\. Zubarev, I\. de Jong, J\. Phelan, J\. Kaczmarzyk, J\. Zerfowski, J\. J\. F\. van den Bosch, J\. Van Der Donckt, J\. van der Meer, J\. Niediek, J\. Koen, J\. J\. Bear, J\. Dammers, J\. G\. N\. Galán, J\. Welzel, K\. Slama, K\. 
Al\-Amir, K\. Leinweber, L\. Grabot, L\. M\. Andersen, L\. R\. Almeida, L\. S\. Barbosa, L\. Alfine, L\. Hejtmánek, M\. Balatsko, M\. Kitzbichler, M\. Kumar, M\. Kadwani, M\. Sutela, M\. Koculak, M\. Henney, M\. BaBer, M\. Oberg, M\. van Harmelen, M\. Courtemanche, M\. Tucker, M\. Visconti di Oleggio Castello, M\. Dold, M\. Toivonen, M\. Shader, M\. Cespedes, M\. Krause, M\. Rybář, M\. He, M\. Daneshzand, Mojackhak, N\. Fourcaud\-Trocmé, N\. Gensollen, N\. Proulx, N\. Focke, N\. Chalas, N\. Markowitz, O\. Shubi, P\. Mainar, P\. Sundaram, P\. Anders, P\. Silva, P\. Guetschel, Q\. Li, Q\. Barthélemy, R\. Nadkarni, R\. Gatti, R\. Apariciogarcia, R\. Aagaard, R\. Nasri, R\. Koehler, R\. Stargardsky, R\. Oostenveld, R\. Seymour, R\. T\. Schirrmeister, R\. Jin, R\. E\. Wieske, R\. Law, S\. Pai, S\. Perry, S\. Louviot, S\. Martínez, S\. Saha, S\. Mathot, S\. Jentschke, S\. Major, S\. Treguer, S\. Castaño, S\. Deng, S\. Antopolskiy, S\. \(\. Shirazi, S\. Keshari, S\. Baral, S\. Wong, S\. M\. Hofmann, S\. Poil, S\. Foslien, S\. Singh, S\. Chambon, S\. H\. Magnússon, S\. Bethard, S\. M\. Gutstein, S\. M\. Meyer, T\. Wang, T\. Jayawardana, T\. Moreau, T\. Radman, T\. Gates, T\. Stone, T\. Clausner, T\. E\. Anijärv, V\. P\. Kumaravel, W\. Xu, W\. Turner, X\. d\. Zuazo, X\. Xia, Y\. Zuo, Y\. Shen, Y\. Truong, Z\. Zhang, Z\. ZENG, btkcodedev, buildqa, luzpaz, and user27182\(2025\-11\)MNE\-Python\.Zenodo\.External Links:[Link](https://zenodo.org/doi/10.5281/zenodo.592483),[Document](https://dx.doi.org/10.5281/ZENODO.592483)Cited by:[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p2.1)\.
- \[24\]Y\. J\. Lee, J\. Y\. Lee, J\. H\. Cho, and J\. H\. Choi\(2022\-01\)Interrater reliability of sleep stage scoring: a meta\-analysis\.Journal of Clinical Sleep Medicine : JCSM : Official Publication of the American Academy of Sleep Medicine18\(1\),pp\. 193–202\.External Links:ISSN 1550\-9389,[Link](https://pmc.ncbi.nlm.nih.gov/articles/PMC8807917/),[Document](https://dx.doi.org/10.5664/jcsm.9538)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[25\]S\. Liang, C\. Kuo, Y\. Hu, and Y\. Cheng\(2012\-03\)A rule\-based automatic sleep staging method\.Journal of Neuroscience Methods205\(1\),pp\. 169–176\.External Links:ISSN 0165\-0270,[Link](https://www.sciencedirect.com/science/article/pii/S016502701100759X),[Document](https://dx.doi.org/10.1016/j.jneumeth.2011.12.022)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§1](https://arxiv.org/html/2605.22859#S1.p8.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[26\]A\. Malhotra, M\. Younes, S\. T\. Kuna, R\. Benca, C\. A\. Kushida, J\. Walsh, A\. Hanlon, B\. Staley, A\. I\. Pack, and G\. W\. Pien\(2013\-04\)Performance of an automated polysomnography scoring system versus computer\-assisted manual scoring\.Sleep36\(4\),pp\. 573–582\(eng\)\.External Links:ISSN 1550\-9109,[Document](https://dx.doi.org/10.5665/sleep.2548)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[27\]S\. Nikkonen, P\. Somaskandhan, H\. Korkalainen, S\. Kainulainen, P\. I\. Terrill, H\. Gretarsdottir, S\. Sigurdardottir, K\. A\. Olafsdottir, A\. S\. Islind, M\. Óskarsdóttir, E\. S\. Arnardóttir, and T\. Leppänen\(2024\)Multicentre sleep\-stage scoring agreement in the Sleep Revolution project\.Journal of Sleep Research33\(1\),pp\. e13956\(en\)\.Note:\_eprint: https://onlinelibrary\.wiley\.com/doi/pdf/10\.1111/jsr\.13956External Links:ISSN 1365\-2869,[Link](https://onlinelibrary.wiley.com/doi/abs/10.1111/jsr.13956),[Document](https://dx.doi.org/10.1111/jsr.13956)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1),[§2\.3](https://arxiv.org/html/2605.22859#S2.SS3.p1.1),[§2\.3](https://arxiv.org/html/2605.22859#S2.SS3.p2.4),[§2\.4](https://arxiv.org/html/2605.22859#S2.SS4.p1.1)\.
- \[28\]T\. Penzel and R\. Conradt\(2000\-04\)Computer based sleep recording and analysis\.Sleep Medicine Reviews4\(2\),pp\. 131–148\(en\)\.External Links:ISSN 10870792,[Link](https://linkinghub.elsevier.com/retrieve/pii/S1087079299900874),[Document](https://dx.doi.org/10.1053/smrv.1999.0087)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1),[§1](https://arxiv.org/html/2605.22859#S1.p8.1)\.
- \[29\]M\. Perslev, S\. Darkner, L\. Kempfner, M\. Nikolic, P\. J\. Jennum, and C\. Igel\(2021\-04\)U\-Sleep: resilient high\-frequency sleep staging\.npj Digital Medicine4\(1\),pp\. 72\(en\)\.External Links:ISSN 2398\-6352,[Link](https://www.nature.com/articles/s41746-021-00440-5),[Document](https://dx.doi.org/10.1038/s41746-021-00440-5)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1)\.
- \[30\]H\. Phan, K\. P\. Lorenzen, E\. Heremans, O\. Y\. Chén, M\. C\. Tran, P\. Koch, A\. Mertins, M\. Baumert, K\. B\. Mikkelsen, and M\. De Vos\(2023\-10\)L\-SeqSleepNet: Whole\-cycle Long Sequence Modeling for Automatic Sleep Staging\.IEEE Journal of Biomedical and Health Informatics27\(10\),pp\. 4748–4757\.External Links:ISSN 2168\-2208,[Link](https://ieeexplore.ieee.org/document/10210638),[Document](https://dx.doi.org/10.1109/JBHI.2023.3303197)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§4](https://arxiv.org/html/2605.22859#S4.p2.1)\.
- \[31\]A\. Rechtschaffen and A\. Kales\(1968\)A manual of standardized terminology, techniques and scoring system of sleep stages in human subjects\.University of California, Brain Information Service/Brain Research Institute,Los Angeles\.Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p7.1)\.
- \[32\]R\. S\. Rosenberg and S\. Van Hout\(2013\-01\)The American Academy of Sleep Medicine inter\-scorer reliability program: sleep stage scoring\.Journal of clinical sleep medicine: JCSM: official publication of the American Academy of Sleep Medicine9\(1\),pp\. 81–87\(eng\)\.External Links:ISSN 1550\-9397,[Document](https://dx.doi.org/10.5664/jcsm.2350)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p3.1)\.
- \[33\]C\. Rudin\(2019\-09\)Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead\.arXiv\.Note:arXiv:1811\.10154 \[cs, stat\]External Links:[Link](http://arxiv.org/abs/1811.10154),[Document](https://dx.doi.org/10.48550/arXiv.1811.10154)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p1.1)\.
- \[34\]N\. Stanley\(2023\-05\)The Future of Sleep Staging, Revisited\.Nature and Science of Sleep15,pp\. 313–322\(English\)\.External Links:[Document](https://dx.doi.org/10.2147/NSS.S405663)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§1](https://arxiv.org/html/2605.22859#S1.p5.1),[§4](https://arxiv.org/html/2605.22859#S4.p1.1),[§4](https://arxiv.org/html/2605.22859#S4.p5.1)\.
- \[35\]A\. Supratak, H\. Dong, C\. Wu, and Y\. Guo\(2017\-11\)DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single\-Channel EEG\.IEEE Transactions on Neural Systems and Rehabilitation Engineering25\(11\),pp\. 1998–2008\.Note:arXiv:1703\.04046 \[stat\]External Links:ISSN 1534\-4320, 1558\-0210,[Link](http://arxiv.org/abs/1703.04046),[Document](https://dx.doi.org/10.1109/TNSRE.2017.2721116)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1)\.
- \[36\]M\. M\. Troester, S\. F\. Quan, American Academy of Sleep Medicine, and R\. B\. Berry\(2023\-06\)The AASM Manual for the Scoring of Sleep and Associated Events, Version 3\.American Academy of Sleep Medicine\(en\)\.External Links:ISBN 978\-0\-9706137\-1\-4Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p2.1),[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p3.1),[§4](https://arxiv.org/html/2605.22859#S4.p1.1),[§4](https://arxiv.org/html/2605.22859#S4.p5.1)\.
- \[37\]R\. Vallat and M\. P\. Walker\(2021\-10\)An open\-source, high\-performance tool for automated sleep staging\.eLife10,pp\. e70092\.External Links:ISSN 2050\-084X,[Link](https://doi.org/10.7554/eLife.70092),[Document](https://dx.doi.org/10.7554/eLife.70092)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1),[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p3.1)\.
- \[38\]P\. Welch\(1967\-06\)The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms\.IEEE Transactions on Audio and Electroacoustics15\(2\),pp\. 70–73\.External Links:ISSN 1558\-2582,[Link](https://ieeexplore.ieee.org/document/1161901),[Document](https://dx.doi.org/10.1109/TAU.1967.1161901)Cited by:[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p2.1),[§2\.1](https://arxiv.org/html/2605.22859#S2.SS1.p3.1)\.
- \[39\]M\. Yazdi, M\. Samaee, and D\. Massicotte\(2024\-06\)A Review on Automated Sleep Study\.Annals of Biomedical Engineering52\(6\),pp\. 1463–1491\(en\)\.External Links:ISSN 1573\-9686,[Link](https://doi.org/10.1007/s10439-024-03486-0),[Document](https://dx.doi.org/10.1007/s10439-024-03486-0)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p12.1)\.
- \[40\]B\. Zhang, T\. Lei, H\. Liu, and H\. Cai\(2018\-09\)EEG\-Based Automatic Sleep Staging Using Ontology and Weighting Feature Analysis\.Computational and Mathematical Methods in Medicine2018,pp\. 1–16\(en\)\.External Links:ISSN 1748\-670X, 1748\-6718,[Link](https://www.hindawi.com/journals/cmmm/2018/6534041/),[Document](https://dx.doi.org/10.1155/2018/6534041)Cited by:[§1](https://arxiv.org/html/2605.22859#S1.p4.1)\.