A Conflict-aware Evidential Framework for Reliable Sleep Stage Classification


Summary

ConfSleepNet is a conflict-aware evidential framework for reliable sleep stage classification using multi-modal data. It introduces hybrid category structures and a conflict-aware aggregation method to resolve inter-view conflicts, demonstrating effectiveness on sleep staging tasks.

arXiv:2605.17021v1 Announce Type: new

Cached at: 05/19/26, 06:38 AM

# A Conflict-aware Evidential Framework for Reliable Sleep Stage Classification
Source: [https://arxiv.org/html/2605.17021](https://arxiv.org/html/2605.17021)
###### Abstract

Multi-view learning has been widely applied for sleep stage classification using multi-modal data. However, existing methods typically assume that different modalities are well-aligned, which is often unattainable in real-world scenarios, thereby compromising the reliability of the staging results. In this paper, we propose ConfSleepNet, a conflict-aware evidential framework that dynamically resolves inter-view conflicts. The framework consists of multi-view evidence extraction and conflict-aware aggregation. In the first phase, it learns category-related evidence from different modalities, which represents the degree of support for individual sleep stages. Considering the inherent characteristics of varying modalities, we propose hybrid category structures for different modalities to promote more reasonable evidence learning. In the second phase, view-specific opinions, including prediction results and uncertainty, are constructed from the learned evidence. Notably, we propose a novel conflict-aware aggregation method that integrates these view-specific opinions into a reliable joint decision. This mechanism can effectively resolve conflicts among opinions and synthesize them into a reliable joint decision. Both theoretical analysis and experimental results demonstrate the effectiveness of ConfSleepNet in sleep staging tasks. The code is available at [ConfSleepNet](https://github.com/By4te/ConfSleepNet_ICML2026/).

Sleep staging, Sleep health, Deep learning, Multi\-view learning, Conflict\-aware fusion\.

## 1 Introduction

Sleep occupies approximately one-third of a human's lifetime and is crucial for maintaining mental and physical well-being (Lane et al., [2023](https://arxiv.org/html/2605.17021#bib.bib24)). Unfortunately, an increasing number of people suffer from sleep disorders, posing significant public health challenges (Perez-Pozuelo et al., [2020](https://arxiv.org/html/2605.17021#bib.bib22); Grandner, [2022](https://arxiv.org/html/2605.17021#bib.bib23)). For example, about 36% of the global population and 176 million people in China experience sleep disorders, leading to health issues such as cardiovascular diseases, cognitive decline, and memory deterioration. Clinically, sleep staging is a fundamental process for human sleep assessment (Kong et al., [2023](https://arxiv.org/html/2605.17021#bib.bib20)). Traditionally, this task is performed manually by sleep experts based on overnight polysomnography (PSG), a process that typically takes several hours (Portier et al., [2000](https://arxiv.org/html/2605.17021#bib.bib19)). In contrast, machine-assisted classification models can accomplish this task within seconds. Therefore, automating the sleep scoring process is imperative.

Previous work on automatic sleep staging can be divided into single-view methods (Supratak et al., [2017](https://arxiv.org/html/2605.17021#bib.bib73); Li et al., [2024](https://arxiv.org/html/2605.17021#bib.bib14); Phyo et al., [2023](https://arxiv.org/html/2605.17021#bib.bib72)) and multi-view methods (Phan et al., [2022](https://arxiv.org/html/2605.17021#bib.bib61); Chen et al., [2023](https://arxiv.org/html/2605.17021#bib.bib88); Pradeepkumar et al., [2024](https://arxiv.org/html/2605.17021#bib.bib12)) based on the type of network input. Single-view methods typically use the single-modal information provided by electroencephalogram (EEG) for sleep staging. In contrast, some studies have explored the introduction of additional modalities, such as electrooculogram (EOG), to provide complementary view information for the classification model. Notably, existing methods often assume that different views are well-aligned, assigning equal weights to different views (e.g., via concatenation) or learning a fixed weight for each view (Phan et al., [2022](https://arxiv.org/html/2605.17021#bib.bib61); Pradeepkumar et al., [2024](https://arxiv.org/html/2605.17021#bib.bib12)). However, this assumption is not always valid in real-world applications, as information from different views may conflict (e.g., two views may point to different sleep stages). Therefore, a well-designed model must be aware of such conflicts and dynamically adjust the importance of each view during decision-making.

In this work, we propose ConfSleepNet, a conflict\-aware evidential framework that effectively manages inter\-view conflicts and promotes reliable sleep staging decisions\. Specifically, ConfSleepNet consists of two main phases: multi\-view evidence extraction and conflict\-aware aggregation\. First, view\-specific evidential deep neural networks \(DNNs\) are employed to learn category\-related evidence from multi\-modal inputs, including EEG, EOG, and their combination\. Notably, unlike existing methods that treat all modalities indiscriminately using a five\-class classification design, we adopt a hybrid category structure combining coarse\-grained and fine\-grained categories\. This enables the model to extract evidence from each view in a reasonable and physiologically aligned manner\. In the multi\-view aggregation phase, we construct view\-specific opinions that incorporate class belief masses and prediction uncertainties using Dirichlet distributions parameterized by the learned evidence\. For final decision\-making, we propose a novel conflict\-aware multi\-view aggregation method that explicitly accounts for inter\-view conflicts, thereby integrating multiple view\-specific opinions into a reliable joint opinion\.

Our main contributions are summarized as follows: (1) We propose an evidential framework named ConfSleepNet, which introduces a hybrid category structure to enable differentiated evidence learning from EEG and EOG signals. (2) We present a conflict-aware multi-view aggregation method that enhances the reliability of classification results by explicitly considering inter-view conflicts. The theoretical analysis in Section [3.5](https://arxiv.org/html/2605.17021#S3.SS5) demonstrates that this method can integrate multiple potentially conflicting opinions into a reasonable joint opinion. (3) The proposed method is evaluated on multiple public datasets, and the results show that it outperforms state-of-the-art baseline methods.

## Conflict of Interest Disclosure

The authors declare that they have no financial conflicts of interest related to this work\.

## 2 Related Work

### 2.1 Automatic Multi-View Sleep Staging

Leveraging multi-view data for learning can provide richer information compared to single-view data, and its effectiveness in sleep staging tasks has been well demonstrated. Existing methods typically learn features independently from each view, followed by feature-level fusion. For example, Chambon et al. ([2018](https://arxiv.org/html/2605.17021#bib.bib77)) learn high-level representations from PSG signals and subsequently construct a joint representation for classification. Similarly, Phan et al. ([2018](https://arxiv.org/html/2605.17021#bib.bib78)) convert raw signals into time-frequency images and then perform feature-level fusion for prediction. These works adopt simple operations such as concatenation for multi-view fusion, implicitly assuming that all views are equally important. However, this assumption does not always hold, raising concerns about model reliability. To address this issue, Jia et al. ([2021](https://arxiv.org/html/2605.17021#bib.bib85)) employ a dual-stream network to extract features independently from EEG and EOG signals, and leverage an attention mechanism for feature fusion. Furthermore, Dai et al. ([2023](https://arxiv.org/html/2605.17021#bib.bib63)) adopt a Transformer encoder (Vaswani, [2017](https://arxiv.org/html/2605.17021#bib.bib64)) for view-specific feature extraction and multi-view feature fusion based on a self-attention mechanism. Although such methods can appropriately fuse multiple views, they remain unable to detect potential noise within each view, thereby compromising the robustness of the final predictions.

### 2.2 Conflicting Multi-View Learning

Early multi-view learning works relied on Bayesian methods (Neal, [2012](https://arxiv.org/html/2605.17021#bib.bib47); Gal and Ghahramani, [2016](https://arxiv.org/html/2605.17021#bib.bib49)) to construct weight distributions but were limited by high computational costs. Subsequently, ensemble methods (Egele et al., [2022](https://arxiv.org/html/2605.17021#bib.bib42); Ganaie et al., [2022](https://arxiv.org/html/2605.17021#bib.bib41)) advanced the field by combining predictions from multiple independent sub-networks. Despite making progress, these approaches overlooked inter-view conflicts. In recent years, evidential deep learning (EDL) (Sensoy et al., [2018](https://arxiv.org/html/2605.17021#bib.bib58)) has achieved significant success in multi-view learning tasks (Li et al., [2023](https://arxiv.org/html/2605.17021#bib.bib30); Xia et al., [2024](https://arxiv.org/html/2605.17021#bib.bib29)). Within the EDL framework, related works (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54); Shao et al., [2024](https://arxiv.org/html/2605.17021#bib.bib3); Huang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib1)) have employed Dempster-Shafer theory (Dempster, [1968](https://arxiv.org/html/2605.17021#bib.bib31)) to assign lower weights to views with high uncertainty, thereby addressing conflicts among views. However, existing EDL-based methods suffer from a limitation in handling the influence of conflicts on prediction uncertainty: they implicitly assume that incorporating more certain opinions will always reduce overall uncertainty (Liu et al., [2022](https://arxiv.org/html/2605.17021#bib.bib27); Zhang et al., [2023](https://arxiv.org/html/2605.17021#bib.bib25); Xu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib108)). We consider this assumption unreasonable and consequently propose a conflict-aware multi-view aggregation method, along with a theoretical analysis of its advantages.

![Refer to caption](https://arxiv.org/html/2605.17021v1/x1.png)

Figure 1: Illustration of ConfSleepNet. Four evidential DNNs $\{f_v(\cdot)\}_{v=1}^{4}$ learn class-specific evidence from various views, which involve two different category structures. Next, an evidence mapping layer maps the coarse-grained evidence into fine-grained evidence. After that, we construct view-specific opinions $\{\mathcal{M}^v\}_{v=1}^{4}$ based on the obtained evidence and then combine them to form a joint opinion $\mathcal{M}$. In multi-view fusion, we identify conflictive opinions via uncertainty estimation and accordingly reduce their impact in decision-making.

## 3 The Proposed Method

The proposed ConfSleepNet handles a sequence of 30-s sleep epochs $\mathbf{x}^{(L)}=\{x_1,x_2,\ldots,x_L\}$ and classifies each epoch $x_i$ into a sleep stage $y_i\in\{W,N1,N2,N3,REM\}$ following the AASM rule (Berry et al., [2012](https://arxiv.org/html/2605.17021#bib.bib59)). Herein, $x_i\in\mathbb{R}^{2\times T}$ contains EEG and EOG signals with equal sampling frequency, and $T$ is the number of points in an epoch. The overall architecture of ConfSleepNet is illustrated in Fig. [1](https://arxiv.org/html/2605.17021#S2.F1), with further details presented in the following subsections.

### 3.1 Design Principles

EEG and EOG are the two most widely used physiological signals in clinical sleep staging applications\. For these two different modalities, their signal characteristics are both distinctive and physiologically complementary\. Therefore, their predictions for the same sample may be either consistent or conflicting\. We argue that a sleep staging model should possess good robustness, enabling reliable final decision\-making even when predictions from different views are in conflict\. To achieve this goal, the following design principles aim to incorporate both complementarity \(i\.e\., EEG and EOG contain complementary information\) and reliability \(i\.e\., reliable multi\-view aggregation\) into the model\.

Principle 1 (Complementarity): The complementary nature of EEG and EOG is beneficial for sleep stage classification. According to the AASM standard, PSG signals are typically segmented into 30-second epochs and classified into one of five sleep stages: wake ($W$), rapid eye movement ($REM$), and three non-REM ($NREM$) stages ($N1$, $N2$, and $N3$). Previous works have often treated the two modalities (i.e., EEG and EOG) indiscriminately by directly performing five-class classification on them. However, in clinical practice, EEG serves as the primary modality for sleep staging, providing discriminative neurophysiological patterns for different sleep states. Meanwhile, EOG, as an important auxiliary modality to EEG-based sleep staging, plays a key role primarily in identifying the $REM$ stage. We provide a detailed description of the signal characteristics of EEG and EOG in Appendix [A.1](https://arxiv.org/html/2605.17021#A1.SS1). Based on the signal characteristics of the different modalities, ConfSleepNet designs a coarse-grained classification structure for EOG (i.e., distinguishing $W$, $REM$, and $NREM$) and, in parallel, a fine-grained classification structure for EEG (i.e., further subdividing $NREM$ into three sub-stages: $N1$, $N2$, and $N3$). Through this design, ConfSleepNet is able to fully leverage the complementary characteristics of EEG and EOG, thereby improving model performance.

Principle 2 (Consistent multi-view aggregation): Incorporating additional consistent view opinions reduces overall uncertainty (i.e., resulting in a more confident opinion). How to reasonably integrate multiple views remains an open problem. Recently, Xu et al. ([2024](https://arxiv.org/html/2605.17021#bib.bib108)) proposed ECML, which represents the current state-of-the-art. The core idea of this method is that integrating an uncertain view into the original view increases the overall uncertainty. For example, consider two views on the same instance, producing opinions $\mathcal{M}^1$ and $\mathcal{M}^2$, respectively. Both opinions yield the same prediction but have different levels of uncertainty regarding their own predictions ($u^1=0.3$ and $u^2=0.8$). When aggregating $\mathcal{M}^1$ and $\mathcal{M}^2$ using ECML, the resulting joint opinion has an uncertainty of 0.436, which is higher than $u^1$. We consider this unreasonable because the aggregation result of ECML does not account for the consistency between opinions. Intuitively, incorporating additional consistent views should reduce the overall uncertainty, thereby leading to a more confident final decision.
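The 0.436 figure can be reproduced under one assumption that this section does not state explicitly: that ECML's averaging rule combines two uncertainties as $2u^1u^2/(u^1+u^2)$, the same term that appears in Eq. (8) when $C=1$. Under that assumption, the arithmetic is:

$$u^{\text{ECML}}=\frac{2u^1u^2}{u^1+u^2}=\frac{2\times 0.3\times 0.8}{0.3+0.8}=\frac{0.48}{1.1}\approx 0.436>u^1=0.3$$

For comparison, a purely multiplicative rule would give $u^1u^2=0.24<0.3$, which is the behavior Principle 2 asks for when the two opinions agree.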

Principle 3 (Conflictive multi-view aggregation): Incorporating a conflictive view opinion with higher uncertainty increases the overall uncertainty (i.e., resulting in a less confident opinion). In sleep staging, the EEG and EOG views may produce inconsistent opinions for the same sample, which typically arises from factors such as differences in signal characteristics, noise interference, or physiological variability. Therefore, the proposed ConfSleepNet should be robust against conflicting views, which is crucial for clinical adoption. However, most existing methods lack appropriate mechanisms to handle conflicting views, thereby compromising model performance. We argue that incorporating a conflicting opinion with high uncertainty weakens the confidence in the original opinion. Consequently, the proposed multi-view aggregation method should explicitly assess the degree of inter-view conflict while reasonably incorporating the impact of conflict on the final decision.

### 3.2 View-Specific Evidence Learning

ConfSleepNet first transforms the raw input $x$ into high-level feature representations. Specifically, we utilize two parallel convolutional branches with different kernel sizes to extract time-frequency features. Subsequently, an improved GCNet-1D attention module (Cao et al., [2019](https://arxiv.org/html/2605.17021#bib.bib57)) is employed to enhance the features, denoted as $h^l$ and $h^s$. Furthermore, ConfSleepNet applies a cross-attention mechanism to learn the interactive information between $h^l$ and $h^s$, yielding the interacted features $h^{sl}$ and $h^{ls}$. Notably, we introduce a bidirectional long short-term memory layer in this process to model the stage transition patterns across multiple consecutive sleep epochs. Finally, we combine the enhanced features with the interactive features to obtain the high-level feature representation $F$. See Appendix [A.2](https://arxiv.org/html/2605.17021#A1.SS2) for the detailed feature extraction process.

Most existing deep learning methods typically perform classification based on the point-estimated distributions provided by Softmax. However, such approaches can still produce overconfident outputs even when confronted with low-quality data, thus exhibiting limitations in conflictive multi-view learning. Evidential deep learning (EDL) addresses this challenge by introducing the evidence framework of subjective logic (Jøsang, [2018](https://arxiv.org/html/2605.17021#bib.bib53)). Based on EDL, we construct $V$ evidential DNNs $\{f_v(\cdot)\}_{v=1}^{V}$ to extract evidence from the high-level features $F$ as follows:

$$\mathbf{e}^v=f_v(\cdot)\tag{1}$$

Unlike most deep learning methods, we use the SoftPlus activation function to extract an evidence vector $\mathbf{e}^v$ containing $k$ elements from the features (where each element represents the model's degree of support for the corresponding category). Notably, $V=4$ corresponds to the four designed evidential DNNs, which align with the hybrid category structure in ConfSleepNet. Specifically, the network inputs consist of three forms: single-view EEG, single-view EOG, and multi-view (i.e., a combination of EEG and EOG). $f_1(\cdot)$ and $f_2(\cdot)$ provide fine-grained five-category evidence for the single-view EEG and multi-view inputs, respectively. Meanwhile, $f_3(\cdot)$ and $f_4(\cdot)$ provide coarse-grained three-category (i.e., $W$, $REM$, and $NREM$) evidence for the single-view EOG and multi-view inputs, respectively. Based on the input type, we divide the evidential DNNs into single-view DNNs (corresponding to $f_1(\cdot)$ and $f_3(\cdot)$) and multi-view DNNs (corresponding to $f_2(\cdot)$ and $f_4(\cdot)$). The main components of each network are consistent, and we provide more details on the network architecture in Appendix [A.3](https://arxiv.org/html/2605.17021#A1.SS3).
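As a rough illustration of how a SoftPlus output layer yields non-negative evidence at two granularities, the sketch below uses a plain linear head per view. The names `softplus`, `evidence_head`, and `heads`, the 16-dimensional feature size, and the random weights are all hypothetical; the actual evidential DNNs are described in Appendix A.3, and each real head would receive its own view-specific features rather than a shared array.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); every output is positive.
    return np.logaddexp(0.0, z)

def evidence_head(features, weight, bias):
    # Hypothetical stand-in for an evidential DNN f_v: linear layer + SoftPlus.
    return softplus(features @ weight + bias)

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 16))  # two epochs, 16-dim features (illustrative)

# Views 1 and 2 emit five-class evidence; views 3 and 4 emit three-class evidence.
num_classes = {1: 5, 2: 5, 3: 3, 4: 3}
heads = {v: (rng.normal(size=(16, k)), np.zeros(k)) for v, k in num_classes.items()}

# For simplicity all heads share the same features here (an assumption).
evidence = {v: evidence_head(features, w, b) for v, (w, b) in heads.items()}
for v, e in evidence.items():
    print(v, e.shape)
```

Using SoftPlus rather than Softmax means the outputs are unnormalized support values, so a head can report little evidence for every class when the input is ambiguous.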

### 3.3 Evidence Mapping Layer

The evidences $\{\mathbf{e}^v\}_{v=1}^{4}$ learned by the view-specific DNNs $\{f_v(\cdot)\}_{v=1}^{4}$ have category structures with different granularities (i.e., $\mathbf{e}^1$ and $\mathbf{e}^2$ are five-class evidence vectors, while $\mathbf{e}^3$ and $\mathbf{e}^4$ are three-class evidence vectors). Before multi-view fusion, we design an evidence mapping layer to convert the coarse-grained evidence vectors $\mathbf{e}^3$ and $\mathbf{e}^4$ into fine-grained evidence vectors $\tilde{\mathbf{e}}^3$ and $\tilde{\mathbf{e}}^4$, respectively. Specifically, the evidence mapping function is $\tilde{\mathbf{e}}^v=\mathbf{e}^v\times\mathbf{U}$, where $\mathbf{U}\in\mathbb{R}^{3\times 5}$ is a matrix expected to be

$$\mathbf{U}=\begin{bmatrix}1&0&0&0&0\\ 0&\frac{1}{3}&\frac{1}{3}&\frac{1}{3}&0\\ 0&0&0&0&1\end{bmatrix}$$

The idea of evidence mapping can be summarized in three points: (1) The total amount of evidence should remain unchanged after the evidence mapping. Hence, each row of matrix $\mathbf{U}$ should sum to 1. (2) Equivalent mapping of evidence for each class should be guaranteed. This mainly ensures two things: first, equivalent evidence mapping for classes $W$ and $REM$; second, the total evidence for $N1$, $N2$, and $N3$ in $\tilde{\mathbf{e}}^v$ $(v=3,4)$ should equal that of $NREM$ in $\mathbf{e}^v$. (3) The evidence for the $NREM$ stage should be equally assigned to its sub-stages, $N1$, $N2$, and $N3$. This is because, while learning the evidence for $NREM$, instances of classes $N1$, $N2$, and $N3$ have the same distribution. We compare the performance of this mapping strategy with other strategies in Appendix [A.4](https://arxiv.org/html/2605.17021#A1.SS4) and discuss its feasibility.
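The mapping can be checked numerically with a few lines of NumPy. This is a minimal sketch, assuming the coarse evidence vector is ordered as ($W$, $NREM$, $REM$) and the fine vector as ($W$, $N1$, $N2$, $N3$, $REM$), which is the ordering implied by the rows and columns of $\mathbf{U}$ above; the function name `map_evidence` and the sample values are illustrative.

```python
import numpy as np

# Rows: coarse classes (W, NREM, REM); columns: fine classes (W, N1, N2, N3, REM).
U = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1 / 3, 1 / 3, 1 / 3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

def map_evidence(coarse_evidence):
    # e_tilde = e x U: spreads NREM evidence evenly over N1, N2, and N3.
    return np.asarray(coarse_evidence, dtype=float) @ U

coarse = np.array([3.0, 6.0, 1.0])  # illustrative evidence for W, NREM, REM
fine = map_evidence(coarse)
print(fine)                      # W and REM unchanged, NREM split three ways
print(coarse.sum(), fine.sum())  # total evidence is preserved
```

Because every row of `U` sums to 1, the total evidence (and therefore the Dirichlet strength and the uncertainty derived from it) is the same before and after mapping.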

### 3.4 Conflict-Aware Multi-View Fusion

Constructing View-Specific Opinions. After evidence extraction, we model the distributions of class probabilities using the Dirichlet distribution. Specifically, the Dirichlet distribution of the $v^{th}$ view is parameterized by $\boldsymbol{\alpha}^v=(\alpha_1^v,\ldots,\alpha_5^v)$, where $\boldsymbol{\alpha}^v=\mathbf{e}^v+1$ for $v=1,2$ and $\boldsymbol{\alpha}^v=\tilde{\mathbf{e}}^v+1$ for $v=3,4$. This ensures that the Dirichlet distribution is non-sparse. For the $v^{th}$ view, the probability density function of the Dirichlet distribution is defined as:

$$D(\boldsymbol{p}^v\mid\boldsymbol{\alpha}^v)=\begin{cases}\dfrac{1}{B(\boldsymbol{\alpha}^v)}\prod_{k=1}^{5}(p_k^v)^{\alpha_k^v-1} & \text{for } \boldsymbol{p}^v\in T_5\\ 0 & \text{otherwise}\end{cases}\tag{2}$$

where $B(\cdot)$ denotes the multinomial beta function, and $T_5$ is the 5-dimensional unit simplex, defined as:

$$T_5=\left\{\boldsymbol{p}^v\,\middle|\,\sum_{k=1}^{5}p_k^v=1 \text{ and } 0\leq p_1^v,\ldots,p_5^v\leq 1\right\}\tag{3}$$

Unlike existing Softmax-based methods, which only capture first-order uncertainty, the Dirichlet distribution models higher-order category probabilities, enabling more precise uncertainty estimation.

From the Dirichlet distributions, we can further construct view-specific opinions. The opinion stemming from the $v^{th}$ view can be described as an ordered tuple $\mathcal{M}^v=(\boldsymbol{b}^v,u^v)$, where $\boldsymbol{b}^v=(b_1^v,\ldots,b_5^v)$ assigns belief mass to potential sleep stages, and the uncertainty mass $u^v$ captures the ambiguity or vacuity of the acquired evidence. For the $v^{th}$ view, we normalize the evidence to obtain belief and uncertainty masses:

$$b_k^v=\frac{\alpha_k^v-1}{S^v},\quad u^v=\frac{K}{S^v}\tag{4}$$

where $b_k^v$ is the belief mass of the $k^{th}$ category, $u^v$ is the uncertainty of the opinion, $K=5$ is the number of sleep stages, and $S^v=\sum_{k=1}^{5}\alpha_k^v$ is the strength of the Dirichlet distribution. According to subjective logic theory, both $b_k^v\in\boldsymbol{b}^v$ and $u^v$ must be non-negative, and their sum should be equal to 1, that is

$$\sum_{k=1}^{5}b_k^v+u^v=1\quad(b_k^v,\,u^v\in[0,1])\tag{5}$$

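Eqs. (4) and (5) translate directly into code. The following is a minimal sketch that builds an opinion from a five-class evidence vector; the function name `opinion_from_evidence` and the sample evidence values are illustrative and not taken from the released code.

```python
import numpy as np

def opinion_from_evidence(evidence):
    # alpha = e + 1, S = sum(alpha), b_k = (alpha_k - 1) / S, u = K / S  (Eq. 4)
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.shape[-1]
    alpha = evidence + 1.0
    strength = alpha.sum(axis=-1, keepdims=True)
    belief = (alpha - 1.0) / strength
    uncertainty = num_classes / strength[..., 0]
    return belief, uncertainty

# Strong evidence for the first stage: low uncertainty.
belief, uncertainty = opinion_from_evidence([8.0, 1.0, 0.0, 0.0, 1.0])
print(belief, uncertainty)  # S = 15, so u = 5 / 15

# No evidence at all: the opinion is vacuous (all mass on uncertainty).
vacuous_belief, vacuous_uncertainty = opinion_from_evidence(np.zeros(5))
print(vacuous_belief, vacuous_uncertainty)
```

The second call shows why evidential models can express "I don't know": with zero evidence the belief masses vanish and the uncertainty equals 1, whereas a Softmax output would still have to pick a distribution over the five stages.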
Aggregating View-Specific Opinions. This section presents a conflict-aware multi-view aggregation method (CMAM) for synthesizing a reliable joint opinion from different views $\{\mathcal{M}^v\}_{v=1}^{4}$. Due to the influence of noise or other factors, opinions formed on different views may diverge. Therefore, following Principles 2 and 3 proposed in Section [3.1](https://arxiv.org/html/2605.17021#S3.SS1), we design CMAM to reasonably aggregate consistent or conflicting opinions. First, we introduce a metric in Definition 1 for quantifying the degree of conflict between two opinions.

Definition 1 (Conflict-degree metric). For two opinions $\mathcal{M}^a=(\boldsymbol{b}^a,u^a)$ and $\mathcal{M}^b=(\boldsymbol{b}^b,u^b)$ over the same instance, the degree of conflict between $\mathcal{M}^a$ and $\mathcal{M}^b$ is calculated as:

$$C(\mathcal{M}^a,\mathcal{M}^b)=1-\frac{\sum_k b_k^a\cdot b_k^b}{\sum_i b_i^a\cdot\sum_j b_j^b}\tag{6}$$

This metric guarantees two things: (1) $C=0$ indicates perfectly consistent opinions. This situation arises when two absolute opinions supporting the same category are presented. (2) $C=1$ indicates completely conflicting opinions. Typically, $C$ ranges from 0 to 1, and a larger $C$ signifies greater conflict. Based on this metric, the details of the proposed CMAM are presented as follows.
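The conflict-degree metric of Eq. (6) can be sketched as below. The function name `conflict_degree` and the example belief vectors are illustrative. Eq. (6) is undefined when either opinion carries no belief mass at all, so the sketch returns 0 in that case; that fallback is an assumption of this example, not something specified in the text.

```python
import numpy as np

def conflict_degree(b_a, b_b, eps=1e-12):
    # C = 1 - sum_k(b_k^a * b_k^b) / (sum_i b_i^a * sum_j b_j^b)   (Eq. 6)
    b_a = np.asarray(b_a, dtype=float)
    b_b = np.asarray(b_b, dtype=float)
    denom = b_a.sum() * b_b.sum()
    if denom < eps:
        # Assumption: a vacuous opinion is treated as non-conflicting.
        return 0.0
    return float(1.0 - np.dot(b_a, b_b) / denom)

same_stage = conflict_degree([0.7, 0, 0, 0, 0], [0.2, 0, 0, 0, 0])
disjoint = conflict_degree([0.7, 0, 0, 0, 0], [0, 0, 0, 0, 0.2])
partial = conflict_degree([0.5, 0.2, 0, 0, 0], [0.1, 0.3, 0.1, 0, 0])
print(same_stage, disjoint, partial)
```

Note that the metric depends only on how the belief mass is distributed across stages, not on how much uncertainty each opinion carries, because the denominator normalizes both belief vectors.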

Table 1: Performance comparison between ConfSleepNet and state-of-the-art methods for automatic sleep staging. $\uparrow$ means higher is better. The best results are highlighted in bold.

Definition 2 (Conflict-aware multi-view aggregation). Let $\mathcal{M}^a=(\boldsymbol{b}^a,u^a)$ and $\mathcal{M}^b=(\boldsymbol{b}^b,u^b)$ be two opinions over the same instance. The combined opinion $\mathcal{M}^{a\bar{\triangledown}b}$ is calculated as follows:

$$\mathcal{M}^{a\bar{\triangledown}b}=\mathcal{M}^a\,\bar{\triangledown}\,\mathcal{M}^b=(\boldsymbol{b}^{a\bar{\triangledown}b},u^{a\bar{\triangledown}b})\tag{7}$$

$$u^{a\bar{\triangledown}b}=C\frac{2u^au^b}{u^a+u^b}+(1-C)u^au^b\tag{8}$$

$$b_k^{a\bar{\triangledown}b}=\frac{u^ab_k^b+u^bb_k^a+(1-C)u^au^b(b_k^a+b_k^b)}{u^a+u^b}\tag{9}$$

The obtained opinion $\mathcal{M}^{a\bar{\triangledown}b}$ is a combination of $\mathcal{M}^a$ and $\mathcal{M}^b$. Its quality depends not only on the quality of the original opinions $\mathcal{M}^a$ and $\mathcal{M}^b$, but also on the degree of conflict between them. In particular, an opinion gains additional confidence when incorporating a consistent opinion, and becomes more uncertain when combined with a conflicting opinion. This characteristic distinguishes our proposed CMAM from other multi-view fusion methods, such as ECML (Xu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib108)) and DS-based fusion (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54)). Following subjective logic theory (Jøsang, [2018](https://arxiv.org/html/2605.17021#bib.bib53)), the uncertainty and belief masses of the combined opinion $\mathcal{M}^{a\bar{\triangledown}b}$ must be non-negative and sum to one, i.e., $\sum_k b_k^{a\bar{\triangledown}b}+u^{a\bar{\triangledown}b}=1$; the proof is provided in Appendix [A.5](https://arxiv.org/html/2605.17021#A1.SS5).
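To see how Eqs. (6), (8), and (9) behave on concrete numbers, the sketch below combines the opinion from the Principle 2 example ($u=0.3$) with a second opinion ($u=0.8$) that either agrees or disagrees with it. The function names and belief vectors are illustrative choices, and this is a plain NumPy reading of the equations rather than the authors' released implementation.

```python
import numpy as np

def conflict_degree(b_a, b_b):
    denom = b_a.sum() * b_b.sum()
    # Assumption: a vacuous opinion (no belief mass) is treated as non-conflicting.
    return 0.0 if denom == 0 else 1.0 - float(b_a @ b_b) / denom

def cmam_combine(opinion_a, opinion_b):
    # Each opinion is (belief vector, uncertainty). With alpha = e + 1 the
    # uncertainties are strictly positive, so u_a + u_b is never zero.
    (b_a, u_a), (b_b, u_b) = opinion_a, opinion_b
    b_a, b_b = np.asarray(b_a, dtype=float), np.asarray(b_b, dtype=float)
    c = conflict_degree(b_a, b_b)
    u = c * 2 * u_a * u_b / (u_a + u_b) + (1 - c) * u_a * u_b          # Eq. (8)
    b = (u_a * b_b + u_b * b_a
         + (1 - c) * u_a * u_b * (b_a + b_b)) / (u_a + u_b)            # Eq. (9)
    return b, u

m1 = ([0.7, 0, 0, 0, 0], 0.3)          # confident opinion for stage W
m2_agree = ([0.2, 0, 0, 0, 0], 0.8)    # uncertain opinion, same stage
m2_clash = ([0, 0, 0, 0, 0.2], 0.8)    # uncertain opinion, different stage

b_agree, u_agree = cmam_combine(m1, m2_agree)
b_clash, u_clash = cmam_combine(m1, m2_clash)
print(u_agree, b_agree.sum() + u_agree)  # uncertainty drops below 0.3
print(u_clash, b_clash.sum() + u_clash)  # uncertainty rises above 0.3
```

In both cases the belief masses and uncertainty still sum to one, and the same uncertain second view lowers the joint uncertainty when it agrees and raises it when it clashes.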

Following Definition 2, we can further combine more than two opinions with the following rule:

$$\mathcal{M}=\mathcal{M}^1\,\bar{\triangledown}\,\mathcal{M}^2\,\bar{\triangledown}\,\cdots\,\bar{\triangledown}\,\mathcal{M}^n\quad(n\geq 2)\tag{10}$$

This is an extension of the two-opinion combination discussed in Definition 2. Based on the above multi-opinion fusion rule, we can obtain the final multi-view joint opinion, and hence the probability of each category and the overall uncertainty. Although CMAM improves the reliability of the final prediction, it still has certain limitations; further details are provided in Appendix [A.6](https://arxiv.org/html/2605.17021#A1.SS6).

### 3.5 Theoretical Analysis

In this subsection, we theoretically analyze the advantages of CMAM. Section [3.1](https://arxiv.org/html/2605.17021#S3.SS1) presents two principles, Principles 2 and 3, to guide reasonable multi-view aggregation. The following two propositions provide theoretical analysis to support these principles.

Proposition 1. Given an opinion $\mathcal{M}^o=(\boldsymbol{b}^o,u^o)$, integrating an additional consistent opinion $\mathcal{M}^a=(\boldsymbol{b}^a,u^a)$ into $\mathcal{M}^o$ would produce a new opinion with lower uncertainty than $u^o$.

Proof\.

Let $\mathcal{M}^{o\bar{\triangledown}a}=(\boldsymbol{b}^{o\bar{\triangledown}a},u^{o\bar{\triangledown}a})$ be the combination of $\mathcal{M}^o$ and $\mathcal{M}^a$. The degree of conflict between $\mathcal{M}^o$ and $\mathcal{M}^a$, denoted by $C$, approaches 0 since they are consistent, so that

$$\begin{aligned}\lim_{C\to 0}u^{o\bar{\triangledown}a}&=\lim_{C\to 0}\left[(1-C)u^ou^a+C\frac{2u^ou^a}{u^o+u^a}\right]\\&=u^ou^a<u^o\end{aligned}$$

where the final inequality holds whenever $u^a<1$ and $u^o>0$.

Proposition 1 demonstrates that CMAM can enhance prediction confidence when incorporating consistent opinions\.

Proposition 2. Given an opinion $\mathcal{M}^{o}=(\boldsymbol{b}^{o},u^{o})$, integrating a conflictive opinion $\mathcal{M}^{b}$ with higher uncertainty (i.e., $u^{b}>u^{o}$) into $\mathcal{M}^{o}$ would increase the uncertainty.

Proof\.

Let $\mathcal{M}^{o\bar{\triangledown}b}=(\boldsymbol{b}^{o\bar{\triangledown}b},u^{o\bar{\triangledown}b})$ be the combination of $\mathcal{M}^{o}$ and $\mathcal{M}^{b}$. Since $\mathcal{M}^{o}$ and $\mathcal{M}^{b}$ are in conflict, the degree of conflict between them, $C$, is close to 1, so that

$$\lim_{C\to 1}u^{o\bar{\triangledown}b}=\lim_{C\to 1}\left[(1-C)\,u^{o}u^{b}+C\,\frac{2u^{o}u^{b}}{u^{o}+u^{b}}\right]=\frac{2}{1+\frac{u^{o}}{u^{b}}}\,u^{o}>u^{o},$$

where the inequality follows from $u^{b}>u^{o}$, which makes the factor $2/(1+u^{o}/u^{b})$ greater than 1.

Proposition 2 demonstrates that CMAM can increase prediction uncertainty when incorporating conflicting opinions.

Notably, the proofs only address the extreme cases (i.e., $C$ approaches 1 and 0). This is because we intend to use Propositions 1 and 2 to demonstrate that our fusion operation satisfies the two proposed design principles, and the extreme cases best illustrate these properties. For the general case (i.e., $C\in(0,1)$), the conclusions of the proofs can still be extended through continuity or monotonicity arguments.
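The behavior between the two extremes can be checked numerically. The sketch below evaluates only the uncertainty expression that appears in the two proofs, $(1-C)\,u^{o}u^{b}+C\,\frac{2u^{o}u^{b}}{u^{o}+u^{b}}$; the function name `fused_uncertainty` and the sample values are illustrative assumptions, and the belief-mass part of the fusion rule is not reproduced here.

```python
def fused_uncertainty(u_o, u_b, conflict):
    """Uncertainty of the fused opinion, as written in the proofs above.

    Interpolates between the product u_o * u_b (conflict = 0) and the
    harmonic-mean-style term 2 * u_o * u_b / (u_o + u_b) (conflict = 1).
    """
    product = u_o * u_b
    harmonic = 2.0 * u_o * u_b / (u_o + u_b)
    return (1.0 - conflict) * product + conflict * harmonic


# Illustrative values: the incoming opinion is less certain than the original.
u_o, u_b = 0.3, 0.6
levels = [i / 10 for i in range(11)]
curve = [fused_uncertainty(u_o, u_b, c) for c in levels]

for c, u in zip(levels, curve):
    print(f"C={c:.1f}  fused uncertainty={u:.3f}")
```

Because $u^{o}+u^{b}\leq 2$, the second term is never smaller than the first, so the fused uncertainty is linear and non-decreasing in $C$. For these sample values it starts at 0.18 (below $u^{o}=0.3$), ends at 0.40 (above $u^{o}$), and crosses $u^{o}$ at roughly $C\approx 0.55$, which is one way to see how the limiting statements extend to intermediate conflict levels.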

### 3\.6Multi\-View Joint Training

The evidential DNNs $\{f_{v}(\cdot)\}_{v=1}^{4}$ in ConfSleepNet are trained jointly to learn view-specific evidences $\{\mathbf{e}^{v}\}_{v=1}^{4}$ from multiple views. In conventional neural network-based classifiers, the cross-entropy loss is commonly employed. However, we need to adapt the cross-entropy loss to account for the evidence-based networks. In order to extract as much evidence for the ground-truth category as possible, the loss function for the evidential DNN $f_{v}(\cdot)$ can be defined as:

$$\begin{aligned}\mathcal{L}^{v}_{acc}(\mathbf{e}^{v})&=\int\left[\sum_{i=1}^{K}-y_{i}\log(p^{v}_{i})\right]\frac{1}{B(\mathbf{e}^{v}+\mathbf{1})}\prod_{j=1}^{K}(p^{v}_{j})^{e^{v}_{j}}\,d\boldsymbol{p}^{v}\\&=\sum_{i=1}^{K}y_{i}\left[\psi(S^{v})-\psi(e^{v}_{i}+1)\right]\end{aligned}\tag{11}$$

where $\psi(\cdot)$ is the digamma function, and $y_{i}$ is the one-hot encoding of the ground-truth sleep stage of the current 30-s epoch $x_{i}$. Note that sleep stages $N1$, $N2$ and $N3$ are treated as a single $NREM$ stage in DNNs $f_{3}(\cdot)$ and $f_{4}(\cdot)$, and thus the evidence for $N1$, $N2$ and $N3$ is extracted together.
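The closed form in Eq. (11) can be evaluated directly. The following is a minimal standard-library sketch, not the released implementation: it assumes $S^{v}=\sum_{k}(e^{v}_{k}+1)$ (the Dirichlet strength implied by the normalizer $B(\mathbf{e}^{v}+\mathbf{1})$), and the `digamma` helper is a hypothetical stand-in that uses the recurrence $\psi(x)=\psi(x+1)-1/x$ followed by the usual asymptotic series.

```python
import math


def digamma(x):
    """Digamma for x > 0 (illustrative helper, accurate to about 1e-8)."""
    result = 0.0
    while x < 6.0:  # shift the argument up, where the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def acc_loss(evidence, one_hot):
    """Closed form of Eq. (11) for a single view and a single epoch."""
    alpha = [e + 1.0 for e in evidence]  # Dirichlet parameters e + 1
    strength = sum(alpha)                # assumed definition of S^v
    return sum(y * (digamma(strength) - digamma(a))
               for y, a in zip(one_hot, alpha))


label = [0, 0, 1, 0, 0]  # ground truth is the third class
weak = acc_loss([1.0, 1.0, 2.0, 1.0, 1.0], label)
strong = acc_loss([1.0, 1.0, 20.0, 1.0, 1.0], label)
```

With the evidence for the other classes held fixed, raising the evidence for the ground-truth class lowers the loss (`strong < weak`), which matches the stated aim of extracting as much evidence for the correct stage as possible.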

The loss function $\mathcal{L}_{acc}(\cdot)$ encourages the evidence for the ground-truth category to approach the total evidence, thereby maximizing evidence for the correct category. However, this does not guarantee suppression of evidence from incorrect classes. Such misleading evidence for a sample may not be a problem as long as it is correctly classified by the network (i.e., the evidence for the correct sleep stage is stronger than the evidence for other class labels). However, we prefer the total evidence to shrink to zero for a sleep epoch if it cannot be correctly classified, thereby showing a high uncertainty. We achieve this by incorporating an additional term into our loss function, namely Kullback-Leibler (KL) divergence:

$$\begin{aligned}\mathcal{L}^{v}_{KL}(\mathbf{e}^{v})&=KL\left[D(\boldsymbol{p}^{v}|\tilde{\mathbf{e}}^{v})\,\|\,D(\boldsymbol{p}^{v}|\boldsymbol{1})\right]\\&=\log\left(\frac{\Gamma\left(\sum_{k=1}^{K}(\tilde{e}^{v}_{k}+1)\right)}{\Gamma(K)\prod_{k=1}^{K}\Gamma(\tilde{e}^{v}_{k}+1)}\right)+\sum_{k=1}^{K}\tilde{e}^{v}_{k}\left[\psi(\tilde{e}^{v}_{k}+1)-\psi\left(\sum_{j=1}^{K}(\tilde{e}^{v}_{j}+1)\right)\right]\end{aligned}\tag{12}$$

where $\Gamma(\cdot)$ is the gamma function, $\psi(\cdot)$ is the digamma function, $D(\boldsymbol{p}^{v}|\boldsymbol{1})$ is the uniform Dirichlet distribution, and $\tilde{\mathbf{e}}^{v}=(\boldsymbol{1}-\boldsymbol{y})\odot\mathbf{e}^{v}$ denotes the Dirichlet parameters after removal of the non-misleading evidence. The loss is defined as:

$$\mathcal{L}(\mathbf{e}^{v})=\sum_{v=1}^{4}\mathcal{L}^{v}_{acc}(\mathbf{e}^{v})+\lambda_{t}\mathcal{L}^{v}_{KL}(\mathbf{e}^{v})\tag{13}$$

where $\lambda_{t}=\min(1.0,t/10)\in[0,1]$ is the annealing coefficient and $t$ is the index of the current training epoch. By gradually increasing the weight of KL divergence in the loss through the annealing coefficient, we allow the neural network to explore the parameter space and avoid premature convergence to the uniform distribution for the misclassified samples, which may be correctly classified in future epochs.
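To make the regularizer and its schedule concrete, here is a small standard-library sketch of Eq. (12) and of the annealed sum in Eq. (13). It is an illustration under stated assumptions rather than the authors' code: the `digamma` helper is a hypothetical series approximation, the function names are invented, and the sum over views is read as covering both the accuracy term and the weighted KL term.

```python
import math


def digamma(x):
    """Digamma for x > 0 via recurrence plus asymptotic series (helper)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def kl_to_uniform(evidence, one_hot):
    """Eq. (12): KL from the Dirichlet over misleading evidence to uniform."""
    tilde = [(1.0 - y) * e for y, e in zip(one_hot, evidence)]  # (1 - y) * e
    alpha = [e + 1.0 for e in tilde]
    strength = sum(alpha)
    log_norm = (math.lgamma(strength) - math.lgamma(len(alpha))
                - sum(math.lgamma(a) for a in alpha))
    digamma_term = sum(e * (digamma(a) - digamma(strength))
                       for e, a in zip(tilde, alpha))
    return log_norm + digamma_term


def annealing(epoch):
    """lambda_t = min(1.0, t / 10) for training epoch index t."""
    return min(1.0, epoch / 10)


def total_loss(acc_terms, kl_terms, epoch):
    """Eq. (13), summing acc + lambda_t * KL over the views (assumed reading)."""
    lam = annealing(epoch)
    return sum(a + lam * k for a, k in zip(acc_terms, kl_terms))


label = [0, 0, 1, 0, 0]
clean = kl_to_uniform([0.0, 0.0, 9.0, 0.0, 0.0], label)       # no misleading evidence
misleading = kl_to_uniform([4.0, 0.0, 9.0, 0.0, 0.0], label)  # evidence on a wrong class
```

The term vanishes when all evidence sits on the ground-truth class and grows with evidence placed on wrong classes, while `annealing` keeps its weight at zero in epoch 0 and at full strength from epoch 10 onward.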

## 4Experiments

### 4\.1Experimental Setups

Sleep Datasets. We conducted extensive experiments on four publicly available datasets, including SleepEDF-20, SleepEDF-78, the Montreal Archive of Sleep Studies (MASS), and the Sleep Heart Health Study (SHHS). Detailed descriptions of each dataset and the channels used in the experiments are provided in Appendix [A.7](https://arxiv.org/html/2605.17021#A1.SS7).

![Refer to caption](https://arxiv.org/html/2605.17021v1/x2.png)

Figure 2: Intermediate prediction results produced by view-specific DNNs $\{f_{v}(\cdot)\}_{v=1}^{4}$ and the average degree of conflict between different views. (a) The results produced by DNNs $f_{1}(\cdot)$ and $f_{2}(\cdot)$ have a 5-class structure, while (b) the results produced by $f_{3}(\cdot)$ and $f_{4}(\cdot)$ have a 3-class structure. (c) The conflict degree is typically at a low level over a sequence of epochs of the same sleep stage, and increases over transition epochs. (d) We reduce the impacts of conflictive views in making final results, which are obviously more precise than the view-specific results.

Compared Methods. We compared ConfSleepNet with several representative baselines on multiple public datasets. These baselines include (1) single-view networks including DeepSleepNet (Supratak et al., [2017](https://arxiv.org/html/2605.17021#bib.bib73)), TinySleepNet (Supratak and Guo, [2020](https://arxiv.org/html/2605.17021#bib.bib76)), SleePyCo (Lee et al., [2024](https://arxiv.org/html/2605.17021#bib.bib101)), and FlexibleSleepNet (Ren et al., [2025](https://arxiv.org/html/2605.17021#bib.bib186)), (2) multi-view networks including SalientSleepNet (Jia et al., [2021](https://arxiv.org/html/2605.17021#bib.bib85)), XSleepNet (Phan et al., [2022](https://arxiv.org/html/2605.17021#bib.bib61)), and HMDT-Net (Wang et al., [2026](https://arxiv.org/html/2605.17021#bib.bib187)), and (3) uncertainty-aware networks including TMCEK (Liang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib11)). Detailed descriptions of all baselines are provided in Appendix [A.8](https://arxiv.org/html/2605.17021#A1.SS8).

Evaluation Metrics and Implementation Details. To evaluate the performance of the proposed ConfSleepNet, we used accuracy (Acc) and macro-averaged F1-score (MF1). The performance on each sleep stage was evaluated using the per-class F1-score. Details of the implementation and evaluation metrics are provided in Appendix [A.9](https://arxiv.org/html/2605.17021#A1.SS9).
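For readers unfamiliar with these metrics, the sketch below computes Acc, per-class F1, and MF1 using their standard definitions. The exact formulation used in the paper is given in its appendix, so the function names, the toy hypnogram, and the convention of assigning F1 = 0 to a class with no true or predicted samples are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of epochs whose predicted stage equals the true stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, classes):
    """F1 = 2TP / (2TP + FP + FN) for each class."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0  # assumed convention
    return scores


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


stages = ["W", "N1", "N2", "N3", "REM"]
# Toy example, not data from the paper.
y_true = ["W", "W", "N1", "N2", "N2", "N3", "REM", "REM"]
y_pred = ["W", "N1", "N1", "N2", "N2", "N2", "REM", "W"]

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred, stages)
mf1 = macro_f1(y_true, y_pred, stages)
```

Because MF1 weights every stage equally, a rarely occurring stage that is always missed (N3 in this toy example) pulls MF1 well below Acc, which is why both metrics are reported for imbalanced sleep data.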

### 4\.2Experimental Results

Performance Comparison. Table [1](https://arxiv.org/html/2605.17021#S3.T1) presents a comprehensive performance comparison between the proposed ConfSleepNet and other competing baselines. The results demonstrate that ConfSleepNet achieves the best performance among all compared methods, validating the feasibility and superiority of the proposed model. Notably, compared with representative works DeepSleepNet and XSleepNet, ConfSleepNet achieves significant improvements in classification accuracy, with gains of 2.7%-5.2% and 0.6%-2.0%, respectively. We argue that even a 1% improvement in accuracy corresponds to dozens of correctly classified samples. Therefore, ConfSleepNet has practical value in the diagnosis of sleep disorders. Furthermore, in practice, the REM and W stages are prone to confusion in classification due to their inherent feature similarities. Nevertheless, ConfSleepNet further improves the class-specific F1 scores for both stages. Particularly for the REM stage, which holds greater clinical relevance, ConfSleepNet achieves the highest F1 scores across all datasets. Thus, we anticipate that ConfSleepNet has clinical significance for the assessment and diagnosis of sleep disorders.

![Refer to caption](https://arxiv.org/html/2605.17021v1/x3.png)\(a\)Case I
![Refer to caption](https://arxiv.org/html/2605.17021v1/x4.png)\(b\)Case II
![Refer to caption](https://arxiv.org/html/2605.17021v1/x5.png)\(c\)Case III
![Refer to caption](https://arxiv.org/html/2605.17021v1/x6.png)\(d\)Case IV

Figure 3: Confusion matrices on the MASS-SS3 dataset. Rows represent the true class, and columns represent the predicted class. Darker color indicates a larger number of correctly classified samples.

Table 2: Performance comparison between different variants of ConfSleepNet.

We also developed a baseline model (denoted as ConfSleepNet-) based on an average-based multi-view fusion strategy and compared it with ConfSleepNet and other competing baselines. The results show that ConfSleepNet- achieves competitive performance compared to existing works, validating the effectiveness of the proposed evidential DNNs. Moreover, ConfSleepNet outperforms ConfSleepNet- by 0.8%-1.5% in accuracy. This performance gap indicates that the proposed conflict-aware multi-view aggregation method enhances model robustness against misleading predictions, thereby improving overall performance. Particularly, we compared ConfSleepNet with TMCEK (Liang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib11)), the current state-of-the-art uncertainty-aware baseline. The results show that ConfSleepNet achieves accuracy gains of 2.2%-4.9% over TMCEK, further validating the superiority of CMAM.

Table 3: Performance comparison of multi-view benchmarks.

Case Study. We conducted a case study on a single patient from the MASS-SS3 dataset to investigate the conflict-aware multi-view aggregation process proposed in ConfSleepNet. Fig. [2](https://arxiv.org/html/2605.17021#S4.F2)(a) and (b) illustrate the overnight sleep staging results generated by the four evidential DNNs $\{f_{v}(\cdot)\}_{v=1}^{4}$ on their respective views, along with the corresponding ground truth hypnograms. Specifically, $f_{1}(\cdot)$ and $f_{2}(\cdot)$ perform 5-class sleep staging as shown in Fig. [2](https://arxiv.org/html/2605.17021#S4.F2)(a), while $f_{3}(\cdot)$ and $f_{4}(\cdot)$ perform 3-class sleep staging as shown in Fig. [2](https://arxiv.org/html/2605.17021#S4.F2)(b). It can be intuitively observed that due to differences in the number of input views, classification granularity, and signal characteristics, the multiple opinions generated by $\{f_{v}(\cdot)\}_{v=1}^{4}$ are not always consistent.
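Comparing a 5-class view with a 3-class view requires bringing both to a shared granularity. The sketch below shows one simple way to do that by collapsing N1, N2, and N3 into NREM; the label strings, the helper names, and the agreement score are illustrative assumptions, and the score is not the conflict metric of Section 3.4.

```python
# Assumed label names; N1, N2 and N3 share the single NREM class in coarse views.
FIVE_TO_THREE = {"W": "W", "N1": "NREM", "N2": "NREM", "N3": "NREM", "REM": "REM"}


def collapse(hypnogram):
    """Map a 5-class hypnogram onto the 3-class structure (W, NREM, REM)."""
    return [FIVE_TO_THREE[stage] for stage in hypnogram]


def coarse_agreement(fine_pred, coarse_pred):
    """Fraction of epochs where two views agree at the 3-class granularity."""
    collapsed = collapse(fine_pred)
    matches = sum(a == b for a, b in zip(collapsed, coarse_pred))
    return matches / len(coarse_pred)


# Toy predictions for six consecutive epochs, not data from the paper.
fine = ["W", "N1", "N2", "N3", "REM", "REM"]
coarse = ["W", "NREM", "NREM", "NREM", "NREM", "REM"]

collapsed = collapse(fine)
agreement = coarse_agreement(fine, coarse)
```

In this toy sequence the two views disagree only on the fifth epoch, at the boundary between NREM and REM, which mirrors the observation that disagreements tend to concentrate around stage transitions.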

We quantified the degree of conflict between each pair of views using the conflict metric introduced in Section [3.4](https://arxiv.org/html/2605.17021#S3.SS4). The average conflict degree among views is presented in Fig. [2](https://arxiv.org/html/2605.17021#S4.F2)(c). It can be seen that during continuous W or REM stages, the conflict level is low (even dropping to zero), indicating a high degree of consistency among views. This observation is supported by the high F1-scores achieved for these two sleep stages, as reported in Table [1](https://arxiv.org/html/2605.17021#S3.T1). In contrast, inter-view prediction conflicts are particularly evident during sleep stage transitions (e.g., from N2 to N3). The main reason is that transition epochs often contain features of multiple sleep stages simultaneously. We further discuss this situation in Appendix [A.10](https://arxiv.org/html/2605.17021#A1.SS10).

Fig. [2](https://arxiv.org/html/2605.17021#S4.F2)(d) presents the final prediction results after view aggregation. It can be observed that, compared to individual view-specific predictions, the aggregated result more closely aligns with the ground truth hypnogram. This is because the proposed conflict-aware multi-view aggregation method can accurately identify misleading views through uncertainty estimation and conflict assessment, thereby reducing their influence in the final decision-making process.

### 4\.3Ablation Study

To further evaluate the effectiveness of each component, we conducted ablation experiments on the MASS-SS3 dataset. Specifically, the following four model variants were constructed: (1) Case I (ConfSleepNet-): replacing the proposed CMAM with an average-based multi-view fusion strategy; (2) Case II: replacing CMAM with the multi-view fusion method proposed by Xu et al. ([2024](https://arxiv.org/html/2605.17021#bib.bib108)); (3) Case III: removing the evidence mapping layer and $f_{3}(\cdot)$ from the evidence DNN, and performing five-class evidence learning at a single granularity for all views; (4) Case IV (ConfSleepNet): the complete version of the proposed method.

The experimental results are presented in Table [2](https://arxiv.org/html/2605.17021#S4.T2), where Case IV achieves the best performance. Compared with Case I, the proposed CMAM significantly improves overall performance and reduces the impact of misleading views on the final predictions. The effectiveness of CMAM is further validated by comparing Case II and Case IV. In comparison with Case III, the hybrid category structure provides unique and complementary information for distinguishing sleep stages. Furthermore, Fig. [3](https://arxiv.org/html/2605.17021#S4.F3) shows the confusion matrices of the four cases. Compared with the other ablation variants, ConfSleepNet exhibits a relative increase in correctly classified samples and a relative decrease in misclassified samples.

### 4\.4Multi\-View Benchmark

To further validate the effectiveness of the proposed CMAM, we conducted experiments on four multi-view datasets: HandWritten (HW), Scene15, CUB, and PIE. The baseline methods compared include EDL (Sensoy et al., [2018](https://arxiv.org/html/2605.17021#bib.bib58)), DCCAE (Wang et al., [2015](https://arxiv.org/html/2605.17021#bib.bib188)), ETMC (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54)), RCML (Xu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib108)), CCML (Liu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib189)), and TMCEK (Liang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib11)) (see Appendix [A.11](https://arxiv.org/html/2605.17021#A1.SS11) for details). As shown in Table [3](https://arxiv.org/html/2605.17021#S4.T3), CMAM achieves the best accuracy across multiple datasets. This strongly demonstrates that CMAM enhances model performance by quantifying the degree of conflict among views and reasonably handling the influence of such conflicts on final decision-making.

## 5Conclusion

This work proposes ConfSleepNet, a sleep staging method based on evidential deep learning\. The model utilizes parallel evidence networks to extract hybrid\-granularity evidence from multi\-view inputs, aiming to capture both unique and complementary information from multi\-modal signals\. Furthermore, this paper introduces a novel conflict\-aware view aggregation strategy to fuse view\-specific opinions\. This method dynamically detects conflicts among views through uncertainty estimation, thereby enhancing the robustness of the final predictions\. Both theoretical analysis and experimental results validate the effectiveness of ConfSleepNet in sleep stage classification\.

## Acknowledgements

This work was supported by Grants 2025JC\-YBQN\-841 \(Natural Science Foundation of Shaanxi Province\), 25YJCZH244 and 24XJCZH024 \(Ministry of Education of China\)\.

## Impact Statement

Theoretical and experimental results demonstrate that the proposed method enhances both classification performance and interpretability in sleep staging tasks, which holds positive implications for promoting its practical clinical application\. Experiments on multi\-view tasks further validate the generalizability of the method, and future work will investigate its value in scenarios such as medical imaging\.

## References

- R\. B\. Berry, R\. Brooks, C\. E\. Gamaldo, S\. M\. Harding, C\. Marcus, B\. V\. Vaughn,et al\.\(2012\)The AASM manual for the scoring of sleep and associated events\.Rules, Terminology and Technical Specifications, Darien, Illinois, American Academy of Sleep Medicine176,pp\. 1–7\.Cited by:[§A\.2](https://arxiv.org/html/2605.17021#A1.SS2.p3.1),[§3](https://arxiv.org/html/2605.17021#S3.p1.5)\.
- Y\. Cao, J\. Xu, S\. Lin, F\. Wei, and H\. Hu \(2019\)GCNet: non\-local networks meet squeeze\-excitation networks and beyond\.InProceedings of the IEEE/CVF International Conference on Computer Vision Workshops,Cited by:[§A\.2](https://arxiv.org/html/2605.17021#A1.SS2.p1.3),[§3\.2](https://arxiv.org/html/2605.17021#S3.SS2.p1.8)\.
- S\. Chambon, M\. N\. Galtier, P\. J\. Arnal, G\. Wainrib, and A\. Gramfort \(2018\)A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series\.IEEE Transactions on Neural Systems and Rehabilitation Engineering26\(4\),pp\. 758–769\.Cited by:[§2\.1](https://arxiv.org/html/2605.17021#S2.SS1.p1.1)\.
- X\. Chen, Y\. Zhang, Q\. Chen, L\. Zhou, H\. Chen, H\. Wu, Y\. Xu, K\. Chen, B\. Yin, W\. Chen,et al\.\(2025\)Astgsleep: attention based spatial\-temporal graph network for sleep staging\.IEEE Transactions on Instrumentation and Measurement\.Cited by:[§A\.10](https://arxiv.org/html/2605.17021#A1.SS10.p2.1)\.
- Z\. Chen, Z\. Yang, L\. Zhu, W\. Chen, T\. Tamura, N\. Ono, M\. Altaf\-Ul\-Amin, S\. Kanaya, and M\. Huang \(2023\)Automated sleep staging via parallel frequency\-cut attention\.IEEE Transactions on Neural Systems and Rehabilitation Engineering31,pp\. 1974–1985\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p2.1)\.
- Y\. Dai, X\. Li, S\. Liang, L\. Wang, Q\. Duan, H\. Yang, C\. Zhang, X\. Chen, L\. Li, X\. Li,et al\.\(2023\)Multichannelsleepnet: a transformer\-based model for automatic sleep stage classification with psg\.IEEE Journal of Biomedical and Health Informatics27\(9\),pp\. 4204–4215\.Cited by:[§2\.1](https://arxiv.org/html/2605.17021#S2.SS1.p1.1)\.
- A\. P\. Dempster \(1968\)A generalization of bayesian inference\.Journal of the Royal Statistical Society: Series B \(Methodological\)30\(2\),pp\. 205–232\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- R\. Egele, R\. Maulik, K\. Raghavan, B\. Lusch, I\. Guyon, and P\. Balaprakash \(2022\)Autodeuq: automated deep ensemble with uncertainty quantification\.InInternational Conference on Pattern Recognition,pp\. 1908–1914\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- P\. Fonseca, N\. Den Teuling, X\. Long, and R\. M\. Aarts \(2016\)Cardiorespiratory sleep stage detection using conditional random fields\.IEEE Journal of Biomedical and Health Informatics21\(4\),pp\. 956–966\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p5.1)\.
- Y\. Gal and Z\. Ghahramani \(2016\)Dropout as a bayesian approximation: representing model uncertainty in deep learning\.InInternational Conference on Machine Learning,pp\. 1050–1059\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- M\. A\. Ganaie, M\. Hu, A\. K\. Malik, M\. Tanveer, and P\. N\. Suganthan \(2022\)Ensemble deep learning: a review\.Engineering Applications of Artificial Intelligence115,pp\. 105151\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- A\. L\. Goldberger, L\. A\. Amaral, L\. Glass, J\. M\. Hausdorff, P\. C\. Ivanov, R\. G\. Mark, J\. E\. Mietus, G\. B\. Moody, C\. Peng, and H\. E\. Stanley \(2000\)PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals\.Circulation101\(23\),pp\. e215–e220\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p3.1)\.
- M\. A\. Grandner \(2022\)Sleep, health, and society\.Sleep Medicine Clinics17\(2\),pp\. 117–139\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p1.1)\.
- Z\. Han, C\. Zhang, H\. Fu, and J\. T\. Zhou \(2022\)Trusted multi\-view classification with dynamic evidential fusion\.IEEE Transactions on Pattern Analysis and Machine Intelligence45\(2\),pp\. 2551–2566\.Cited by:[3rd item](https://arxiv.org/html/2605.17021#A1.I4.i3.p1.1),[§A\.11](https://arxiv.org/html/2605.17021#A1.SS11.p1.2),[§A\.6](https://arxiv.org/html/2605.17021#A1.SS6.p1.4),[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1),[§3\.4](https://arxiv.org/html/2605.17021#S3.SS4.p5.10),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- L\. Huang, S\. Ruan, P\. Decazes, and T\. Denœux \(2025\)Deep evidential fusion with uncertainty quantification and reliability learning for multimodal medical image segmentation\.Information Fusion113,pp\. 102648\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- Z\. Jia, Y\. Lin, J\. Wang, X\. Wang, P\. Xie, and Y\. Zhang \(2021\)SalientSleepNet: multimodal salient wave detection network for sleep staging\.arXiv preprint: 2105\.13864\.Cited by:[3rd item](https://arxiv.org/html/2605.17021#A1.I2.i3.p1.1),[§2\.1](https://arxiv.org/html/2605.17021#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.11.4.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.21.14.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- A\. Jøsang \(2018\) Subjective Logic: A formalism for reasoning under uncertainty\. Springer Publishing Company\. Cited by:[§3\.2](https://arxiv.org/html/2605.17021#S3.SS2.p2.3),[§3\.4](https://arxiv.org/html/2605.17021#S3.SS4.p5.10)\.
- B\. Kemp, A\. H\. Zwinderman, B\. Tuk, H\. A\. Kamphuisen, and J\. J\. Oberye \(2000\)Analysis of a sleep\-dependent neuronal feedback loop: the slow\-wave microcontinuity of the EEG\.IEEE Transactions on Biomedical Engineering47\(9\),pp\. 1185–1194\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p2.3)\.
- D\. P\. Kingma and J\. Ba \(2014\)Adam: a method for stochastic optimization\.arXiv preprint arXiv:1412\.6980\.Cited by:[§A\.9](https://arxiv.org/html/2605.17021#A1.SS9.p1.1)\.
- G\. Kong, C\. Li, H\. Peng, Z\. Han, and H\. Qiao \(2023\)EEG\-based sleep stage classification via neural architecture search\.IEEE Transactions on Neural Systems and Rehabilitation Engineering31,pp\. 1075–1085\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p1.1)\.
- J\. M\. Lane, J\. Qian, E\. Mignot, S\. Redline, F\. A\. Scheer, and R\. Saxena \(2023\)Genetics of circadian rhythms and sleep in human health and disease\.Nature Reviews Genetics24\(1\),pp\. 4–20\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p1.1)\.
- S\. Lee, Y\. Yu, S\. Back, H\. Seo, and K\. Lee \(2024\)Sleepyco: automatic sleep scoring with feature pyramid and contrastive learning\.Expert Systems with Applications240,pp\. 122551\.Cited by:[5th item](https://arxiv.org/html/2605.17021#A1.I2.i5.p1.2),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.13.6.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.23.16.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.32.25.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.38.31.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- S\. Li, X\. Xu, Y\. Yang, F\. Shen, Y\. Mo, Y\. Li, and H\. T\. Shen \(2023\)Dcel: deep cross\-modal evidential learning for text\-based person retrieval\.InProceedings of the 31st ACM International Conference on Multimedia,pp\. 6292–6300\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- Y\. Li, K\. Song, Y\. Zhang, and F\. Karray \(2024\)Method and system for automated detection of sleep spindles using a single eeg channels based teo and emd\.Expert Systems with Applications249,pp\. 123661\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p2.1)\.
- X\. Liang, S\. Wang, Y\. Qian, Q\. Guo, L\. Du, B\. Jiang, T\. Luo, and F\. Li \(2025\)Trusted multi\-view classification with expert knowledge constraints\.InInternational Conference on Machine Learning,Cited by:[7th item](https://arxiv.org/html/2605.17021#A1.I2.i7.p1.1),[6th item](https://arxiv.org/html/2605.17021#A1.I4.i6.p1.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.15.8.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.25.18.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.33.26.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.40.33.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1),[§4\.2](https://arxiv.org/html/2605.17021#S4.SS2.p2.1),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- W\. Liu, X\. Yue, Y\. Chen, and T\. Denoeux \(2022\)Trusted multi\-view deep learning with opinion aggregation\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.36,pp\. 7585–7593\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- Y\. Liu, L\. Liu, C\. Xu, X\. Song, Z\. Guan, and W\. Zhao \(2024\)Dynamic evidence decoupling for trusted multi\-view learning\.InProceedings of the 32nd ACM International Conference on Multimedia,pp\. 7269–7277\.Cited by:[5th item](https://arxiv.org/html/2605.17021#A1.I4.i5.p1.1),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- R\. M\. Neal \(2012\)Bayesian learning for neural networks\.InSpringer Science & Business Media,Vol\.118\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- I\. Perez\-Pozuelo, B\. Zhai, J\. Palotti, R\. Mall, M\. Aupetit, J\. M\. Garcia\-Gomez, S\. Taheri, Y\. Guan, and L\. Fernandez\-Luque \(2020\)The future of sleep health: a data\-driven revolution in sleep science and medicine\.NPJ Digital Medicine3\(1\),pp\. 42\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p1.1)\.
- H\. Phan, F\. Andreotti, N\. Cooray, O\. Y\. Chén, and M\. De Vos \(2018\)Joint classification and prediction cnn framework for automatic sleep stage classification\.IEEE Transactions on Biomedical Engineering66\(5\),pp\. 1285–1296\.Cited by:[§2\.1](https://arxiv.org/html/2605.17021#S2.SS1.p1.1)\.
- H\. Phan, F\. Andreotti, N\. Cooray, O\. Y\. Chén, and M\. De Vos \(2019\)SeqSleepNet: end\-to\-end hierarchical recurrent neural network for sequence\-to\-sequence automatic sleep staging\.IEEE Transactions on Neural Systems and Rehabilitation Engineering27\(3\),pp\. 400–410\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p2.3)\.
- H\. Phan, O\. Y\. Chén, M\. C\. Tran, P\. Koch, A\. Mertins, and M\. De Vos \(2022\)XSleepNet: Multi\-view sequential model for automatic sleep staging\.IEEE Transactions on Pattern Analysis and Machine Intelligence44\(9\),pp\. 5903–5915\.Cited by:[4th item](https://arxiv.org/html/2605.17021#A1.I2.i4.p1.1),[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p2.3),[§1](https://arxiv.org/html/2605.17021#S1.p2.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.12.5.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.22.15.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.31.24.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.37.30.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- J\. Phyo, W\. Ko, E\. Jeon, and H\. Suk \(2023\)TransSleep: Transitioning\-aware attention\-based deep neural network for sleep staging\.IEEE Transactions on Cybernetics53\(7\),pp\. 4500–4510\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p2.3),[§1](https://arxiv.org/html/2605.17021#S1.p2.1)\.
- F\. Portier, A\. Portmann, P\. Czernichow, L\. Vascaut, E\. Devin, D\. Benhamou, A\. Cuvelier, and J\. F\. Muir \(2000\)Evaluation of home versus laboratory polysomnography in the diagnosis of sleep apnea syndrome\.American Journal of Respiratory and Critical Care Medicine162\(3\),pp\. 814–818\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p1.1)\.
- J\. Pradeepkumar, M\. Anandakumar, V\. Kugathasan, D\. Suntharalingham, S\. L\. Kappel, A\. C\. De Silva, and C\. U\. Edussooriya \(2024\)Towards interpretable sleep stage classification using cross\-modal transformers\.IEEE Transactions on Neural Systems and Rehabilitation Engineering\.Cited by:[§1](https://arxiv.org/html/2605.17021#S1.p2.1)\.
- X\. Qin, Z\. Zhang, C\. Huang, M\. Dehghan, O\. R\. Zaiane, and M\. Jagersand \(2020\)U2\-net: going deeper with nested u\-structure for salient object detection\.Pattern Recognition106,pp\. 107404\.Cited by:[3rd item](https://arxiv.org/html/2605.17021#A1.I2.i3.p1.1)\.
- S\. F\. Quan, B\. V\. Howard, C\. Iber, J\. P\. Kiley, F\. J\. Nieto, G\. T\. O’Connor, D\. M\. Rapoport, S\. Redline, J\. Robbins, J\. M\. Samet,et al\.\(1997\)The sleep heart health study: design, rationale, and methods\.Sleep20\(12\),pp\. 1077–1085\.Cited by:[§A\.7](https://arxiv.org/html/2605.17021#A1.SS7.p5.1)\.
- Z\. Ren, J\. Ma, and Y\. Ding \(2025\)FlexibleSleepNet: a model for automatic sleep stage classification based on multi\-channel polysomnography\.IEEE Journal of Biomedical and Health Informatics29\(5\),pp\. 3488–3501\.Cited by:[6th item](https://arxiv.org/html/2605.17021#A1.I2.i6.p1.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.14.7.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.24.17.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.39.32.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- M\. Sensoy, L\. Kaplan, and M\. Kandemir \(2018\)Evidential deep learning to quantify classification uncertainty\.InAdvances in Neural Information Processing Systems,Vol\.31,pp\. 3179–3189\.Cited by:[1st item](https://arxiv.org/html/2605.17021#A1.I4.i1.p1.1),[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- Z\. Shao, W\. Dou, and Y\. Pan \(2024\)Dual\-level deep evidential fusion: integrating multimodal information for enhanced reliable decision\-making in deep learning\.Information Fusion103,pp\. 102113\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- A\. Supratak, H\. Dong, C\. Wu, and Y\. Guo \(2017\)DeepSleepNet: a model for automatic sleep stage scoring based on raw single\-channel EEG\.IEEE Transactions on Neural Systems and Rehabilitation Engineering25\(11\),pp\. 1998–2008\.Cited by:[1st item](https://arxiv.org/html/2605.17021#A1.I2.i1.p1.1),[§1](https://arxiv.org/html/2605.17021#S1.p2.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.19.12.2),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.29.22.2),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.36.29.2),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.9.2.2),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- A\. Supratak and Y\. Guo \(2020\)TinySleepNet: An efficient deep learning model for sleep stage scoring based on raw single\-channel EEG\.InAnnual International Conference of the IEEE Engineering in Medicine & Biology Society,pp\. 641–644\.Cited by:[2nd item](https://arxiv.org/html/2605.17021#A1.I2.i2.p1.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.10.3.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.20.13.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.30.23.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- A\. Vaswani \(2017\)Attention is all you need\.InAdvances in Neural Information Processing Systems,Cited by:[§2\.1](https://arxiv.org/html/2605.17021#S2.SS1.p1.1)\.
- K\. Wang, Q\. Zhu, J\. Zhao, C\. Zheng, W\. Shao, and D\. Zhang \(2026\)Heterogeneous modality dynamic trustworthy fusion network for cross\-subject sleep stage classification\.IEEE Transactions on Emerging Topics in Computational Intelligence\.Cited by:[8th item](https://arxiv.org/html/2605.17021#A1.I2.i8.p1.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.16.9.1),[Table 1](https://arxiv.org/html/2605.17021#S3.T1.9.7.26.19.1),[§4\.1](https://arxiv.org/html/2605.17021#S4.SS1.p2.1)\.
- W\. Wang, R\. Arora, K\. Livescu, and J\. Bilmes \(2015\)On deep multi\-view representation learning\.InInternational Conference on Machine Learning,pp\. 1083–1092\.Cited by:[2nd item](https://arxiv.org/html/2605.17021#A1.I4.i2.p1.1),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- T\. Xia, T\. Dang, J\. Han, L\. Qendro, and C\. Mascolo \(2024\)Uncertainty\-aware health diagnostics via class\-balanced evidential deep learning\.IEEE Journal of Biomedical and Health Informatics\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.
- C\. Xu, J\. Si, Z\. Guan, W\. Zhao, Y\. Wu, and X\. Gao \(2024\)Reliable conflictive multi\-view learning\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.38,pp\. 16129–16137\.Cited by:[4th item](https://arxiv.org/html/2605.17021#A1.I4.i4.p1.1),[§A\.6](https://arxiv.org/html/2605.17021#A1.SS6.p1.4),[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1),[§3\.1](https://arxiv.org/html/2605.17021#S3.SS1.p3.7),[§3\.4](https://arxiv.org/html/2605.17021#S3.SS4.p5.10),[§4\.3](https://arxiv.org/html/2605.17021#S4.SS3.p1.1),[§4\.4](https://arxiv.org/html/2605.17021#S4.SS4.p1.1)\.
- Q\. Zhang, H\. Wu, C\. Zhang, Q\. Hu, H\. Fu, J\. T\. Zhou, and X\. Peng \(2023\)Provable dynamic fusion for low\-quality multimodal data\.InInternational Conference on Machine Learning,pp\. 41753–41769\.Cited by:[§2\.2](https://arxiv.org/html/2605.17021#S2.SS2.p1.1)\.

## Appendix A Appendix

In the supplemental material:

- [A.1](https://arxiv.org/html/2605.17021#A1.SS1). Complementarity of Multimodal Sleep Signals.
- [A.2](https://arxiv.org/html/2605.17021#A1.SS2). Process of Feature Extraction.
- [A.3](https://arxiv.org/html/2605.17021#A1.SS3). Network Architecture of the Evidential DNN.
- [A.4](https://arxiv.org/html/2605.17021#A1.SS4). Feasibility of the Mapping Strategy.
- [A.5](https://arxiv.org/html/2605.17021#A1.SS5). Proof of Subjective Logic Constraints.
- [A.6](https://arxiv.org/html/2605.17021#A1.SS6). Limitations of the Proposed CMAM.
- [A.7](https://arxiv.org/html/2605.17021#A1.SS7). Sleep Datasets.
- [A.8](https://arxiv.org/html/2605.17021#A1.SS8). Compared Methods.
- [A.9](https://arxiv.org/html/2605.17021#A1.SS9). Implementation Details and Evaluation Metrics.
- [A.10](https://arxiv.org/html/2605.17021#A1.SS10). Conflicts Among View-Specific Opinions.
- [A.11](https://arxiv.org/html/2605.17021#A1.SS11). Multi-View Benchmark.
- [A.12](https://arxiv.org/html/2605.17021#A1.SS12). Supplementary Figures.

### A\.1Complementarity of Multimodal Sleep Signals

As shown in Fig. [4](https://arxiv.org/html/2605.17021#A1.F4), EEG is commonly used for automated sleep staging tasks, with its distinctive features including theta waves (4–8 Hz) during the $N1$ sleep stage, sleep spindles and K-complex waves (12–14 Hz) in the $N2$ stage, and delta waves (0.5–4 Hz) in the $N3$ stage. Furthermore, as a modality that complements EEG-based methods, EOG exhibits significant differences during the $REM$ stage due to the characteristics of eye movements.

### A\.2Process of Feature Extraction

High-Level Feature Extraction. Feature extraction aims to transform raw inputs into high-level representations. As shown in Fig. [5](https://arxiv.org/html/2605.17021#A1.F5), ConfSleepNet first employs two parallel branches with different kernel sizes to capture both temporal and frequency information from the raw input $x$. Subsequently, a global context attention module called GCNet-1D is used to further enhance the features into $h^{l}$ and $h^{s}$. It is worth noting that GCNet was originally designed to capture global understanding of visual scenes and has demonstrated superior performance (Cao et al., [2019](https://arxiv.org/html/2605.17021#bib.bib57)). In this work, we modify it to adapt to the processing of one-dimensional signals. The process of basic feature extraction can be formalized as:

$$h^{s} = GCN^{s}(CNN^{s}(x)) \tag{14}$$

$$h^{l} = GCN^{l}(CNN^{l}(x)) \tag{15}$$

where $CNN$ and $GCN$ denote the processing through convolutional operations and the GCNet-1D module, respectively, and the input $x$ is either EEG or EOG.

After basic feature extraction, we employ a cross-attention mechanism to dynamically learn latent information from the interactions between the enhanced features $h^{l}$ and $h^{s}$, generating features denoted as $h^{sl}$ and $h^{ls}$. In addition, two shortcut connections are used to enable residual transmission of the enhanced features $h^{l}$ and $h^{s}$. Finally, these features are combined into a unified representation by $F = h^{s} \| h^{l} \| h^{sl} \| h^{ls}$, where $\|$ denotes feature concatenation.
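
The fusion step above can be illustrated with a minimal pure-Python sketch. It assumes plain scaled dot-product attention without learned projections, equal sequence lengths for $h^{s}$ and $h^{l}$, and concatenation along the feature axis; the helper names `cross_attention` and `fuse_features` are hypothetical and not taken from the released code.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, context):
    """Each row of `query` attends to all rows of `context` (assumed form)."""
    d = len(query[0])
    out = []
    for q in query:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, context))
                    for j in range(d)])
    return out

def fuse_features(h_s, h_l):
    # h_sl: small-kernel features querying large-kernel features, and vice versa.
    h_sl = cross_attention(h_s, h_l)
    h_ls = cross_attention(h_l, h_s)
    # Shortcut connections keep h_s and h_l; F = h_s || h_l || h_sl || h_ls.
    return [s + l + sl + ls for s, l, sl, ls in zip(h_s, h_l, h_sl, h_ls)]

# Toy inputs: 3 time steps, 2 features per branch (illustrative values only).
h_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_l = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
F = fuse_features(h_s, h_l)
print(len(F), len(F[0]))
```

With two-dimensional branch features, each time step of `F` carries eight values: the two residual copies followed by the two cross-attended views.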

Modeling Stage-Transition Patterns. According to the AASM guideline (Berry et al., [2012](https://arxiv.org/html/2605.17021#bib.bib59)), sleep stage classification requires contextual information from neighboring epochs. Therefore, the model should be capable of learning temporal dependencies. However, the aforementioned high-level feature extraction module focuses on intra-epoch information and neglects inter-epoch stage-transition patterns, because convolutional neural networks are not well suited to modeling inter-epoch dependencies. Consequently, a Bi-LSTM layer is incorporated into ConfSleepNet to model the transition patterns across a sequence of sleep epochs.

### A\.3Network Architecture of the Evidential DNN

We use two types of networks to handle different inputs. For single-view EEG and single-view EOG inputs, as shown in Fig. [6](https://arxiv.org/html/2605.17021#A1.F6)(a), $f_{1}(\cdot)$ and $f_{4}(\cdot)$ first extract high-level representations from the raw input data, and then utilize a Bi-LSTM layer to capture stage-transition patterns between epochs. The multi-view network is designed for multi-view inputs and differs from the single-view network in its use of two independent feature extractors. As illustrated in Fig. [6](https://arxiv.org/html/2605.17021#A1.F6)(b), two parallel branches process the EEG and EOG inputs, respectively, and fusion is performed after feature extraction.

After obtaining the high-level representation of the sleep input, unlike conventional deep learning methods, we use an output layer to generate non-negative evidence vectors. Specifically, a fully connected layer first transforms the high-dimensional features into evidence vectors of fixed size (typically the total number of classes). Then, the SoftPlus activation function is applied to ensure the non-negativity of the evidence.
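
As a rough illustration of how non-negative evidence can be turned into an opinion, the sketch below applies SoftPlus to hypothetical output-layer logits and then uses the standard evidential deep learning mapping of Sensoy et al. (2018), with belief $b_k = e_k / S$, uncertainty $u = K / S$, and $S = \sum_k (e_k + 1)$. That ConfSleepNet uses exactly this mapping is an assumption here; the function name `evidence_to_opinion` and the logit values are illustrative.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)); always strictly positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def evidence_to_opinion(logits):
    """Map raw output-layer values to (evidence, belief masses, uncertainty)."""
    evidence = [softplus(z) for z in logits]
    num_classes = len(evidence)
    # Dirichlet strength with alpha_k = e_k + 1 (assumed, as in standard EDL).
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return evidence, belief, uncertainty

# Hypothetical logits for five stages (W, N1, N2, N3, REM).
evidence, belief, uncertainty = evidence_to_opinion([-2.0, 0.5, 4.0, 1.0, -1.0])
print(belief, uncertainty)
```

Under this mapping the belief masses and the uncertainty always sum to one, so weak total evidence shows up directly as high uncertainty.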

### A\.4Feasibility of the Mapping Strategy

We adopt a uniform mapping strategy to convert coarse-grained evidence into fine-grained evidence. To validate the feasibility of this mapping strategy, we developed two variants, learnable mapping and data-driven mapping, and conducted comparative experiments on the MASS-SS3 dataset, as shown in Table [4](https://arxiv.org/html/2605.17021#A1.T4).

Table 4: Comparison of different evidence mapping strategies on the MASS-SS3 dataset.

The experimental results show that the performance differences among the three mapping strategies are minimal, indicating that the uniform assignment hypothesis does not introduce significant bias in the current dataset. This can be attributed to the following two reasons. First, the evidence mapping is performed after the training of the coarse-grained DNNs, where the $NREM$ evidence has already encoded the characteristics of the entire $NREM$ category. Second, the subsequent conflict-aware aggregation and multi-view joint training compensate, to some extent, for the simplification introduced by the mapping layer. Ultimately, considering the trade-off between model performance and training time, we select the uniform mapping as the evidence mapping strategy.
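
One possible reading of the uniform mapping is sketched below. It assumes a coarse structure of W, NREM, and REM, a fine structure of W, N1, N2, N3, and REM, and that coarse evidence is split equally among the fine stages it covers; both the category layout and the equal split are assumptions made for illustration, and `uniform_map` is a hypothetical helper.

```python
FINE_STAGES = ["W", "N1", "N2", "N3", "REM"]

# Assumed coarse-to-fine category structure (not confirmed by this appendix).
COARSE_TO_FINE = {"W": ["W"], "NREM": ["N1", "N2", "N3"], "REM": ["REM"]}

def uniform_map(coarse_evidence):
    """Spread each coarse evidence value equally over its fine-grained stages."""
    fine = {stage: 0.0 for stage in FINE_STAGES}
    for coarse, value in coarse_evidence.items():
        children = COARSE_TO_FINE[coarse]
        for child in children:
            fine[child] += value / len(children)
    return [fine[stage] for stage in FINE_STAGES]

# Illustrative coarse evidence from a hypothetical coarse-grained view.
fine_evidence = uniform_map({"W": 0.6, "NREM": 9.0, "REM": 1.2})
print(fine_evidence)
```

Because the split only redistributes evidence, the total evidence of the view is unchanged, which is one reason a fixed mapping needs no extra training.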

### A\.5Proof of Subjective Logic Constraints

Proof\.

$$
\begin{aligned}
\sum_{k} b_{k}^{a\bar{\triangledown}b} + u^{a\bar{\triangledown}b}
&= \frac{u^{a}(1-u^{b}) + u^{b}(1-u^{a}) + (1-C)u^{a}u^{b}(2-u^{a}-u^{b})}{u^{a}+u^{b}} \\
&\quad + \frac{(1-C)u^{a}u^{b}(u^{a}+u^{b}) + 2Cu^{a}u^{b}}{u^{a}+u^{b}} \\
&= \frac{u^{a}+u^{b}-2u^{a}u^{b}+2(1-C)u^{a}u^{b}+2Cu^{a}u^{b}}{u^{a}+u^{b}} \\
&= 1
\end{aligned}
$$
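
The identity can also be checked numerically. The sketch below evaluates the two fractions from the first line of the proof for random uncertainties $u^{a}, u^{b} \in (0, 1]$ and conflict degrees $C \in [0, 1]$; the function names are illustrative, and the assumed range of $C$ is not stated in this appendix.

```python
import random

def fused_belief_sum(u_a, u_b, c):
    # First fraction in the proof: total fused belief mass.
    num = u_a * (1 - u_b) + u_b * (1 - u_a) + (1 - c) * u_a * u_b * (2 - u_a - u_b)
    return num / (u_a + u_b)

def fused_uncertainty(u_a, u_b, c):
    # Second fraction in the proof: fused uncertainty mass.
    num = (1 - c) * u_a * u_b * (u_a + u_b) + 2 * c * u_a * u_b
    return num / (u_a + u_b)

random.seed(0)
max_error = 0.0
for _ in range(1000):
    u_a = random.uniform(0.01, 1.0)
    u_b = random.uniform(0.01, 1.0)
    c = random.uniform(0.0, 1.0)  # assumed range of the conflict degree
    total = fused_belief_sum(u_a, u_b, c) + fused_uncertainty(u_a, u_b, c)
    max_error = max(max_error, abs(total - 1.0))

print(max_error)
```

The deviation from one stays at floating-point rounding level, in line with the algebra above.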

### A\.6Limitations of the Proposed CMAM

Consistent with existing EDL-based multi-view aggregation methods (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54); Xu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib108)), CMAM suffers from performance limitations due to its failure to satisfy the associativity law (e.g., $(\mathcal{M}^{1}\bar{\triangledown}\mathcal{M}^{2})\bar{\triangledown}\mathcal{M}^{3} \neq \mathcal{M}^{1}\bar{\triangledown}(\mathcal{M}^{2}\bar{\triangledown}\mathcal{M}^{3})$). Specifically, the degree of conflict $C$ depends on the specific pair of opinions being combined. When fusing opinions generated from three or more views, the initial fusion (i.e., $\mathcal{M}^{1}\bar{\triangledown}\mathcal{M}^{2}$) alters the original belief mass distribution, which in turn affects the degree of conflict with $\mathcal{M}^{3}$. Although the associativity law does not hold, the definition in Eq. ([10](https://arxiv.org/html/2605.17021#S3.E10)) remains valid because we adopt a fixed order for pairwise fusion, thereby ensuring a deterministic result. We leave a thorough investigation of this limitation to future work.
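
A toy check of the order dependence is sketched below. It looks only at the fused uncertainty term from the proof in A.5 and, as a simplification, holds $C$ fixed at 1, whereas CMAM recomputes $C$ for every pair of opinions; the uncertainty values are arbitrary. A fixed-order left fold, written here with `functools.reduce`, yields one deterministic result.

```python
from functools import reduce

def fuse_uncertainty(u_a, u_b, c=1.0):
    # Fused uncertainty from the proof in A.5; c is held fixed in this toy.
    num = (1 - c) * u_a * u_b * (u_a + u_b) + 2 * c * u_a * u_b
    return num / (u_a + u_b)

u1, u2, u3 = 0.2, 0.4, 0.8  # arbitrary view-level uncertainties

left = fuse_uncertainty(fuse_uncertainty(u1, u2), u3)   # (M1 . M2) . M3
right = fuse_uncertainty(u1, fuse_uncertainty(u2, u3))  # M1 . (M2 . M3)

# Fixed left-to-right pairwise fusion over an ordered list of views.
fixed_order = reduce(fuse_uncertainty, [u1, u2, u3])

print(left, right, fixed_order)
```

The two groupings disagree even in this simplified setting, while the left fold always reproduces the first grouping, which is what makes a fixed fusion order well defined.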

### A\.7Datasets

Table 5: A summary of datasets.

SleepEDF-20. This dataset includes 20 healthy participants (10 males and 10 females) aged 25 to 34 years (Kemp et al., [2000](https://arxiv.org/html/2605.17021#bib.bib52)), and its recordings were manually labeled according to the R&K manual. For each subject, two consecutive day-night PSG recordings were collected, except for subject 13, for whom one night's data was lost due to device failure. This dataset contains two EEG channels (Fpz-Cz and Pz-Oz) and a horizontal EOG channel. The sampling rate for all EEG and EOG signals is 100 Hz. Following previous studies (Phan et al., [2022](https://arxiv.org/html/2605.17021#bib.bib61), [2019](https://arxiv.org/html/2605.17021#bib.bib74); Phyo et al., [2023](https://arxiv.org/html/2605.17021#bib.bib72)), we utilize the Fpz-Cz EEG and EOG channels, and merge $N3$ and $N4$ into a single $N3$ stage based on the AASM rule.

SleepEDF-78. As an expanded version of SleepEDF-20 (Goldberger et al., [2000](https://arxiv.org/html/2605.17021#bib.bib51)), SleepEDF-78 contains a total of 78 subjects ranging in age from 25 to 101 years, and comprises 153 whole-night PSG sleep recordings. The other settings are the same as for SleepEDF-20.

MASS-SS3. This dataset is composed of 62 nights from healthy subjects. Each recording contains 20 EEG channels and 2 EOG channels. Manual annotation was performed by sleep experts according to the AASM standard. All EEG and EOG signals have a sampling rate of 256 Hz. We employ the C4-LER EEG channel and the Left Horiz EOG channel, and downsample the signals to 128 Hz.

SHHS. The SHHS dataset (Quan et al., [1997](https://arxiv.org/html/2605.17021#bib.bib184)) is a large-scale, multi-center dataset designed to investigate the association between sleep-disordered breathing and cardiovascular disease. Since the participants in this dataset were drawn from multiple existing epidemiological cohort studies, we followed the subject selection criteria proposed by Fonseca et al. ([2016](https://arxiv.org/html/2605.17021#bib.bib185)) to reduce the influence of various disease-related factors, and selected 329 individuals with relatively healthy sleep patterns (i.e., an Apnea-Hypopnea Index below 5). For this filtered subset, we employed the C4-A1 EEG channel with a sampling rate of 125 Hz, as well as the EOG channel, in our experiments.

### A\.8Compared Methods

We compared our model with the following eight baselines:

- DeepSleepNet (Supratak et al., [2017](https://arxiv.org/html/2605.17021#bib.bib73)) is an automatic sleep staging model based on single-channel EEG. It constructs a convolutional neural network (CNN) to extract time-domain features from raw signals and employs a Bi-LSTM network to automatically learn transition rules between sleep stages from EEG epochs.
- TinySleepNet (Supratak and Guo, [2020](https://arxiv.org/html/2605.17021#bib.bib76)) is an end-to-end model capable of performing automatic sleep staging with relatively limited training data and computational resources. Through data augmentation, the model becomes more robust to temporal shifts and avoids overfitting to the sequence order of sleep stages.
- SalientSleepNet (Jia et al., [2021](https://arxiv.org/html/2605.17021#bib.bib85)) is a temporal fully convolutional network based on the U2-Net architecture (Qin et al., [2020](https://arxiv.org/html/2605.17021#bib.bib9)), consisting of two independent U2-shaped streams that extract salient features from multimodal data. Its multi-scale extraction and multimodal attention modules help the model achieve excellent performance.
- XSleepNet (Phan et al., [2022](https://arxiv.org/html/2605.17021#bib.bib61)) is a sequence-to-sequence sleep staging model capable of learning joint representations from raw signals and time-frequency images. Its characteristic lies in preserving the representational capacity of different views while enhancing robustness to training data.
- SleePyCo (Lee et al., [2024](https://arxiv.org/html/2605.17021#bib.bib101)) employs a feature pyramid backbone network to extract multi-scale temporal and frequency features from raw EEG signals. Furthermore, through supervised contrastive learning, the method reduces the distance between features of the same class and increases the distance between features of different classes, demonstrating particularly strong discriminative ability for the $N1$ and $REM$ sleep stages.
- FlexibleSleepNet (Ren et al., [2025](https://arxiv.org/html/2605.17021#bib.bib186)) is a lightweight classification model based on adaptive feature extraction (AFE) and scaling variation compression (SVC). AFE and SVC compress and expand the dimensions of features captured from multi-channel data, enabling the network to effectively learn spatiotemporal dependencies across channels.
- TMCEK (Liang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib11)) is a trusted multi-view classification method that has achieved notable results in automatic sleep staging tasks. It integrates expert knowledge to improve feature interpretability and introduces a distribution-aware subjective opinion mechanism for more reliable confidence estimation.
- HMDT-Net (Wang et al., [2026](https://arxiv.org/html/2605.17021#bib.bib187)) aims to mitigate the impact of inter-subject variability on sleep staging models. The method proposes a trusted fusion strategy to effectively integrate heterogeneous multimodal data, while incorporating adversarial learning to extract common feature representations across subjects.

### A\.9Implementation Details and Evaluation Metrics

Implementation Details. The proposed ConfSleepNet model and its variants were implemented using the PyTorch framework (version 1.9) and trained on an Nvidia GeForce RTX 3090 GPU for 100 training epochs. The models were trained with a batch size of 16, and the Adam optimizer (Kingma and Ba, [2014](https://arxiv.org/html/2605.17021#bib.bib50)) was employed with a learning rate of 1e-3 to minimize the loss function defined in Eq. ([13](https://arxiv.org/html/2605.17021#S3.E13)). The multi-view input signals were segmented into sequences of 20 epochs (i.e., each sequence is composed of 20 sleep epochs). We evaluated our proposed model using a subject-wise k-fold cross-validation scheme, where k was set to 20, 10, 31, and 5 for the SleepEDF-20, SleepEDF-78, MASS-SS3, and SHHS datasets, respectively. In each fold, we selected one group of subjects as testing data, one group for validation, and the remaining groups for training. For example, subject-wise 20-fold cross-validation on SleepEDF-20 with 20 subjects amounts to leave-one-subject-out cross-validation.
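
The evaluation protocol can be sketched as follows. Two details are assumptions made for illustration rather than taken from the paper: the validation group is chosen as the group following the test group, and recordings are cut into non-overlapping 20-epoch sequences with any incomplete tail dropped. The helper names are hypothetical.

```python
def subject_wise_folds(subjects, k):
    """Split subjects into k groups; each fold uses one test and one validation group."""
    groups = [subjects[i::k] for i in range(k)]
    folds = []
    for i in range(k):
        val_index = (i + 1) % k  # assumed choice of validation group
        train = [s for j, group in enumerate(groups)
                 if j not in (i, val_index) for s in group]
        folds.append({"train": train, "val": groups[val_index], "test": groups[i]})
    return folds

def to_sequences(epochs, seq_len=20):
    # Non-overlapping windows of seq_len epochs; the incomplete tail is dropped.
    return [epochs[i:i + seq_len]
            for i in range(0, len(epochs) - seq_len + 1, seq_len)]

# SleepEDF-20 setting: 20 subjects and k = 20, i.e. leave-one-subject-out.
folds = subject_wise_folds(list(range(20)), k=20)
sequences = to_sequences(list(range(45)))
print(len(folds), len(folds[0]["train"]), len(sequences))
```

Splitting by subject rather than by epoch keeps all recordings of a test subject out of training, which is what makes the 20-fold case equivalent to leave-one-subject-out.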

Evaluation Metrics. We employed two performance metrics, accuracy (Acc) and macro-averaged F1-score (MF1), to evaluate the model. They are defined as follows:

$$Acc = \frac{TP+TN}{TP+TN+FP+FN} \tag{16}$$

$$MF1 = \frac{\sum_{i=1}^{K} F1_{i}}{K} \tag{17}$$

where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives. $F1 = \frac{2 \times Pre \times Rec}{Pre+Rec}$, $Rec = \frac{TP}{TP+FN}$, $Pre = \frac{TP}{TP+FP}$, and $K$ denotes the number of sleep stage classes. We further computed per-class metrics by considering a single class as a positive class and all other classes combined as a negative class.
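
A minimal sketch of both metrics computed from a confusion matrix is given below. It assumes rows are true stages and columns are predicted stages, and it takes overall accuracy as the fraction of correctly classified epochs; the matrix values are invented for illustration.

```python
def accuracy_and_mf1(confusion):
    """Return (Acc, MF1) from a K x K confusion matrix (rows: true, cols: predicted)."""
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(num_classes))
    f1_scores = []
    for i in range(num_classes):
        # One-vs-rest counts for class i.
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[r][i] for r in range(num_classes)) - tp
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * pre * rec / (pre + rec) if pre + rec else 0.0)
    return correct / total, sum(f1_scores) / num_classes

# Invented 3-class confusion matrix with 30 epochs.
acc, mf1 = accuracy_and_mf1([[8, 2, 0],
                             [1, 6, 3],
                             [0, 1, 9]])
print(round(acc, 4), round(mf1, 4))
```

Because MF1 averages per-class F1 scores with equal weight, rare stages count as much as frequent ones, which is why it is usually reported alongside accuracy for imbalanced sleep data.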

### A\.10Conflicts Among View\-Specific Opinions

We calculated the sample distribution of a subject under different conflict degrees, as well as the proportions of different conflict intervals during sleep transition and non-transition periods. The results are presented in Tables [6](https://arxiv.org/html/2605.17021#A1.T6) and [7](https://arxiv.org/html/2605.17021#A1.T7).

Table 6: Sample distribution of a subject in MASS-SS3 under different conflict degrees.

Table 7: Proportions of different conflict intervals in transition and non-transition periods.

Statistical results show that nearly half (43.27%) of high-conflict instances occur during the transition period, a proportion significantly higher than that of other conflict levels, indicating a strong correlation between high conflict and the transition period. The main reasons can be summarized as follows: (1) at the signal feature level, the transition between adjacent sleep stages involves mixed physiological features, to which EEG and EOG have different sensitivities; (2) the model inherently possesses uncertainty when modeling the boundaries of sleep stages; (3) accurately identifying the transition period is also challenging for human experts, so discrepancies in subjective judgment arise most easily in this context. This reflects the inherent difficulty of the sleep stage classification task, a view supported by existing work (Chen et al., [2025](https://arxiv.org/html/2605.17021#bib.bib190)).

### A\.11Multi\-View Benchmark

Multi-View Dataset. We used four multi-view datasets: HandWritten ([https://archive\.ics\.uci\.edu/dataset/72/multiple\+features](https://archive.ics.uci.edu/dataset/72/multiple+features)), Scene15 ([https://figshare\.com/articles/dataset/15\-Scene\_Image\_Dataset/7007177/1](https://figshare.com/articles/dataset/15-Scene_Image_Dataset/7007177/1)), CUB ([https://www\.vision\.caltech\.edu/visipedia/CUB\-200\.html](https://www.vision.caltech.edu/visipedia/CUB-200.html)), and PIE ([http://www\.cs\.cmu\.edu/afs/cs/project/PIE/MultiPie/Home\.html](http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Home.html)). Similar to previous work (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54)), we extracted multi-view features from different datasets, with detailed descriptions of each dataset provided below:

- HandWritten dataset consists of handwritten numerals (0–9) extracted from Dutch utility maps, with 200 instances per class (2,000 samples in total), where these numerals were converted into binary images and characterized using six feature sets.
- Scene15 dataset was designed specifically for image scene classification tasks and contains 4,485 images spanning 15 common indoor and outdoor scene categories, serving as a widely adopted benchmark for comparing various algorithms.
- CUB dataset is the most widely used dataset for fine-grained visual classification tasks, comprising 11,788 images belonging to 200 bird subcategories.
- PIE dataset contains 680 facial images from 68 subjects.

Baseline Methods\.We used six competitive baselines to evaluate CMAM:

- EDL (Sensoy et al., [2018](https://arxiv.org/html/2605.17021#bib.bib58)) explicitly models the predictive uncertainty of a model using subjective logic theory. Specifically, it imposes a Dirichlet distribution over class probabilities, thereby transforming the neural network's predictions into subjective opinions. This model has achieved notable success in out-of-distribution sample detection and exhibits strong robustness to interference.
- DCCAE (Wang et al., [2015](https://arxiv.org/html/2605.17021#bib.bib188)) builds upon deep canonical correlation analysis (DCCA) by introducing an autoencoder's reconstruction loss as a regularization term, effectively combining both objectives. This approach performs prominently in tasks that require both cross-view prediction capability and the preservation of single-view information.
- ETMC (Han et al., [2022](https://arxiv.org/html/2605.17021#bib.bib54)) is a trustworthy multi-view classification method that dynamically integrates different views at the evidence level to enhance classification reliability. This framework accurately identifies predictive uncertainty while endowing the model with robustness against potential noise.
- RCML (Xu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib108)) addresses a novel problem of reliable conflict multi-view learning, which requires models to provide trustworthy decision results for conflicting multi-view data. To this end, it develops an evidential conflict multi-view learning approach that aggregates view-specific opinions through a conflicting opinion aggregation strategy.
- CCML (Liu et al., [2024](https://arxiv.org/html/2605.17021#bib.bib189)) proposes a consistency and complementarity-aware multi-view learning method. It dynamically decouples consistent and complementary evidence, which is then processed using corresponding principles.
- TMCEK (Liang et al., [2025](https://arxiv.org/html/2605.17021#bib.bib11)) integrates expert knowledge with a distribution-aware subjective opinion mechanism, thereby enhancing feature interpretability while achieving more reliable confidence estimation.

Uncertainty Evaluation. To further evaluate the advantage of CMAM in uncertainty awareness, we visualized the data density distributions under different noise levels ($\sigma$) on two multi-view datasets, as illustrated in Fig. [7](https://arxiv.org/html/2605.17021#A1.F7). The experimental results show that, compared with the clean datasets, the density of high-uncertainty distributions is more prominent on the noisy datasets. This demonstrates that CMAM can effectively identify noisy instances and estimate their uncertainty.

### A\.12Supplementary Figures

![Refer to caption](https://arxiv.org/html/2605.17021v1/x7.png)

Figure 4: Stage-related features in EEG and EOG signals.

![Refer to caption](https://arxiv.org/html/2605.17021v1/x8.png)

Figure 5: Architecture of feature extraction that utilizes two branches with varying kernel sizes and a cross-attention mechanism.

![Refer to caption](https://arxiv.org/html/2605.17021v1/x9.png)

Figure 6: The network architecture of the evidential DNN.

![Refer to caption](https://arxiv.org/html/2605.17021v1/x10.png)

(a) $\sigma=0.1$

![Refer to caption](https://arxiv.org/html/2605.17021v1/x11.png)

(b) $\sigma=0.5$

![Refer to caption](https://arxiv.org/html/2605.17021v1/x12.png)

(c) $\sigma=1$

![Refer to caption](https://arxiv.org/html/2605.17021v1/x13.png)

(d) $\sigma=10$

Figure 7: Density of uncertainty on the PIE dataset. As the noise intensity increases, the uncertainty curves of conflicting instances also increase.
