Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection

arXiv cs.LG 06/02/26, 04:00 AM Papers
Summary
This paper introduces Score-Guided Classification (SGC), a framework that models pathological priors using an unsupervised generative network for EEG-based depression detection, avoiding synthetic data augmentation and improving classification accuracy.
arXiv:2606.00180v1 Announce Type: new Abstract: Deep learning-based Major Depressive Disorder (MDD) detection using Electroencephalography (EEG) is fundamentally constrained by the "small-sample dilemma." Prevailing generative data augmentation methods not only incur heavy computational overhead but also risk introducing synthetic noise, thereby blurring classification boundaries. To challenge the traditional "data quantity first" convention, we propose a novel framework "Beyond Augmentation": Score-Guided Classification (SGC). SGC does not synthesize pseudo-samples; instead, it utilizes an unsupervised generative network architecture to model the structural and statistical anomaly degrees of samples, serving as the core "Pathological Prior". This prior, after robust normalization, is explicitly fused with deep feature representations, thereby precisely guiding the classifier's decision boundary. Furthermore, to dynamically adapt to varying channel configurations, we propose a Cross-Channel Spatial Adaptation module, utilizing a spatial mapping mechanism to effectively resolve the hardware heterogeneity of mismatched channels in multi-center datasets. Extensive experiments on the Mumtaz2016 and high-density MODMA datasets demonstrate the effectiveness and exceptional generalizability of our method under the challenging "zero data augmentation" setting and at "zero sample synthesis cost". Keywords: Electroencephalography (EEG), Depression Detection, Anomaly Score, Diffusion Models, Few-Shot Learning
# Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection
Source: [https://arxiv.org/html/2606.00180](https://arxiv.org/html/2606.00180)
Jingqi Cheng (School of Internet, Anhui University, Hefei, China; shinz1804@163.com), Xu Zhao (School of Internet, Anhui University, Hefei, China), Wan Jiang (School of Computer Science and Technology, Hefei University of Technology, Hefei, China), and Jingjing Wu (School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China)

\(2026\)

###### Abstract\.

Deep learning-based Major Depressive Disorder (MDD) detection using Electroencephalography (EEG) is fundamentally constrained by the small-sample dilemma. Prevailing generative data augmentation methods not only incur heavy computational overhead but also risk introducing synthetic noise, thereby blurring classification boundaries. To challenge the traditional “data quantity first” convention, we propose a novel framework: Score-Guided Classification (SGC). SGC does not synthesize pseudo-samples; instead, it utilizes an unsupervised generative network architecture to model the structural and statistical anomaly degrees of samples, serving as the core pathological prior. This prior, after robust normalization, is explicitly fused with deep feature representations, thereby precisely guiding the classifier’s decision boundary. Furthermore, to dynamically adapt to varying electrode configurations, we propose a Cross-electrode Spatial Adaptation module, utilizing a spatial mapping mechanism to effectively resolve the hardware heterogeneity of mismatched electrodes in multi-center datasets. Extensive experiments on the Mumtaz2016 and high-density MODMA datasets demonstrate the effectiveness and exceptional generalizability of our method under the challenging zero data augmentation setting and at zero sample synthesis cost.

Electroencephalography \(EEG\), Depression Detection, Anomaly Score, Diffusion Models, Few\-Shot Learning

Conference: The 34th ACM International Conference on Multimedia, 2026. Submission ID: 4768. CCS Concepts: Computing methodologies → Anomaly detection; Applied computing → Health informatics.

## 1\.Introduction

![Refer to caption](https://arxiv.org/html/2606.00180v1/x1.png)

Figure 1. Generative flaws in standard diffusion-based augmentation. Compared to authentic EEG (top/middle), generated pseudo-signals exhibit severe high-frequency artifacts and morphological distortion. This degradation is quantitatively confirmed (bottom) by a stark distribution shift ($\Delta\sigma=0.73$) in the KDE plot, demonstrating that synthetic samples fail to preserve the intrinsic statistical manifolds of neural activity.

Developing robust computational architectures to decode Electroencephalography (EEG) signals is emerging as a critical frontier for the objective diagnosis of Major Depressive Disorder (MDD) (Xu et al., [2026](https://arxiv.org/html/2606.00180#bib.bib1)). This paradigm fundamentally addresses the limitations of traditional psychiatric interviews, which are inherently vulnerable to subjective bias and inter-rater variability. By offering exceptional temporal resolution, EEG provides a direct, non-invasive window into latent cortical dynamics (Leccisotti et al., [2026](https://arxiv.org/html/2606.00180#bib.bib2)). Therefore, the task of EEG-based depression detection has attracted widespread attention in psychiatric informatics and clinical screening.

To process these complex non-stationary signals, contemporary EEG-based MDD detection methodologies generally fall into three representation paradigms. Temporal sequence modeling utilizes 1D Convolutional Neural Networks (1D-CNNs) (Seal et al., [2021](https://arxiv.org/html/2606.00180#bib.bib69)) or Recurrent Neural Networks (RNNs) (Ay et al., [2019](https://arxiv.org/html/2606.00180#bib.bib62)) to directly extract longitudinal morphological features from raw 1D EEG series. Spatial image transformations map EEG signals into 2D spectrograms or topographical representations to leverage the powerful visual feature extraction capabilities of 2D Convolutional Neural Networks (2D-CNNs) (Sharma et al., [2023](https://arxiv.org/html/2606.00180#bib.bib74)). Lastly, spatio-temporal and topological modeling employs Graph Neural Networks (GNNs) (Luo et al., [2023](https://arxiv.org/html/2606.00180#bib.bib70)) or Transformers to jointly capture dynamic temporal dependencies and complex cross-electrode functional connectivity (Chen and Yang, [2025](https://arxiv.org/html/2606.00180#bib.bib75)). By autonomously mining these hierarchical representations, these advanced architectures have significantly elevated the diagnostic performance of automated depression detection.

Despite their impressive performance improvements, these purely end-to-end supervised paradigms exhibit critical vulnerabilities in real-world clinical applications, primarily due to their susceptibility to shortcut learning. EEG signals are inherently plagued by extreme non-stationarity, pronounced inter-subject variability, and exceedingly low signal-to-noise ratios (Lyu, [2026](https://arxiv.org/html/2606.00180#bib.bib5)). Consequently, mapping these chaotic signals directly to discrete labels forces parameter-heavy networks to degrade into uninterpretable black boxes (Hamed et al., [2026](https://arxiv.org/html/2606.00180#bib.bib6)). Without explicit pathological prior guidance, models tend to overfit subject-specific physiological artifacts rather than isolating universal neurophysiological biomarkers. This overfitting to idiosyncratic noise yields highly brittle decision boundaries, frequently causing a catastrophic degradation in cross-center generalization (Lawal et al., [2026](https://arxiv.org/html/2606.00180#bib.bib7); Wong et al., [2026](https://arxiv.org/html/2606.00180#bib.bib8)).

Exacerbating these representational bottlenecks is the severe small-sample dilemma endemic to psychiatric EEG datasets, which typically comprise fewer than 100 subjects (Hallal et al., [2026](https://arxiv.org/html/2606.00180#bib.bib9)). To circumvent this data scarcity, recent literature has increasingly turned to deep generative models, including Graph Neural Networks (GNNs) (Luo et al., [2023](https://arxiv.org/html/2606.00180#bib.bib70)) and Diffusion Models (Zhao et al., [2024](https://arxiv.org/html/2606.00180#bib.bib40); Zhou et al., [2023](https://arxiv.org/html/2606.00180#bib.bib41)), for brute-force data augmentation (Mirowski and Fabijańska, [2026](https://arxiv.org/html/2606.00180#bib.bib10)). However, the direct synthesis of high-dimensional, non-stationary raw EEG signals introduces fatal methodological flaws. As compellingly demonstrated in Fig. [1](https://arxiv.org/html/2606.00180#S1.F1), the directly synthesized pseudo-signals are plagued by dense, high-frequency generative artifacts and severe morphological distortions (Zhang et al., [2026](https://arxiv.org/html/2606.00180#bib.bib11)). Crucially, these hallucinated samples exhibit a stark distribution shift ($\Delta\sigma=0.73$) away from the authentic data manifold. Consequently, this noisy augmentation paradigm actively injects spurious correlations, misleading downstream classifiers into memorizing synthetic noise distributions rather than clinically meaningful biomarkers (Choi et al., [2026](https://arxiv.org/html/2606.00180#bib.bib13)).

To fundamentally circumvent the representational bottlenecks of discriminative black boxes and the catastrophic noise injection endemic to generative augmentation, we propose a novel framework that steps beyond augmentation: Score-Guided Classification (SGC). Distinct from existing paradigms that blindly hallucinate raw EEG, SGC redefines generative utility by modeling a normative electrophysiological baseline. Specifically, we design an unsupervised dual-stream generative pipeline trained exclusively on healthy control (HC) cohorts to anchor the intrinsic manifold of normal brain dynamics. To comprehensively capture this normative baseline, the dual-stream architecture is meticulously constructed: one stream explicitly models the discrete structural morphology of healthy signals, while the complementary stream captures their continuous probabilistic distribution. For any input, SGC quantifies the structural reconstruction error and probabilistic distribution shift against this frozen template, distilling a fine-grained anomaly score (Rutherford et al., [2022](https://arxiv.org/html/2606.00180#bib.bib18)). Serving as a deterministic, noise-free pathological prior, this continuous score is explicitly fused into the downstream deep classifier to dynamically calibrate the decision boundary, effectively isolating genuine MDD biomarkers from complex background variance.

Beyond representational challenges, the real\-world clinical deployment of EEG\-based diagnostics is frequently bottlenecked by hardware\-level data heterogeneity, such as varying electrode configurations across clinical centers, ranging from 19 to 128 channels\. To equip the SGC architecture with hardware\-agnostic generalizability, we introduce a continuous spatial topology mapping strategy\. By geometrically projecting arbitrary high\-density raw signals onto a unified, standardized electrode layout, this mechanism maximally preserves the global biophysical topography of the brain\. Crucially, it eradicates the necessity for architecture\-level structural retraining, thereby unlocking seamless zero\-shot transfer and robust cross\-dataset adaptation across highly heterogeneous clinical environments\.

In summary, our contributions are:

❑ We propose Score-Guided Classification (SGC), a novel augmentation-free framework that models a normative EEG baseline from healthy controls via an unsupervised dual-stream architecture, and exploits the resulting pathological prior to enhance downstream depression detection by reshaping feature manifolds and broadening decision margins.

❑ We introduce a spatial topology mapping strategy that projects mismatched electrodes onto a unified layout. This resolves cross-cohort hardware heterogeneity and unlocks seamless zero-shot transfer without architectural retraining.

❑ Extensive evaluations on Mumtaz2016 and MODMA yield state-of-the-art performance (95.19% accuracy), empirically validating that extracting explicit pathological priors is fundamentally more robust than standard generative augmentation.

![Refer to caption](https://arxiv.org/html/2606.00180v1/x2.png)

Figure 2. Overall architecture of the Score-Guided Classification (SGC) framework. The unsupervised stream (top) exclusively models the normative healthy baseline using VQ-VAE and DDPM. The supervised stream (bottom) utilizes this frozen baseline to extract a pathological anomaly score ($S^{\prime}$) for incoming samples, while in parallel capturing deep spatio-temporal features ($F_{\text{deep}}$). Late-stage feature fusion integrates $S^{\prime}$ to precisely guide the final decision boundary.
## 2\.Related Work

### 2\.1\.EEG\-based Depression Detection

The landscape of automatic Major Depressive Disorder (MDD) detection has experienced a significant paradigm shift from traditional handcrafted spectral feature engineering to data-driven deep learning architectures. Recent literature has extensively explored various deep frameworks, including 1D Convolutional Neural Networks (1D-CNNs), Graph Neural Networks (GNNs), and Transformer-based models, to capture complex spatio-temporal dynamics and cross-channel topological dependencies within EEG signals (Lu et al., [2024](https://arxiv.org/html/2606.00180#bib.bib21); Peng et al., [2025](https://arxiv.org/html/2606.00180#bib.bib22); Singh et al., [2024](https://arxiv.org/html/2606.00180#bib.bib23); Wang et al., [2024c](https://arxiv.org/html/2606.00180#bib.bib24); Hou et al., [2025](https://arxiv.org/html/2606.00180#bib.bib25); Sun et al., [2026](https://arxiv.org/html/2606.00180#bib.bib26)). Despite their impressive representational capabilities, these purely supervised, parameter-heavy paradigms are notoriously susceptible to severe overfitting, particularly when deployed in the small-sample regimes endemic to clinical psychiatric datasets (Zhulduzbayev et al., [2026](https://arxiv.org/html/2606.00180#bib.bib27); Liu et al., [2022](https://arxiv.org/html/2606.00180#bib.bib28); Vaniya et al., [2026](https://arxiv.org/html/2606.00180#bib.bib29); Olbrich et al., [2026](https://arxiv.org/html/2606.00180#bib.bib30)). Consequently, they frequently exhibit substantial performance degradation in subject-independent generalization scenarios (Shen et al., [2025](https://arxiv.org/html/2606.00180#bib.bib31); Wang et al., [2024a](https://arxiv.org/html/2606.00180#bib.bib32); Li et al., [2024](https://arxiv.org/html/2606.00180#bib.bib33); Kim et al., [2025](https://arxiv.org/html/2606.00180#bib.bib34)). This inherent limitation underscores the critical necessity for novel representation learning strategies that transcend the boundaries of simplistic end-to-end supervised classification.

### 2\.2\.Data Augmentation for EEG Analysis

To mitigate the pervasive data scarcity in EEG-based assessments, extensive research has been dedicated to data augmentation techniques, which broadly bifurcate into heuristic and generative approaches. Traditional heuristic methods, such as random cropping, temporal masking, or signal flipping, attempt to expand the training manifold but frequently violate the intricate biophysical integrity of EEG recordings, inadvertently inducing semantic distortions and phase-coupling disruptions (Rommel et al., [2022](https://arxiv.org/html/2606.00180#bib.bib36); Lashgari et al., [2020](https://arxiv.org/html/2606.00180#bib.bib35); Liao et al., [2025](https://arxiv.org/html/2606.00180#bib.bib37); Lee et al., [2026](https://arxiv.org/html/2606.00180#bib.bib38)). To circumvent these handcrafted limitations, recent advancements have increasingly leveraged deep generative architectures, notably Generative Adversarial Networks (GANs) (Hartmann et al., [2018](https://arxiv.org/html/2606.00180#bib.bib39)) and Denoising Diffusion Probabilistic Models (DDPMs) (Zhao et al., [2024](https://arxiv.org/html/2606.00180#bib.bib40); Zhou et al., [2023](https://arxiv.org/html/2606.00180#bib.bib41)), to synthesize high-fidelity pseudo-samples. However, deploying these generative frameworks in ultra-low-data scenarios introduces severe methodological bottlenecks (Wang et al., [2026](https://arxiv.org/html/2606.00180#bib.bib44)). As empirically demonstrated in our preliminary analysis (see Fig. [1](https://arxiv.org/html/2606.00180#S1.F1)), diffusion-based synthesis on limited EEG cohorts incurs pronounced high-frequency noise amplification and a stark distribution shift ($\Delta\sigma$) away from the authentic data manifold (He et al., [2021](https://arxiv.org/html/2606.00180#bib.bib42); Lin et al., [2024](https://arxiv.org/html/2606.00180#bib.bib43)). Rather than sharpening decision boundaries, such noisy data expansion irreparably blurs them. In contrast to these augmentation-centric paradigms, our SGC framework fundamentally shifts the focus toward unsupervised feature excavation, completely bypassing the risks associated with hallucinated raw signal generation.

### 2\.3\.Anomaly Detection as Pathological Priors

Conventional Anomaly Detection (AD) frameworks often reduce pathological deviations to rigid thresholds for outlier rejection (Fernando et al., [2021](https://arxiv.org/html/2606.00180#bib.bib45); An and Cho, [2015](https://arxiv.org/html/2606.00180#bib.bib46); Baur et al., [2018](https://arxiv.org/html/2606.00180#bib.bib47)), underutilizing generative representations. Their reliance on holistic reconstruction errors or shallow statistical distances is easily confounded by EEG’s extreme non-stationarity and inter-subject variability. Departing from this rigid paradigm, SGC redefines anomaly scores as continuous pathological priors. By synergizing structural quantization (Van Den Oord et al., [2017](https://arxiv.org/html/2606.00180#bib.bib14)) and probabilistic diffusion (Ho et al., [2020](https://arxiv.org/html/2606.00180#bib.bib16)), we construct a unified prior space isolating anomalies across complementary dimensions. Specifically, quantization identifies morphological distortions in temporal microstates, while diffusion captures subtle statistical drifts in background dynamics. This calibration dynamically reshapes the downstream decision boundary, transforming rudimentary outlier metrics into highly discriminative features for robust MDD diagnostics.

## 3\.Methodology

### 3\.1\.Problem Formulation and Dual\-Stream Architecture

We formulate EEG-based depression detection as a prior-guided time-series classification task. Given a dataset $\mathcal{D}=\{(\mathbf{x}_{i},y_{i})\}_{i=1}^{N}$, where $\mathbf{x}_{i}\in\mathbb{R}^{C\times T}$ denotes an EEG segment comprising $C$ channels and $T$ time steps, and $y_{i}\in\{0,1\}$ represents the ground-truth label (0 for HC, 1 for MDD), our Score-Guided Classification (SGC) framework aims to learn an optimized mapping function $\mathcal{F}:\mathbf{x}_{i}\rightarrow\hat{y}_{i}$. Departing from conventional end-to-end paradigms, SGC dynamically calibrates the classifier’s decision boundary by explicitly injecting a robust pathological anomaly score $S^{\prime}\in\mathbb{R}$, which is derived independently via unsupervised generative modeling.

As illustrated in Fig\.[2](https://arxiv.org/html/2606.00180#S1.F2), the SGC framework is operationalized through a synergistic dual\-stream architecture:

❑ Unsupervised Stream (Healthy Manifold & Distribution Modeling): Trained exclusively on normative HC data, this offline generative branch establishes a standard structural manifold and a baseline noise distribution to completely anchor the intrinsic patterns of healthy neural dynamics.

❑ Supervised Stream (Anomaly Scoring & Feature Fusion): Operating on incoming samples, this discriminative pipeline first standardizes hardware heterogeneity via cross-electrode spatial adaptation. It then leverages the frozen unsupervised models to quantify the pathological deviation of the input (yielding the anomaly score $S^{\prime}$), and explicitly fuses this score with deeply extracted spatiotemporal features to guide the final classification.

The fundamental insight of this architecture lies in its late\-stage feature fusion\. By leveraging unsupervised generative models to anchor the intrinsic normative patterns, SGC constructs a deterministic, noise\-free pathological reference\. This continuous prior explicitly steers the final Multi\-Layer Perceptron \(MLP\) mapping, shielding the network from spurious correlations and ensuring highly robust MDD predictions\. The subsequent subsections detail the spatial adaptation strategy for hardware heterogeneity, the specific designs of these two streams, and the overall training optimization\.

### 3\.2\.Cross\-electrode Spatial Adaptation

Cross-cohort EEG analysis is frequently hindered by heterogeneous electrode configurations, such as high-density 128-electrode arrays versus standard 19-electrode clinical systems. To ensure the broad generalizability of SGC without necessitating architectural modifications or costly structural retraining, we formulate a spatial topology mapping strategy.

Specifically, we project the arbitrary high-density signals $\mathbf{X}_{\text{orig}}\in\mathbb{R}^{C_{\text{orig}}\times T}$ onto a standardized 19-electrode spatial template (following the International 10-20 system (Klem, [1999](https://arxiv.org/html/2606.00180#bib.bib48))) via Spherical Spline Interpolation (Perrin et al., [1989](https://arxiv.org/html/2606.00180#bib.bib49)). By mathematically mapping each electrode to a 3D coordinate $\mathbf{p}$ on a normalized spherical manifold representing the human scalp, the interpolated voltage at any target electrode $\mathbf{p}_{\text{target}}$ is estimated using the original source electrodes $\mathbf{p}_{j}$:

$$f(\mathbf{p}_{\text{target}})=\sum_{j=1}^{C_{\text{orig}}}c_{j}\,g(\|\mathbf{p}_{\text{target}}-\mathbf{p}_{j}\|) \tag{1}$$

where $g(\cdot)$ denotes the spherical spline Green’s function, and $c_{j}$ represents the spatial interpolation coefficients. This geometric projection strictly enforces a consistent spatial dimensionality ($C=19$) for the downstream dual-stream network, thereby unlocking seamless cross-dataset zero-shot inference capabilities.
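As a minimal sketch of Eq. 1, the NumPy code below follows the commonly used spherical spline formulation of Perrin et al. Several details are assumptions made for illustration and are not stated in the text: the function names, the spline order `m=4`, the truncation of the Legendre series at 50 terms, the constant offset `c0` with its zero-sum constraint on the coefficients, and the use of the cosine between unit vectors (equivalent to the chord distance on a unit sphere) as the argument of $g$. The electrode coordinates are hypothetical placeholders rather than a real montage.

```python
import numpy as np
from numpy.polynomial.legendre import legval


def green(cos_angle, m=4, n_terms=50):
    """Spherical spline kernel g as a truncated Legendre series (assumed m, n_terms)."""
    n = np.arange(1, n_terms + 1, dtype=float)
    # Coefficient of P_n is (2n + 1) / (n (n + 1))^m; the n = 0 term is excluded.
    coeffs = np.concatenate(([0.0], (2 * n + 1) / (n * (n + 1)) ** m))
    return legval(cos_angle, coeffs) / (4 * np.pi)


def spherical_spline_interpolate(x_orig, pos_src, pos_tgt, m=4, n_terms=50):
    """Map signals of shape (C_orig, T) at pos_src onto the electrodes at pos_tgt."""
    pos_src = pos_src / np.linalg.norm(pos_src, axis=1, keepdims=True)
    pos_tgt = pos_tgt / np.linalg.norm(pos_tgt, axis=1, keepdims=True)
    n_src = pos_src.shape[0]

    g_ss = green(np.clip(pos_src @ pos_src.T, -1.0, 1.0), m, n_terms)
    g_ts = green(np.clip(pos_tgt @ pos_src.T, -1.0, 1.0), m, n_terms)

    # Solve [[G, 1], [1^T, 0]] [c; c0] = [x; 0] for the coefficients c_j and offset c0.
    system = np.zeros((n_src + 1, n_src + 1))
    system[:n_src, :n_src] = g_ss
    system[:n_src, n_src] = 1.0
    system[n_src, :n_src] = 1.0
    rhs = np.vstack([x_orig, np.zeros((1, x_orig.shape[1]))])
    solution = np.linalg.solve(system, rhs)
    coeffs, offset = solution[:n_src], solution[n_src]

    return g_ts @ coeffs + offset  # shape (C_target, T)


# Hypothetical layout: 8 source electrodes at cube corners, 3 target electrodes.
corners = np.array(
    [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)], dtype=float
)
targets = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
signals = np.random.default_rng(0).standard_normal((8, 5))
projected = spherical_spline_interpolate(signals, corners, targets)
```

Because the coefficients are solved per time sample from the source voltages, the same routine can serve any source montage, which is the property the fixed 19-electrode template relies on.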

### 3\.3\.Unsupervised Healthy Manifold and Distribution Modeling

We design an unsupervised healthy manifold and distribution modeling branch to extract a pathological prior from healthy samples. To this end, it excavates normative EEG characteristics via a one-class generative modeling paradigm. Modeling the authentic healthy EEG manifold poses significant challenges due to the intrinsic duality of the signals, which comprise discrete structural microstates, such as quasi-stable physiological waveforms, and continuous stochastic background dynamics. Monolithic generative models, including vanilla VAEs (Kingma and Welling, [2013](https://arxiv.org/html/2606.00180#bib.bib72)) and GANs (Goodfellow et al., [2014](https://arxiv.org/html/2606.00180#bib.bib73)), typically fail to capture both facets simultaneously, frequently suffering from blurry reconstructions or mode collapse. To overcome this representational bottleneck, we establish a robust Healthy Manifold (Marquand et al., [2016](https://arxiv.org/html/2606.00180#bib.bib50)) by training exclusively on Healthy Control (HC) data through a meticulously decoupled dual-stream pipeline.

First, we employ a Vector-Quantized Variational Autoencoder (VQ-VAE) (Van Den Oord et al., [2017](https://arxiv.org/html/2606.00180#bib.bib14); Yang et al., [2025](https://arxiv.org/html/2606.00180#bib.bib15)) as a Manifold Learner. Its discrete codebook mechanism is well suited to quantize and memorize the structural integrity of normative EEG patterns. Complementarily, a Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., [2020](https://arxiv.org/html/2606.00180#bib.bib16); Pinaya et al., [2022](https://arxiv.org/html/2606.00180#bib.bib17)) serves as a Noise Predictor. Its continuous Markovian diffusion process is well suited to modeling the complex, high-dimensional probability distribution of the stochastic background. By establishing this dual baseline, the unsupervised stream encapsulates the normative electrophysiological state, laying the foundation for quantifying pathological deviations in the subsequent supervised classification phase.

#### 3\.3\.1\.Structure Reconstruction

To effectively encode the complex, non-stationary temporal dynamics of EEG signals, we employ a 1D-CNN-based VQ-VAE architecture. The encoder $E(\cdot)$ projects the input signal $\mathbf{x}\in\mathbb{R}^{C\times T}$ into a continuous latent space, yielding the representation $\mathbf{z}_{e}$. The quantizer $Q(\cdot)$ then discretizes $\mathbf{z}_{e}$ into $\mathbf{z}_{q}$ via nearest-neighbor mapping within a learnable codebook $\mathcal{C}=\{\mathbf{e}_{k}\}_{k=1}^{K}$ (Van Den Oord et al., [2017](https://arxiv.org/html/2606.00180#bib.bib14)). Crucially, this discrete codebook ($K=512$, with a dimensionality of $64$) is optimized exclusively on HC data. This constraint forces the network to map input signals into a finite set of healthy basis atoms, effectively constructing a robust structural “dictionary” of normative EEG patterns.

The decoder $D(\cdot)$ subsequently reconstructs the signal from the quantized vector $\mathbf{z}_{q}$. To optimize this healthy manifold, the network is trained using the VQ-VAE objective $\mathcal{L}_{\text{VQ}}$, which combines the Mean Squared Error (MSE) for reconstruction fidelity with a codebook commitment loss:

$$\mathcal{L}_{\text{VQ}}=\|\mathbf{x}-D(\mathbf{z}_{q})\|_{2}^{2}+\beta\|\text{sg}[\mathbf{z}_{e}]-\mathbf{z}_{q}\|_{2}^{2} \tag{2}$$

where $\text{sg}[\cdot]$ denotes the stop-gradient operator, and the commitment cost $\beta$ is empirically set to $0.25$ (Van Den Oord et al., [2017](https://arxiv.org/html/2606.00180#bib.bib14)). By strictly bounding the optimization landscape to normative HC data, $\mathcal{L}_{\text{VQ}}$ is uniquely positioned to act as a rigorous metric: any subsequent out-of-distribution sample will inevitably incur a high reconstruction error when forced through this healthy dictionary.
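The nearest-neighbor quantization and the value of Eq. 2 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the encoder output is simulated by perturbing codebook atoms, and the decoder is omitted, so the reconstruction is passed in directly. Since the stop-gradient operator changes gradients but not values, it does not appear when the objective is only evaluated.

```python
import numpy as np


def quantize(z_e, codebook):
    """Nearest-neighbor lookup: replace each latent vector by its closest codebook atom."""
    # Squared Euclidean distances between every latent (N, d) and every atom (K, d).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


def vq_objective(x, x_rec, z_e, z_q, beta=0.25):
    """Value of Eq. 2; the stop-gradient affects gradients only, not the value."""
    reconstruction = np.sum((x - x_rec) ** 2)
    codebook_term = np.sum((z_e - z_q) ** 2)
    return reconstruction + beta * codebook_term


rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 64))  # K = 512 atoms of dimension 64
# Hypothetical encoder output: three atoms plus small perturbations.
z_e = codebook[[3, 41, 7]] + 0.01 * rng.standard_normal((3, 64))
z_q, idx = quantize(z_e, codebook)
```

At inference time the same two quantities, evaluated with frozen weights, give the structural error that later enters the anomaly score.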

#### 3\.3\.2\.Distributional Probability Modeling

To complement the VQ-VAE’s discrete representations, we independently train a Denoising Diffusion Probabilistic Model (DDPM) (Ho et al., [2020](https://arxiv.org/html/2606.00180#bib.bib16)) to map the continuous probabilistic landscape of healthy EEG signals. The forward phase gradually corrupts a normative signal $\mathbf{x}_{0}$ by injecting Gaussian noise across $T=500$ steps. At timestep $t$, the perturbed state is:

$$q(\mathbf{x}_{t}\mid\mathbf{x}_{0})=\mathcal{N}\left(\mathbf{x}_{t};\sqrt{\bar{\alpha}_{t}}\mathbf{x}_{0},(1-\bar{\alpha}_{t})\mathbf{I}\right), \tag{3}$$

where $\{\bar{\alpha}_{t}\}_{t=1}^{T}$ dictates the variance schedule.
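Eq. 3 allows $\mathbf{x}_{t}$ to be drawn in one step as $\sqrt{\bar{\alpha}_{t}}\,\mathbf{x}_{0}+\sqrt{1-\bar{\alpha}_{t}}\,\epsilon$ with $\epsilon\sim\mathcal{N}(0,\mathbf{I})$. The sketch below illustrates this sampling. The linear beta schedule from $10^{-4}$ to $0.02$ is an assumption borrowed from common DDPM practice, since the text does not specify the schedule, and the function names and the segment length are hypothetical.

```python
import numpy as np

T_STEPS = 500  # number of diffusion steps stated in the text


def make_alpha_bar(n_steps=T_STEPS, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) as in Eq. 3; t is a 0-based step index."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((19, 256))  # hypothetical healthy segment with C = 19
x_t, eps = q_sample(x0, t=250, alpha_bar=alpha_bar, rng=rng)
```

As $t$ grows, $\bar{\alpha}_{t}$ shrinks, so the sample moves from the clean signal toward pure Gaussian noise.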

Conversely, the reverse generative process trains a parameterized network $\epsilon_{\theta}$ to estimate the injected noise $\epsilon$. To exclusively encapsulate the normative distribution, the network is optimized on HC data using the diffusion objective $\mathcal{L}_{\text{Diff}}$ (Ho et al., [2020](https://arxiv.org/html/2606.00180#bib.bib16)):

$$\mathcal{L}_{\text{Diff}}=\mathbb{E}_{t,\mathbf{x}_{0},\epsilon\sim\mathcal{N}(0,\mathbf{I})}\left[\|\epsilon-\epsilon_{\theta}(\mathbf{x}_{t},t)\|_{2}^{2}\right]. \tag{4}$$

By minimizing this loss, the DDPM learns the inherent stochastic baseline of healthy neurodynamics. Consequently, $\mathcal{L}_{\text{Diff}}$ establishes a rigorous foundation to measure implicit statistical shifts in subsequent out-of-distribution samples, complementing the VQ-VAE.
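For a single segment, the expectation in Eq. 4 can be approximated by Monte Carlo sampling over timesteps and noise draws. The sketch below is illustrative only: the number of draws, the linear beta schedule, and both stand-in predictors (`oracle`, which inverts Eq. 3 using the known clean signal, and `zero_model`) are assumptions that replace the trained network $\epsilon_{\theta}$.

```python
import numpy as np


def diffusion_loss(x0, eps_model, alpha_bar, rng, n_draws=8):
    """Monte Carlo estimate of Eq. 4 for one segment x0."""
    total = 0.0
    for _ in range(n_draws):
        t = int(rng.integers(len(alpha_bar)))  # uniform random timestep
        eps = rng.standard_normal(x0.shape)    # injected Gaussian noise
        x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        total += np.sum((eps - eps_model(x_t, t)) ** 2)
    return total / n_draws


alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 500))  # assumed linear schedule
x0 = np.random.default_rng(1).standard_normal((19, 64))


def oracle(x_t, t):
    """Stand-in predictor that inverts Eq. 3 using the known x0 (illustration only)."""
    return (x_t - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])


def zero_model(x_t, t):
    """Stand-in predictor that always outputs zero noise."""
    return np.zeros_like(x_t)


loss_oracle = diffusion_loss(x0, oracle, alpha_bar, np.random.default_rng(2))
loss_zero = diffusion_loss(x0, zero_model, alpha_bar, np.random.default_rng(2))
```

A predictor that recovers the noise exactly drives the estimate to zero, while an uninformative one leaves roughly one unit of error per signal element; a network trained on HC data is expected to land closer to the former on healthy inputs than on out-of-distribution ones.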

### 3\.4\.Supervised Stream: Anomaly Scoring and Feature Fusion

To fully exploit the complex spatiotemporal dynamics of EEG signals and explicitly leverage the learned normative baseline, we design a comprehensive supervised classification stream\. As depicted in the lower branch of our overall architecture, this phase shifts from offline generative modeling to online inference and classification\. It sequentially performs three synergistic operations: anomaly score extraction via the frozen generative models, deep feature representation using a hybrid local\-global backbone, and score\-guided feature fusion\.

Crucially, rather than relying solely on raw morphological representations ($F_{\text{deep}}$), we introduce a feature-level fusion mechanism that explicitly injects the normalized anomaly score ($S^{\prime}$) as a deterministic soft prior. By concatenating $S^{\prime}$ with $F_{\text{deep}}$, we construct an augmented, prior-guided representation ($F_{\text{final}}$) to calibrate the final Multi-Layer Perceptron (MLP) classifier. This dual-awareness mechanism explicitly forces the network to focus on discriminative pathological patterns rather than idiosyncratic background noise.
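The concatenation-based fusion described above can be sketched as follows. The text specifies only that $S^{\prime}$ is concatenated with $F_{\text{deep}}$ and passed to an MLP; the feature width, hidden size, two-layer structure, sigmoid output, and random weights below are all hypothetical choices for illustration.

```python
import numpy as np


def fuse_and_classify(f_deep, s_prime, w1, b1, w2, b2):
    """Concatenate S' onto F_deep, then apply a two-layer MLP with a sigmoid output."""
    f_final = np.concatenate([f_deep, s_prime[:, None]], axis=1)  # (B, d + 1)
    hidden = np.maximum(f_final @ w1 + b1, 0.0)                    # ReLU
    logits = hidden @ w2 + b2
    return f_final, 1.0 / (1.0 + np.exp(-logits[:, 0]))           # P(MDD)


rng = np.random.default_rng(0)
batch, d, h = 4, 128, 32                      # hypothetical sizes
f_deep = rng.standard_normal((batch, d))      # stand-in for backbone features
s_prime = rng.uniform(0.0, 1.0, size=batch)   # normalized anomaly scores in [0, 1]
w1, b1 = 0.1 * rng.standard_normal((d + 1, h)), np.zeros(h)
w2, b2 = 0.1 * rng.standard_normal((h, 1)), np.zeros(1)
f_final, prob = fuse_and_classify(f_deep, s_prime, w1, b1, w2, b2)
```

Because the score occupies its own input dimension, the classifier can learn how strongly to weight it instead of being bound to a fixed threshold.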

#### 3\.4\.1\.Pathological Prior Incorporation \(Anomaly Scoring\)

Specifically, SGC leverages the normative baselines established in the unsupervised stream (Section [3.3](https://arxiv.org/html/2606.00180#S3.SS3)). For an incoming EEG segment $\mathbf{x}$, we employ the frozen pre-trained VQ-VAE and DDPM architectures to act as fixed reference templates. By projecting $\mathbf{x}$ through these frozen models, we compute the inference-stage structural error $\mathcal{L}^{\prime}_{\text{VQ}}$ (via Eq. [2](https://arxiv.org/html/2606.00180#S3.E2)) and the statistical deviation $\mathcal{L}^{\prime}_{\text{Diff}}$ (via Eq. [4](https://arxiv.org/html/2606.00180#S3.E4)), respectively. These metrics quantify the input’s pathological deviation from the learned healthy manifold and noise distribution. To synthesize these explicit structural and implicit statistical deviations, we compute a composite anomaly score $S$:

$$S=\alpha\mathcal{L}^{\prime}_{\text{VQ}}+(1-\alpha)\mathcal{L}^{\prime}_{\text{Diff}}, \tag{5}$$

where $\alpha=0.7$ is determined via a held-out pilot split. This higher weight favors the deterministic VQ-VAE error over the stochastic DDPM loss, which stabilizes the final anomaly score.

Because raw scores are highly susceptible to inter-subject variability and extreme outliers, we apply robust normalization to project $S$ into a bounded final prior $S^{\prime}$ (Truong and Delorme, [2025](https://arxiv.org/html/2606.00180#bib.bib51); El Kerdawy et al., [2020](https://arxiv.org/html/2606.00180#bib.bib52)):

$$S^{\prime}=\operatorname{clip}\left(\frac{S-P_{5}}{P_{95}-P_{5}},0,1\right), \tag{6}$$

where $P_{5}$ and $P_{95}$ are the 5th and 95th percentiles of the training-set scores, so that the unpredictable physiological noise in the extreme 5% tails is clipped to the boundaries of $[0,1]$.
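The scoring and normalization steps in Eqs. (5) and (6) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names and the linear-interpolation percentile rule are assumptions, and the two loss values are taken as already computed by the frozen VQ-VAE and DDPM.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list.
    The interpolation rule is an assumption; the paper does not specify one."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def composite_score(l_vq, l_diff, alpha=0.7):
    """Eq. (5): weighted sum of the structural and statistical deviations."""
    return alpha * l_vq + (1.0 - alpha) * l_diff


def fit_bounds(train_scores):
    """Estimate P5 and P95 on training scores only, to avoid test leakage."""
    return percentile(train_scores, 5), percentile(train_scores, 95)


def normalize(score, p5, p95):
    """Eq. (6): rescale on [P5, P95], then clip to [0, 1]."""
    scaled = (score - p5) / (p95 - p5)
    return min(1.0, max(0.0, scaled))
```

Fitting the bounds once on the training fold and reusing them at inference keeps the prior comparable across segments; scores beyond either percentile simply saturate at 0 or 1.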

![Refer to caption](https://arxiv.org/html/2606.00180v1/x3.png)

Figure 3. Probability density distribution of the normalized anomaly scores ($S^{\prime}$). The inevitable boundary overlap motivates the integration of this score as a soft prior rather than a brittle hard threshold.

As shown in Fig. [3](https://arxiv.org/html/2606.00180#S3.F3), while the MDD cohort exhibits substantially higher scores, the intrinsic non-stationarity of EEG induces inevitable boundary overlaps. Consequently, rather than enforcing a rigid, error-prone diagnostic threshold, we embed $S^{\prime}$ into the network as a continuous soft prior feature (Choudhury et al., [2025](https://arxiv.org/html/2606.00180#bib.bib53); Zhang et al., [2023](https://arxiv.org/html/2606.00180#bib.bib54)). This mapping acts as an informative anchor, explicitly signaling the magnitude of pathological deviation to the downstream classifier.

#### 3\.4\.2\.Hybrid Feature Extractor

The backbone extracts complex EEG characteristics through a hierarchical, two-stage architecture. Initially, a 1D-CNN frontend is deployed to capture local temporal morphological features, such as high-frequency oscillations and micro-state peaks. The raw input segment is iteratively updated via 1D convolution, Batch Normalization (BN) to mitigate internal covariate shifts (Ioffe and Szegedy, [2015](https://arxiv.org/html/2606.00180#bib.bib55)), ReLU activation, and Max Pooling to progressively broaden the temporal receptive field and filter redundant noise.

Subsequently, the downsampled CNN feature maps are sequence-projected and fed into a Transformer Encoder. To comprehensively model the long-range global dependencies across the entire temporal dimension, we employ the standard Multi-Head Self-Attention (MHSA) mechanism (Vaswani et al., [2017](https://arxiv.org/html/2606.00180#bib.bib56)). Finally, a Global Average Pooling (GAP) layer aggregates the output token sequence into a compact, discriminative deep spatiotemporal feature vector, denoted as $\mathbf{F}_{\text{deep}}$.

#### 3\.4\.3\.Score\-Guided Feature Fusion

Unlike conventional classifiers relying solely on raw signals, SGC incorporates the normalized anomaly score $S^{\prime}$ as an explicit pathological prior. Before the final Multi-Layer Perceptron (MLP) head, $S^{\prime}$ is directly concatenated ($\oplus$) with the deep spatiotemporal feature vector $F_{\text{deep}}$:

$$F_{\text{final}}=F_{\text{deep}}\oplus S^{\prime} \tag{7}$$

This deterministic fusion endows the classifier with dual-awareness, simultaneously perceiving morphological representations and statistical deviations. By treating the scalar prior as an influential trigger alongside high-dimensional features, it sharply refines the decision boundary, effectively classifying challenging samples where ambiguous morphology is clarified by pathological scores.
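Concretely, the concatenation in Eq. (7) just appends one scalar to the pooled feature vector, so the MLP head receives $d+1$ inputs instead of $d$. The sketch below illustrates this with plain Python lists; the function name and the range check are illustrative assumptions rather than part of the described method.

```python
def fuse(f_deep, s_prime):
    """Eq. (7): append the scalar prior S' to the deep feature vector."""
    if not 0.0 <= s_prime <= 1.0:
        # Eq. (6) clips S' to [0, 1]; anything else signals a pipeline bug.
        raise ValueError("s_prime is expected to lie in [0, 1]")
    return list(f_deep) + [s_prime]


# Hypothetical 4-dimensional deep feature and a normalized score of 0.82.
f_final = fuse([0.1, -0.4, 1.3, 0.0], 0.82)
```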

Table 1. Performance comparison with State-of-the-Art (SOTA) methods on Mumtaz2016 (10-fold cross-validation). Note: All baselines were re-implemented and evaluated under our unified zero-shot pipeline, resulting in slight numerical deviations from original reports.

### 3\.5\.Training Strategy Optimization

To counteract the severe overfitting and convergence instability endemic to small\-sample EEG regimes, we devise a robust optimization protocol\.

#### 3\.5\.1\.Optimizer and Regularization

We institute a dual-pronged regularization scheme. First, we discard standard step-wise decay in favor of a Cosine Annealing learning rate scheduler (Loshchilov and Hutter, [2016](https://arxiv.org/html/2606.00180#bib.bib57)). By periodically injecting energy into the optimization trajectory, that is, by raising the learning rate again after each annealing cycle as in the cited warm-restart schedule, this approach helps propel the model out of sharp, spurious local minima and guides the weights toward flatter regions of the loss landscape, which are generally associated with better generalization.
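For reference, the standard cosine annealing rule from the cited SGDR paper can be written as a small function. The sketch below is an assumption-laden illustration: the period, the minimum learning rate, and the use of restarts are not stated in this section, and the default peak of $5\times 10^{-4}$ is borrowed from the implementation details.

```python
import math


def cosine_annealing_lr(step, period, lr_max=5e-4, lr_min=0.0, restart=True):
    """Learning rate at a given step under cosine annealing.

    With restart=True the schedule jumps back to lr_max every `period`
    steps (warm restarts); otherwise it stays at lr_min after one cycle.
    `period` and `lr_min` are hypothetical values chosen by the caller.
    """
    t_cur = step % period if restart else min(step, period)
    cosine = math.cos(math.pi * t_cur / period)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)
```

Within one cycle the rate decays smoothly from `lr_max` toward `lr_min`; a restart returns it to `lr_max`, which is the "energy injection" described above.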

Concurrently, to penalize the classifier's tendency toward over-confidence on limited training manifolds, we introduce Label Smoothing (Szegedy et al., [2016](https://arxiv.org/html/2606.00180#bib.bib58); Müller et al., [2019](https://arxiv.org/html/2606.00180#bib.bib59)). This technique relaxes the rigid one-hot ground-truth labels $y_{i}$ into continuous soft targets $\tilde{y}_{i}$:

$$\tilde{y}_{i}=(1-\delta)\cdot y_{i}+\frac{\delta}{K}, \tag{8}$$

where the smoothing margin $\delta$ is set to $0.1$ for our binary classification setting ($K=2$).
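As a quick worked example of Eq. (8), with $\delta=0.1$ and $K=2$ a positive label becomes $0.9\cdot 1+0.05=0.95$ and a negative label becomes $0.05$. The helper below (the name is ours, for illustration only) reproduces that arithmetic.

```python
def smooth_label(y, delta=0.1, num_classes=2):
    """Eq. (8): soften a hard label y in {0, 1} toward the uniform prior."""
    return (1.0 - delta) * y + delta / num_classes


soft_pos = smooth_label(1)  # approximately 0.95
soft_neg = smooth_label(0)  # approximately 0.05
```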

#### 3\.5\.2\.Supervised Classification Loss

The end-to-end supervised stream is optimized via a smoothed binary cross-entropy objective. For a mini-batch of $N$ predictions $\hat{y}_{i}$ and their corresponding softened targets $\tilde{y}_{i}$ (Eq. [8](https://arxiv.org/html/2606.00180#S3.E8)), the classification loss $\mathcal{L}_{\text{cls}}$ is formalized as:

$$\mathcal{L}_{\text{cls}}=-\frac{1}{N}\sum_{i=1}^{N}\left[\tilde{y}_{i}\log(\hat{y}_{i})+(1-\tilde{y}_{i})\log(1-\hat{y}_{i})\right]. \tag{9}$$

Combining this objective with the aforementioned regularization strategies constrains the effective hypothesis space, promoting stable convergence while reducing the risk of memorizing dataset-specific noise.
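The loss in Eq. (9) can be sketched directly from its definition. The version below is a plain-Python illustration under stated assumptions: predictions are probabilities in $(0,1)$, labels are hard 0/1 values smoothed on the fly with Eq. (8), and the clamping constant `eps` is our addition for numerical safety.

```python
import math


def smoothed_bce(probs, labels, delta=0.1, num_classes=2, eps=1e-7):
    """Eq. (9): mean binary cross-entropy against label-smoothed targets."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(1.0 - eps, max(eps, p))                  # avoid log(0)
        y_soft = (1.0 - delta) * y + delta / num_classes  # Eq. (8)
        total += y_soft * math.log(p) + (1.0 - y_soft) * math.log(1.0 - p)
    return -total / len(probs)
```

Because the target for a positive sample is 0.95 rather than 1, the per-sample loss is minimized at a predicted probability of 0.95, so pushing the output toward 1.0 is penalized rather than rewarded.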

## 4\.Experiments and Results

### 4\.1\.Experimental Setup

#### 4\.1\.1\.Datasets and Preprocessing

Mumtaz2016 (Mumtaz et al., [2017](https://arxiv.org/html/2606.00180#bib.bib60)): This benchmark comprises resting-state, 19-electrode EEG recordings from 34 MDD patients and 30 HCs. Preprocessing includes a 0.5-50.0 Hz band-pass filter, 256 Hz resampling, and 5-second sliding window segmentation (50% overlap), yielding 13,953 segments. Under our subject-independent 10-fold cross-validation scheme, each fold partitions the data into $\sim$12,557 training and $\sim$1,396 testing segments. To ensure zero data leakage, the unsupervised stream is independently retrained for each fold using only the healthy segments within the current training set ($\sim$6,038 HC segments), while the supervised stream utilizes the entire $\sim$12,557-segment training set.
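To make the segmentation arithmetic concrete: at 256 Hz a 5-second window spans 1,280 samples, and 50% overlap gives a stride of 640 samples. The sketch below enumerates window start indices; the function name, the rounding, and the decision to drop a trailing partial window are assumptions, since the paper does not describe these details.

```python
def window_starts(n_samples, fs=256, window_s=5.0, overlap=0.5):
    """Start indices of sliding windows over a recording of n_samples."""
    win = int(round(window_s * fs))            # 1280 samples at 256 Hz
    stride = int(round(win * (1.0 - overlap)))  # 640 samples at 50% overlap
    if n_samples < win:
        return []
    # Any trailing partial window is dropped (an assumption).
    return list(range(0, n_samples - win + 1, stride))


# A hypothetical 60-second recording at 256 Hz yields 23 windows.
starts = window_starts(60 * 256)
```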

MODMA (Cai et al., [2022](https://arxiv.org/html/2606.00180#bib.bib61)): Serving as a rigorous cross-domain testbed, this dataset contains 128-electrode high-density EEG from 24 MDD and 29 HCs. Following artifact removal, we apply our Spatial Topology Mapping to geometrically project the 128 electrodes onto the standard 19-electrode layout. Identical segmentation generates 6,355 samples, which are strictly reserved as an unseen zero-shot test set for final evaluation.

#### 4\.1\.2\.Implementation Details

Models are implemented in PyTorch on an NVIDIA RTX 6000 GPU, optimized via Adam and a Cosine Annealing scheduler. The unsupervised stream configures the VQ-VAE codebook ($K=512$, dimension $64$) and DDPM ($T=500$, linear noise $\beta\in[10^{-4},0.02]$). The supervised stream uses an initial learning rate of $5\times 10^{-4}$, weight decay of $1\times 10^{-4}$, and $0.5$ dropout. Label smoothing ($\delta=0.1$) and early stopping (patience 15) are employed to mitigate overfitting.
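The DDPM noise configuration above ($T=500$, linear $\beta$ from $10^{-4}$ to $0.02$) corresponds to the usual linear schedule, which can be sketched as follows. Whether both endpoints are included and how the cumulative products are stored are assumptions here; the function names are illustrative.

```python
def linear_beta_schedule(num_steps=500, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances, including both endpoints (assumed)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), i.e. the fraction of signal kept."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
```

The cumulative product shrinks monotonically with the timestep, so later steps retain very little of the original signal.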

Segment-level metrics that evaluate individual EEG windows include Accuracy (Acc), Precision, F1-Score, AUC, and Recall (Rec). Subject-level Accuracy (Sub-Acc) determines the final patient-level diagnosis via majority voting across an individual's segments.
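The subject-level aggregation can be illustrated with a short majority-voting routine. This is a sketch under assumptions: segment predictions are hard 0/1 labels (1 for MDD), ties are resolved toward the healthy class, and the function and argument names are hypothetical.

```python
from collections import defaultdict


def subject_level_accuracy(subject_ids, segment_preds, subject_labels):
    """Majority-vote each subject's segments, then score against labels."""
    votes = defaultdict(list)
    for sid, pred in zip(subject_ids, segment_preds):
        votes[sid].append(pred)
    correct = 0
    for sid, preds in votes.items():
        # MDD (1) only if strictly more than half of the segments say so;
        # the tie-break toward 0 is an assumption.
        decision = 1 if 2 * sum(preds) > len(preds) else 0
        correct += int(decision == subject_labels[sid])
    return correct / len(votes)


ids = ["a", "a", "a", "b", "b", "b"]
preds = [1, 1, 0, 0, 1, 0]
acc = subject_level_accuracy(ids, preds, {"a": 1, "b": 0})
```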

### 4\.2\.Comparison with State\-of\-the\-Art

To rigorously evaluate SGC on the Mumtaz2016 dataset, we benchmark against 14 established methods spanning four paradigms \(detailed citations are provided in Table[1](https://arxiv.org/html/2606.00180#S3.T1)\)\. These include traditional machine learning methods such as LR, SVM, and XGBoost; deep discriminative models like EEGNet, InceptionNet, and DeprNet; spatio\-temporal architectures including 1D\-CNN\-LSTM, 1D\-CNN\-Transformer, CWT\-CNNs, TSception, GRU\-Conv, and GC\-GRU; as well as generative augmentation approaches represented by DiffMDD\. For fairness, all deep learning baselines are re\-implemented and evaluated under our identical 10\-fold cross\-validation and zero\-shot pipelines\.

As detailed in Table [1](https://arxiv.org/html/2606.00180#S3.T1), SGC achieves the best result on every reported metric. Compared to traditional 1D-CNNs (DeprNet: 89.84%) and spatio-temporal networks (GC-GRU: 90.15%), SGC achieves 95.19% accuracy without generating a single new sample. Crucially, it outperforms the SOTA generative augmentation model DiffMDD (93.85%). This suggests that explicitly extracting high-quality pathological priors can be more effective than incurring the computational costs and noise risks of pseudo-sample generation.

### 4\.3\.Cross\-Dataset Generalization: Robustness to Electrode Heterogeneity

To rigorously evaluate SGC's zero-shot clinical transferability, we conduct external validation on the MODMA dataset without any domain-specific fine-tuning. This presents a severe domain shift due to extreme hardware disparities (128-electrode to 19-electrode interpolation) and distinct patient demographics. The results are detailed in Table [2](https://arxiv.org/html/2606.00180#S4.T2).

Table 2. Cross-dataset zero-shot generalization (MODMA). Traditional SOTA models and generative augmentation suffer severe domain shift, while SGC remains robust.

1\) Cross-Domain Brittleness of Existing Paradigms. Purely data-driven methods degrade catastrophically in zero-shot transfer. Notably, GC-GRU plummets to 55.40% accuracy, as cross-hardware spatial interpolation disrupts source-domain functional topologies. Furthermore, even the state-of-the-art generative augmentation model, DiffMDD (68.34%), exhibits significant brittleness. This proves that synthesized pseudo-samples fail to bridge hardware gaps and inadvertently introduce domain-specific artifacts, whereas our SGC maintains a robust 78.50% accuracy.

2\) Domain-Agnostic Robustness of Priors. Conversely, SGC achieves robust 78.50% (segment) and 79.94% (subject) accuracy under extreme domain shifts. Crucially, while baselines suffer catastrophic clinical leakage (Recall $\leq 61\%$), SGC maintains an exceptional 75.34% Recall. This confirms that domain-agnostic anomaly priors immunize classifiers against hardware and demographic variations, capturing invariant pathological manifolds missed by existing paradigms.

3\) Limitations and Future Work. Despite outperforming all zero-shot baselines, SGC's intra- vs. cross-dataset performance gap (95.19% vs. 78.50%) indicates that hard electrode mapping loses fine-grained spatial information. Future work will integrate Unsupervised Domain Adaptation (UDA) to seamlessly align high-dimensional manifolds across heterogeneous hardware.

### 4\.4\.Effectiveness Beyond Augmentation: Mining Priors over Generation

To empirically validate our “mining over generation” philosophy, we define several internal variants: Supervised Baseline \(Baseline\), utilizing our pure 1D\-CNN\-Transformer backbone without score\-guided fusion; Diffusion Augmentation \(Diff\-Aug\), training the baseline on datasets expanded via DDPM synthesis; and Advanced Generative Heuristics, comprising Artifact Cleaning \(Clean\) to filter high\-frequency synthesis artifacts, Hard Negative Mining \(Hard Neg\) to generate challenging boundary cases, and Spatial Hybridization \(Hybrid\) to dynamically blend spatial channels across multiple samples to enrich diversity\. These exhaustive comparisons prove that complex sample synthesis fundamentally falls short of our zero\-augmentation SGC\.

Table 3. Comparison of generative augmentation vs. explicit prior mining across datasets.

1\) Performance Leap without Synthetic Noise. As shown in Table [3](https://arxiv.org/html/2606.00180#S4.T3), while Diff-Aug improves upon the Baseline, it remains inferior to SGC. Notably, SGC achieves 95.19% on Mumtaz2016 and maintains a robust 78.50% in zero-shot transfer on MODMA, significantly outperforming Diff-Aug's 93.49% and 66.68%. Furthermore, we explored advanced generative heuristics, including hard negative mining (Hard Neg), hybrid spatial augmentations (Hybrid), and artifact-filtered synthesis (Clean). These variants yield only marginal intra-dataset improvements over the Baseline, failing to surpass standard Diff-Aug. More critically, none could overcome the catastrophic degradation in cross-dataset zero-shot transfer. This reinforces that explicitly extracting anomaly scores effectively bypasses the synthetic noise inherently introduced by generative data expansion.

2\) Reshaping the Feature Space\.Beyond numerical gains, the extracted prior serves as a potent feature calibrator\. In the t\-SNE projections \(Fig\.[4](https://arxiv.org/html/2606.00180#S4.F4)\), the Baseline model \(Fig\.[4](https://arxiv.org/html/2606.00180#S4.F4)a\) struggles with inherent EEG noise, yielding entangled boundaries\. By fusing the pathological prior, SGC \(Fig\.[4](https://arxiv.org/html/2606.00180#S4.F4)b\) forces the high\-dimensional manifold to collapse into highly cohesive intra\-class clusters while maximizing the inter\-class margin\.

![Refer to caption](https://arxiv.org/html/2606.00180v1/x4.png)\(a\)Baseline model
![Refer to caption](https://arxiv.org/html/2606.00180v1/x5.png)\(b\)SGC

Figure 4. t-SNE visualization. SGC (b) achieves explicit feature calibration, characterized by tightened intra-class cohesion compared to the entangled Baseline (a).

3\) Clinical Utility via Maximized Recall. In psychiatric screening, misclassifying an MDD patient as healthy incurs severe clinical repercussions. Fig. [5](https://arxiv.org/html/2606.00180#S4.F5) contrasts the confusion matrices from a representative testing fold. While Diff-Aug (b) fails to resolve ambiguous boundary cases (misclassifying 115 MDD segments as healthy), SGC (c) produces zero false negatives on this fold. This underscores the translational value of SGC as a robust diagnostic fail-safe.

![Refer to caption](https://arxiv.org/html/2606.00180v1/x6.png)\(a\)Baseline
![Refer to caption](https://arxiv.org/html/2606.00180v1/x7.png)\(b\)Diff\-Aug
![Refer to caption](https://arxiv.org/html/2606.00180v1/x8.png)\(c\)SGC

Figure 5. Confusion matrices on a representative fold. SGC (c) completely eliminates the clinical leakage (False Negatives) suffered by generative augmentation (b).

### 4\.5\.Ablation Study: Complementarity of Generative Priors

To validate the contributions of SGC’s core modules, we evaluate three ablation variants across source \(Mumtaz2016\) and zero\-shot \(MODMA\) domains\. These include: a variant without distributional modeling \(referred to as w/o Dist\.\), relying solely on VQ\-VAE structural error; a variant without structural reconstruction \(referred to as w/o Struct\.\), depending entirely on DDPM distribution error; and a decision fusion variant \(referred to as Dec\. Fusion\), replacing early feature concatenation with terminal logit summation\. Results are presented in Table[4](https://arxiv.org/html/2606.00180#S4.T4)\.

Table 4. Ablation study of SGC core components across datasets.

1\) Dual-Stream Complementarity. Ablating either stream yields suboptimal accuracy on Mumtaz2016, whereas their integration achieves 95.19% (Table [4](https://arxiv.org/html/2606.00180#S4.T4)). This synergy becomes critical in MODMA zero-shot transfer: individual models struggle under severe domain shifts (65.32% and 63.40%), yet fusion reaches 78.50%. Notably, while distribution modeling dominates intra-dataset performance, structural quantization exhibits superior resilience against cross-domain hardware variations. This confirms that explicit morphological and implicit distributional priors capture distinct, complementary facets of pathological EEG variance.

2\) Necessity of Feature-Level Fusion. Replacing feature concatenation with late Decision Fusion drops accuracy to 93.12% on Mumtaz2016 and to 69.69% on zero-shot MODMA (an 8.81 percentage-point drop from 78.50%). This gap suggests that injecting the anomaly score as an input feature lets the non-linear MLP head weigh the spatiotemporal representation against the pathological prior, shaping decision boundaries more effectively than shallow terminal logit fusion.

## 5\.Conclusion

Existing deep learning methods for EEG\-based depression detection rely heavily on generative data augmentation to overcome data scarcity\. However, this prevailing “data\-quantity\-first” convention inevitably introduces synthetic noise and blurs classification boundaries, compromising clinical reliability\. To address this, we propose Score\-Guided Classification \(SGC\), a novel framework that shifts the paradigm from sample generation to unsupervised prior mining\. In SGC, we synergize structural anomaly metrics from a VQ\-VAE and probabilistic distributional shifts from a DDPM, which constructs a potent pathological prior\. This prior explicitly guides the classifier’s decision boundary entirely within a zero\-augmentation setting\. Experiments demonstrate that SGC achieves state\-of\-the\-art performance on the Mumtaz2016 benchmark and exhibits exceptional domain\-agnostic robustness in severe zero\-shot cross\-dataset evaluations\. By prioritizing feature integrity over synthetic quantity, SGC establishes a reliable diagnostic pathway for complex medical time\-series\. Future work will integrate Unsupervised Domain Adaptation \(UDA\) to further bridge multi\-center hardware heterogeneities\.

## References

- J\. An and S\. Cho \(2015\)Variational autoencoder based anomaly detection using reconstruction probability\.Special lecture on IE2\(1\),pp\. 1–18\.Cited by:[§2\.3](https://arxiv.org/html/2606.00180#S2.SS3.p1.1)\.
- B\. Ay, O\. Yildirim, M\. Talo, U\. B\. Baloglu, G\. Aydin, S\. D\. Puthankattil, and U\. R\. Acharya \(2019\)Automated depression detection using deep representation and sequence learning with eeg signals\.Journal of medical systems43\(7\),pp\. 205\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p2.1),[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.5.4.1),[Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.2.1.1)\.
- C\. Baur, B\. Wiestler, S\. Albarqouni, and N\. Navab \(2018\)Deep autoencoding models for unsupervised anomaly segmentation in brain mr images\.InInternational MICCAI brainlesion workshop,pp\. 161–169\.Cited by:[§2\.3](https://arxiv.org/html/2606.00180#S2.SS3.p1.1)\.
- H\. Cai, Z\. Yuan, Y\. Gao, S\. Sun, N\. Li, F\. Tian, H\. Xiao, J\. Li, Z\. Yang, X\. Li,et al\.\(2022\)A multi\-modal open dataset for mental\-disorder analysis\.Scientific data9\(1\),pp\. 178\.Cited by:[§4\.1\.1](https://arxiv.org/html/2606.00180#S4.SS1.SSS1.p2.1.1)\.
- Y\. Chen and C\. Yang \(2025\)STGE\-former: spatial\-temporal graph\-enhanced transformer for eeg\-based major depressive disorder detection\.InICASSP 2025\-2025 IEEE International Conference on Acoustics, Speech and Signal Processing \(ICASSP\),pp\. 1–5\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p2.1)\.
- D\. Choi, C\. Yip, A\. Choi, and J\. Park \(2026\)Fail closed trust gated synthetic augmentation governs tail risk under subject shift in eeg\.bioRxiv,pp\. 2026–01\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p4.1)\.
- K\. Choudhury, S\. Roy, A\. Chanda, S\. Biswas, and S\. Kuiry \(2025\)Improving predictive confidence in medical imaging via online label smoothing\.InBIO Web of Conferences,Vol\.204,pp\. 01019\.Cited by:[§3\.4\.1](https://arxiv.org/html/2606.00180#S3.SS4.SSS1.p3.1)\.
- Y\. Ding, N\. Robinson, Q\. Zeng, D\. Chen, A\. A\. P\. Wai, T\. Lee, and C\. Guan \(2020\)TSception: a deep learning framework for emotion detection using eeg\.In2020 international joint conference on neural networks \(IJCNN\),pp\. 1–7\.Cited by:[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.11.10.1),[Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.5.4.1)\.
- M\. El Kerdawy, M\. El Halaby, A\. Hassan, M\. Maher, H\. Fayed, D\. Shawky, and A\. Badawi \(2020\)The automatic detection of cognition using eeg and facial expressions\.Sensors20\(12\),pp\. 3516\.Cited by:[§3\.4\.1](https://arxiv.org/html/2606.00180#S3.SS4.SSS1.p2.2)\.
- T\. Fernando, H\. Gammulle, S\. Denman, S\. Sridharan, and C\. Fookes \(2021\)Deep learning for medical anomaly detection–a survey\.ACM computing surveys \(CSUR\)54\(7\),pp\. 1–37\.Cited by:[§2\.3](https://arxiv.org/html/2606.00180#S2.SS3.p1.1)\.
- I\. Goodfellow, J\. Pouget\-Abadie, M\. Mirza, B\. Xu, D\. Warde\-Farley, S\. Ozair, A\. Courville, and Y\. Bengio \(2014\)Generative adversarial nets\.InAdvances in neural information processing systems,Vol\.27\.Cited by:[§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p1.1)\.
- L\. Hallal, J\. Rhinelander, R\. Venkat, and A\. Newman \(2026\)Efficient feature extraction for eeg\-based classification: a comparative review of deep learning models\.AI7\(2\),pp\. 50\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p4.1)\.
- A\. M\. Hamed, A\. A\. Heliel, and H\. El\-Behery \(2026\)Explainable eeg analysis of major psychiatric disorders: power spectra, functional connectivity, and shap interpretation\.Journal of Contemporary Technology and Applied Engineering5\(1\),pp\. 32–48\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p3.1)\.
- K\. G\. Hartmann, R\. T\. Schirrmeister, and T\. Ball \(2018\)EEG\-gan: generative adversarial networks for electroencephalograhic \(eeg\) brain signals\.arXiv preprint arXiv:1806\.01875\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- M\. Hassan and N\. Kaabouch \(2024\)Impact of feature selection techniques on the performance of machine learning models for depression detection using eeg data\.Applied Sciences14\(22\),pp\. 10532\.Cited by:[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.2.1.1),[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.3.2.1),[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.4.3.1),[Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.4.3.1)\.
- C\. He, J\. Liu, Y\. Zhu, and W\. Du \(2021\)Data augmentation for deep neural networks model in eeg classification task: a review\.Frontiers in Human Neuroscience15,pp\. 765525\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- J\. Ho, A\. Jain, and P\. Abbeel \(2020\)Denoising diffusion probabilistic models\.Advances in neural information processing systems33,pp\. 6840–6851\.Cited by:[§2\.3](https://arxiv.org/html/2606.00180#S2.SS3.p1.1),[§3\.3\.2](https://arxiv.org/html/2606.00180#S3.SS3.SSS2.p1.3),[§3\.3\.2](https://arxiv.org/html/2606.00180#S3.SS3.SSS2.p2.3),[§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p2.1)\.
- P\. Hou, X\. Li, J\. Zhu, and B\. Hu \(2025\)A lightweight convolutional transformer neural network for eeg\-based depression recognition\.Biomedical Signal Processing and Control100,pp\. 107112\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- S\. Ioffe and C\. Szegedy \(2015\)Batch normalization: accelerating deep network training by reducing internal covariate shift\.InInternational conference on machine learning,pp\. 448–456\.Cited by:[§3\.4\.2](https://arxiv.org/html/2606.00180#S3.SS4.SSS2.p1.1)\.
- J\. Kim, H\. Nam, D\. Won, and C\. Im \(2025\)Domain\-generalized deep learning for improved subject\-independent emotion recognition based on electroencephalography\.Experimental Neurobiology34\(3\),pp\. 119\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- D\. P\. Kingma and M\. Welling \(2013\)Auto\-encoding variational bayes\.arXiv preprint arXiv:1312\.6114\.Cited by:[§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p1.1)\.
- G\. H\. Klem \(1999\)The ten\-twenty electrode system of the international federation\. the international federation of clinical neurophysiology\.Electroencephalogr\. Clin\. Neurophysiol\. Suppl\.52,pp\. 3–6\.Cited by:[§3\.2](https://arxiv.org/html/2606.00180#S3.SS2.p2.4)\.
- E\. Lashgari, D\. Liang, and U\. Maoz \(2020\)Data augmentation for deep\-learning\-based electroencephalography\.Journal of Neuroscience Methods346,pp\. 108885\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- M\. A\. Lawal, A\. Abayomi, A\. A\. Salihu, A\. A\. Aliyu, M\. A\. Aliyu, and I\. Z\. Yakubu \(2026\)Transfer learning and domain adaptation in neuroinformatics\.InDeep Learning Applications in Neuroinformatics,pp\. 287–310\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p3.1)\.
- V\. J\. Lawhern, A\. J\. Solon, N\. R\. Waytowich, S\. M\. Gordon, C\. P\. Hung, and B\. J\. Lance \(2018\)EEGNet: a compact convolutional neural network for eeg\-based brain–computer interfaces\.Journal of neural engineering15\(5\),pp\. 056013\.Cited by:[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.9.8.1)\.
- I\. Leccisotti, A\. Mollica, R\. Laurello, M\. C\. Moretti, M\. Altamura, A\. Bellomo, F\. Panza, and M\. Lozupone \(2026\)Machine learning\-assisted resting\-state electroencephalography improves diagnostic accuracy in psychiatric disorders: a narrative review\.Advanced Technology in Neuroscience3\(1\),pp\. 21–33\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p1.1)\.
- C\. Lee, H\. Lee, and D\. Kim \(2026\)RL\-bioaug: label\-efficient reinforcement learning for self\-supervised eeg representation learning\.arXiv preprint arXiv:2601\.13964\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- W\. Li, S\. Wang, S\. Shao, and K\. Huang \(2024\)Distillation\-based domain generalization for cross\-dataset eeg\-based emotion recognition\.IEEE Transactions on Emerging Topics in Computational Intelligence9\(3\),pp\. 2474–2490\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- C\. Liao, S\. Zhao, X\. Wang, J\. Zhang, Y\. Liao, and X\. Wu \(2025\)EEG data augmentation method based on the gaussian mixture model\.Mathematics13\(5\),pp\. 729\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- L\. Lin, Z\. Li, R\. Li, X\. Li, and J\. Gao \(2024\)Diffusion models for time\-series applications: a survey\.Frontiers of Information Technology & Electronic Engineering25\(1\),pp\. 19–41\.Cited by:[§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- Y\. Liu, C\. Pu, S\. Xia, D\. Deng, X\. Wang, and M\. Li \(2022\)Machine learning approaches for diagnosing depression using eeg: a review\.Translational Neuroscience13\(1\),pp\. 224–235\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- I\. Loshchilov and F\. Hutter \(2016\)Sgdr: stochastic gradient descent with warm restarts\.arXiv preprint arXiv:1608\.03983\.Cited by:[§3\.5\.1](https://arxiv.org/html/2606.00180#S3.SS5.SSS1.p1.1)\.
- H\. Lu, Z\. You, Y\. Guo, and X\. Hu \(2024\)Mast\-gcn: multi\-scale adaptive spatial\-temporal graph convolutional network for eeg\-based depression recognition\.IEEE Transactions on Affective Computing15\(4\),pp\. 1985–1996\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- G\. Luo, H\. Rao, P\. An, Y\. Li, R\. Hong, W\. Chen, and S\. Chen \(2023\)Exploring adaptive graph topologies and temporal graph networks for eeg\-based depression detection\.IEEE Transactions on Neural Systems and Rehabilitation Engineering31,pp\. 3947–3957\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p2.1),[§1](https://arxiv.org/html/2606.00180#S1.p4.1),[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.14.13.1),[Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.7.6.1)\.
- R\. Lyu \(2026\)Deep learning approaches for eeg\-based healthcare applications: a comprehensive review\.Frontiers in Human Neuroscience19,pp\. 1689073\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p3.1)\.
- A\. F\. Marquand, I\. Rezek, J\. Buitelaar, and C\. F\. Beckmann \(2016\)Understanding heterogeneity in clinical cohorts using normative models: beyond case\-control studies\.Biological psychiatry80\(7\),pp\. 552–561\.Cited by:[§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p1.1)\.
- P\. Mirowski and A\. Fabijańska \(2026\)Diffusion model\-based synthesis of brain images for data augmentation\.Biomedical Signal Processing and Control113,pp\. 108940\.Cited by:[§1](https://arxiv.org/html/2606.00180#S1.p4.1)\.
- R\. Müller, S\. Kornblith, and G\. E\. Hinton \(2019\)When does label smoothing help?\.Advances in neural information processing systems32\.Cited by:[§3\.5\.1](https://arxiv.org/html/2606.00180#S3.SS5.SSS1.p2.2)\.
- W\. Mumtaz, L\. Xia, S\. S\. A\. Ali, M\. A\. M\. Yasin, M\. Hussain, and A\. S\. Malik \(2017\)Electroencephalogram \(eeg\)\-based computer\-aided technique to diagnose major depressive disorder \(mdd\)\.Biomedical Signal Processing and Control31,pp\. 108–115\.Cited by:[§4\.1\.1](https://arxiv.org/html/2606.00180#S4.SS1.SSS1.p1.4.1)\.
- S\. Olbrich, N\. Jaworska, S\. de la Salle, V\. Knott, P\. Blier, M\. Brunovsky, T\. Welt, M\. de Bardeci, and C\. Teng\-Ip \(2026\)Deep learning using electroencephalogram \(eeg\) data for diagnosing and predicting ssri response in major depressive disorder\.Communications Medicine6\(1\),pp\. 159\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- D\. Peng, W\. Zheng, and B\. Lu \(2025\)Enhancing depression detection from emotion eeg with temporal\-spatial\-spectral representation learning\.In2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society \(EMBC\),pp\. 1–5\.Cited by:[§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.6.5.1)\.
- F\. Perrin, J\. Pernier, O\. Bertrand, and J\. F\. Echallier \(1989\) Spherical splines for scalp potential and current density mapping\. Electroencephalography and Clinical Neurophysiology 72\(2\), pp\. 184–187\. Cited by: [§3\.2](https://arxiv.org/html/2606.00180#S3.SS2.p2.4)\.
- W\. H\. Pinaya, M\. S\. Graham, R\. Gray, P\. F\. Da Costa, P\. Tudosiu, P\. Wright, Y\. H\. Mah, A\. D\. MacKinnon, J\. T\. Teo, R\. Jager, et al\. \(2022\) Fast unsupervised brain anomaly detection and segmentation with diffusion models\. In International Conference on Medical Image Computing and Computer\-Assisted Intervention, pp\. 705–714\. Cited by: [§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p2.1)\.
- U\. Raghavendra, A\. Gudigar, Y\. Chakole, P\. Kasula, D\. Subha, N\. A\. Kadri, E\. J\. Ciaccio, and U\. R\. Acharya \(2023\) Automated detection and screening of depression using continuous wavelet transform with electroencephalogram signals\. Expert Systems 40\(4\), pp\. e12803\. Cited by: [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.7.6.1), [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.8.7.1), [Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.3.2.1)\.
- C\. Rommel, J\. Paillard, T\. Moreau, and A\. Gramfort \(2022\) Data augmentation for learning predictive models on EEG: a systematic comparison\. Journal of Neural Engineering 19\(6\), pp\. 066020\. Cited by: [§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- S\. Rutherford, S\. M\. Kia, T\. Wolfers, C\. Fraza, M\. Zabihi, R\. Dinga, P\. Berthet, A\. Worker, S\. Verdi, H\. G\. Ruhe, et al\. \(2022\) The normative modeling framework for computational psychiatry\. Nature Protocols 17\(7\), pp\. 1711–1734\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p5.1)\.
- A\. Seal, R\. Bajpai, J\. Agnihotri, A\. Yazidi, E\. Herrera\-Viedma, and O\. Krejcar \(2021\) DeprNet: a deep convolution neural network framework for detecting depression using EEG\. IEEE Transactions on Instrumentation and Measurement 70, pp\. 1–13\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p2.1), [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.13.12.1), [Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.6.5.1)\.
- G\. Sharma, A\. M\. Joshi, R\. Gupta, and L\. R\. Cenkeramaddi \(2023\) DepCap: a smart healthcare framework for EEG based depression detection using time\-frequency response and deep neural network\. IEEE Access 11, pp\. 52327–52338\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p2.1)\.
- J\. Shen, K\. Wang, Z\. Zhao, Y\. Zhang, F\. Tian, X\. Zhang, Q\. Dong, and B\. Hu \(2025\) WDANet: Wasserstein distribution inspired dynamic adversarial network for EEG\-based cross\-domain depression recognition\. IEEE Transactions on Affective Computing\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- A\. Singh, V\. Tiwari, H\. Patel, G\. Vivekananda, D\. S\. Rajput, et al\. \(2024\) Slitranet: an EEG\-based automated diagnosis framework for major depressive disorder monitoring using a novel LGCN and transformer\-based hybrid deep learning approach\. IEEE Access 12, pp\. 173109–173126\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- Y\. Sun, J\. Sun, Z\. Zhou, J\. Cai, W\. Cui, and G\. Liu \(2026\) A novel spatial\-temporal graph neural network for major depressive disorder detection based on resting state EEG signals\. IEEE Sensors Journal 26\(6\), pp\. 8660–8671\. External Links: [Document](https://dx.doi.org/10.1109/JSEN.2026.3657750) Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- C\. Szegedy, W\. Liu, Y\. Jia, P\. Sermanet, S\. Reed, D\. Anguelov, D\. Erhan, V\. Vanhoucke, and A\. Rabinovich \(2015\) Going deeper with convolutions\. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp\. 1–9\. Cited by: [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.10.9.1)\.
- C\. Szegedy, V\. Vanhoucke, S\. Ioffe, J\. Shlens, and Z\. Wojna \(2016\) Rethinking the Inception architecture for computer vision\. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp\. 2818–2826\. Cited by: [§3\.5\.1](https://arxiv.org/html/2606.00180#S3.SS5.SSS1.p2.2)\.
- D\. Truong and A\. Delorme \(2025\) Data normalization strategies for EEG deep learning\. arXiv preprint arXiv:2506\.22455\. Cited by: [§3\.4\.1](https://arxiv.org/html/2606.00180#S3.SS4.SSS1.p2.2)\.
- A\. Van Den Oord, O\. Vinyals, et al\. \(2017\) Neural discrete representation learning\. Advances in Neural Information Processing Systems 30\. Cited by: [§2\.3](https://arxiv.org/html/2606.00180#S2.SS3.p1.1), [§3\.3\.1](https://arxiv.org/html/2606.00180#S3.SS3.SSS1.p1.9), [§3\.3\.1](https://arxiv.org/html/2606.00180#S3.SS3.SSS1.p2.7), [§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p2.1)\.
- S\. N\. Vaniya, A\. Habib, M\. Angelova, and C\. Karmakar \(2026\) Simplifying depression diagnosis: single\-channel EEG and deep learning approaches\. IEEE Journal of Biomedical and Health Informatics\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\) Attention is all you need\. Advances in Neural Information Processing Systems 30\. Cited by: [§3\.4\.2](https://arxiv.org/html/2606.00180#S3.SS4.SSS2.p2.1)\.
- H\. Wang, J\. Yin, S\. Gao, J\. Liu, and Q\. Wu \(2024a\) Major depressive disorder detection using graph domain adaptation with global message\-passing based on EEG signals\. IEEE Transactions on Affective Computing 16\(3\), pp\. 1500–1513\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- S\. Wang, Y\. Guo, Y\. Dong, Y\. Shen, Z\. Zhang, A\. C\. Cheung, J\. Pu, S\. Zhong, R\. K\. Tong, Y\. Li, M\. K\. Ng, K\. Tsang, and G\. Ren \(2026\) Generative AI empowers brain\-computer interfaces: a review\-perspective on technical realities and future visions\. IEEE Transactions on Consumer Electronics 72\(1\), pp\. 11–20\. External Links: [Document](https://dx.doi.org/10.1109/TCE.2025.3650654) Cited by: [§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- Y\. Wang, S\. Zhao, H\. Jiang, S\. Li, B\. Luo, T\. Li, and G\. Pan \(2024b\) DiffMDD: a diffusion\-based deep learning framework for MDD diagnosis using EEG\. IEEE Transactions on Neural Systems and Rehabilitation Engineering 32, pp\. 728–738\. Cited by: [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.15.14.1), [Table 2](https://arxiv.org/html/2606.00180#S4.T2.4.1.8.7.1)\.
- Y\. Wang, Y\. Peng, M\. Han, X\. Liu, H\. Niu, J\. Cheng, S\. Chang, and T\. Liu \(2024c\) GCTNet: a graph convolutional transformer network for major depressive disorder detection based on EEG signals\. Journal of Neural Engineering 21\(3\), pp\. 036042\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.
- X\. Wong, Z\. Zhao, H\. Guo, Z\. Liu, Y\. Wu, F\. Yan, Z\. Wang, and S\. Song \(2026\) CRCC: contrast\-based robust cross\-subject and cross\-site representation learning for EEG\. arXiv preprint arXiv:2602\.19138\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p3.1)\.
- C\. Xu, L\. Zhu, J\. Lai, Z\. Luo, J\. Ying, S\. Hu, P\. Song, and J\. Yang \(2026\) Global and regional quality of care index in major depressive disorder: the Global Burden of Disease Study 2021\. International Journal for Equity in Health\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p1.1)\.
- G\. Xu, W\. Guo, and Y\. Wang \(2023\) Subject\-independent EEG emotion recognition with hybrid spatio\-temporal GRU\-Conv architecture\. Medical & Biological Engineering & Computing 61\(1\), pp\. 61–73\. Cited by: [Table 1](https://arxiv.org/html/2606.00180#S3.T1.4.12.11.1)\.
- W\. Yang, W\. Yan, W\. Liu, Y\. Ma, and Y\. Li \(2025\) THD\-bar: topology hierarchical derived brain autoregressive modeling for EEG generic representations\. arXiv preprint arXiv:2511\.13733\. Cited by: [§3\.3](https://arxiv.org/html/2606.00180#S3.SS3.p2.1)\.
- J\. Zhang, Y\. Zheng, and Y\. Shi \(2023\) A soft label method for medical image segmentation with multirater annotations\. Computational Intelligence and Neuroscience 2023\(1\), pp\. 1883597\. Cited by: [§3\.4\.1](https://arxiv.org/html/2606.00180#S3.SS4.SSS1.p3.1)\.
- J\. Zhang, Z\. Liu, M\. Cheng, X\. Wang, Z\. Liu, and Q\. Liu \(2026\) StaTS: spectral trajectory schedule learning for adaptive time series forecasting with frequency guided denoiser\. arXiv preprint arXiv:2603\.00037\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p4.1)\.
- Y\. Zhao, Y\. Liu, W\. Zheng, and B\. Lu \(2024\) EEG data augmentation for emotion recognition using diffusion model\. In 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society \(EMBC\), pp\. 1–4\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p4.1), [§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- T\. Zhou, X\. Chen, Y\. Shen, M\. Nieuwoudt, C\. Pun, and S\. Wang \(2023\) Generative AI enables EEG data augmentation for Alzheimer’s disease detection via diffusion model\. In 2023 IEEE International Symposium on Product Compliance Engineering\-Asia \(ISPCE\-ASIA\), pp\. 1–6\. Cited by: [§1](https://arxiv.org/html/2606.00180#S1.p4.1), [§2\.2](https://arxiv.org/html/2606.00180#S2.SS2.p1.1)\.
- R\. Zhulduzbayev, A\. Ashourvan, D\. Arman, A\. Bissembayev, and A\. Kustubayeva \(2026\) A structured review of EEG\-based machine learning approaches for brain age prediction\. Algorithms 19\(1\), pp\. 91\. Cited by: [§2\.1](https://arxiv.org/html/2606.00180#S2.SS1.p1.1)\.