I²RiMA: Spectral Riemannian Representation with Temporal Attention for Mental Stress Detection based on EEG Signals

arXiv cs.LG 07/03/26, 04:00 AM Papers
Summary
I²RiMA is a novel intra-inter Riemannian manifold attention network for EEG-based mental stress detection. It constructs frequency-specific spatial covariance and uses temporal attention to improve cross-subject stress classification, achieving up to 82.78% balanced accuracy.
arXiv:2607.01279v1 Announce Type: new Abstract: Cross-subject EEG stress detection remains challenging because discriminative stress-related patterns are both subject-dependent and frequency-specific. Conventional Riemannian methods model spatial covariance mainly in the time domain, overlooking neural oscillations that are critical for high-level cognitive state decoding, while standard temporal tokenization often fragments inter-slice temporal coherence. To address these limitations, we propose I²RiMA, an Intra-Inter Riemannian Manifold Attention Network for EEG-based stress detection. I²RiMA constructs spatial covariance matrices independently at each frequency point and maps them to the SPD tangent space, preserving channel-wise geometry together with frequency-specific discriminative cues. It further introduces frequency cluster aggregation to select informative spectral components and reduce redundancy by forming compact, data-driven frequency clusters aligned with EEG rhythms. Finally, an intra-inter slice attention module adaptively integrates local slice-level spectral dynamics and global temporal context across EEG sequences. Experiments on three datasets show that I²RiMA consistently outperforms five state-of-the-art baselines, achieving up to 82.78% balanced accuracy while remaining efficient with only 1.60M parameters and 31.95M FLOPs.
Cached at: 07/03/26, 05:39 AM
# I2RiMA: Spectral Riemannian Representation with Temporal Attention for Mental Stress Detection based on EEG Signals
Source: [https://arxiv.org/html/2607.01279](https://arxiv.org/html/2607.01279)
Cheng He∗¹, Kunyu Peng∗², Shangen Han∗¹, Jinming Ma¹, Jinhong Ding³, Likun Xia¹†

¹ Laboratory of Neural Computing and Intelligent Perception, College of Information Engineering, Capital Normal University, Beijing, China

² Karlsruhe Institute of Technology, Germany

³ School of Psychology, Capital Normal University, Beijing, China

###### Abstract

Cross-subject EEG stress detection remains challenging because discriminative stress-related patterns are both subject-dependent and frequency-specific. Conventional Riemannian methods model spatial covariance mainly in the time domain, overlooking neural oscillations that are critical for high-level cognitive state decoding, while standard temporal tokenization often fragments inter-slice temporal coherence. To address these limitations, we propose I2RiMA, an Intra-Inter Riemannian Manifold Attention Network for EEG-based stress detection. I2RiMA constructs spatial covariance matrices independently at each frequency point and maps them to the SPD tangent space, preserving channel-wise geometry together with frequency-specific discriminative cues. It further introduces frequency cluster aggregation to select informative spectral components and reduce redundancy by forming compact, data-driven frequency clusters aligned with EEG rhythms. Finally, an intra-inter slice attention module adaptively integrates local slice-level spectral dynamics and global temporal context across EEG sequences. Experiments on three datasets show that I2RiMA consistently outperforms five state-of-the-art baselines, achieving up to 82.78% balanced accuracy while remaining efficient with only 1.60M parameters and 31.95M FLOPs.

## 1 Introduction

Mental stress is a widespread public health issue, especially among young people, with over 60% reporting chronic moderate-to-severe stress Fu et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib1)); Steel et al. ([2014](https://arxiv.org/html/2607.01279#bib.bib2)). As prolonged stress increases the risk of depression, anxiety, and cardiovascular disease Meier et al. ([2015](https://arxiv.org/html/2607.01279#bib.bib3)), objective and scalable stress detection is urgently needed. EEG offers a promising solution due to its millisecond temporal resolution, non-invasive and portable nature, and sensitivity to psychological states da Silva ([2013](https://arxiv.org/html/2607.01279#bib.bib4)). Recent wireless dry-electrode systems further support everyday EEG-based stress monitoring beyond clinical settings Arpaia et al. ([2020](https://arxiv.org/html/2607.01279#bib.bib5)).

Considerable work has explored EEG-based stress detection using traditional machine learning, deep learning, and Riemannian geometry. However, existing methods face two key limitations. First, conventional Riemannian approaches usually build covariance matrices from time-domain EEG signals Mognon et al. ([2011](https://arxiv.org/html/2607.01279#bib.bib47)), which is suitable for low-level tasks with temporally localized responses, such as visual evoked potential decoding Wang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib19)). In contrast, high-level states like stress and emotion involve distributed, spectrally structured neural dynamics, making time-domain channel covariance insufficient. Although frequency-domain covariance is a natural alternative, naive construction can mix frequency information and obscure physiologically meaningful band-specific patterns. Moreover, EEG covariance matrices lie on the SPD manifold Barachant et al. ([2010](https://arxiv.org/html/2607.01279#bib.bib8)); direct Euclidean vectorization ignores this geometry and causes information loss, especially under cross-subject distribution shifts Yger et al. ([2016](https://arxiv.org/html/2607.01279#bib.bib9)).

Second, most existing EEG tokenization pipelines segment continuous signals into fixed-length windows and process each window independently. This design discards temporal coherence across adjacent slices, even though such context has been shown to carry discriminative information for mental state classification Grissmann et al. ([2017](https://arxiv.org/html/2607.01279#bib.bib10)). Recent transformer-based EEG models Yang et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib6)); Jiang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib7)); Li et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib11)) have improved sequence modeling, but they still often treat each EEG slice as an isolated token. As a result, they struggle to jointly capture intra-slice local spectral structure and inter-slice global temporal dependencies, both of which are essential for robust high-level cognitive state recognition.

Motivated by these observations, we propose I2RiMA, an Intra-Inter Riemannian Manifold Attention Network for EEG-based stress detection. I2RiMA is built on two key insights. First, stress- and emotion-related EEG signals exhibit state-dependent, frequency-specific spatial patterns. Since distinct EEG bands reflect different neural dynamics Buzsáki et al. ([2012](https://arxiv.org/html/2607.01279#bib.bib12)), I2RiMA constructs covariance matrices independently at each frequency point and maps them to the SPD tangent space, preserving channel-wise Riemannian geometry without prematurely mixing spectral information. A frequency cluster aggregation module further selects informative frequency-specific Riemannian features and reduces redundancy, yielding compact and interpretable representations. Second, longer temporal windows improve intra-slice separability while reducing inter-slice variability, suggesting complementary local and global temporal cues. To capture this structure, I2RiMA introduces an intra-inter slice attention fusion module that adaptively aggregates EEG slices via linear encoding, attention weighting, and weighted fusion, preserving local discriminative patterns while modeling sequence-level temporal context. Together, the frequency-aware Riemannian representation and intra-inter slice attention fusion enable geometry-preserving, physiologically grounded, and temporally contextualized EEG representations for robust cross-subject stress detection.

Our main contributions are summarized as follows:

1. We propose I2RiMA, a frequency-aware Riemannian network for cross-subject EEG stress detection. It constructs covariance matrices at each frequency point and maps them to the SPD tangent space, preserving channel-wise geometry and frequency-specific discriminative patterns.
2. We introduce frequency cluster aggregation for data-driven feature selection and redundancy reduction, together with intra-inter slice attention fusion to capture both local slice-level patterns and global temporal dependencies in continuous EEG.
3. Experiments show that I2RiMA achieves state-of-the-art balanced accuracy (B.ACC) of 77.59%, 75.88%, and 82.78% on MIST Control, MIST Stress, and SEED, respectively, while using only 1.60M parameters and 31.95M FLOPs.

## 2 Related Work

EEG-Based Stress Detection. EEG-based stress detection methods generally follow two paradigms: traditional machine learning and deep learning. Traditional methods rely on handcrafted temporal, spectral, or nonlinear features with classifiers such as SVM and KNN. Representative studies include feature fusion for workload identification Pei et al. ([2020](https://arxiv.org/html/2607.01279#bib.bib13)), frontal alpha asymmetry for real-time stress assessment Arpaia et al. ([2020](https://arxiv.org/html/2607.01279#bib.bib5)), multi-level stress classification with combined EEG and ECG features Xia et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib29)), and linear/nonlinear EEG features for depression detection Cai et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib14)). Deep learning methods enable end-to-end representation learning by capturing spatial, spectral, and temporal EEG patterns, including spatio-temporal modeling for cross-subject motor imagery decoding Lv et al. ([2025](https://arxiv.org/html/2607.01279#bib.bib15)), R3DCNN for workload assessment Zhang et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib16)), MuLHiTA with multi-branch LSTM and hierarchical attention Xia et al. ([2022](https://arxiv.org/html/2607.01279#bib.bib17)), and phase-space reconstruction with geometric features for depression detection Akbari et al. ([2021](https://arxiv.org/html/2607.01279#bib.bib18)). Despite their progress, many methods remain limited to single-subject or small-scale evaluations and often suffer substantial performance degradation in cross-subject settings due to inter-subject variability Giannakakis et al. ([2019](https://arxiv.org/html/2607.01279#bib.bib20)).

Cross-Subject EEG Learning. Cross-subject generalization is a core challenge in EEG decoding. BIOT Yang et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib6)) employs channel embedding alignment to mitigate individual differences, LaBraM Jiang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib7)) leverages large-scale pre-training for enhanced generalization, and NeuroBOLT Li et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib11)) synthesizes fMRI signals from raw EEG via multi-dimensional representation learning. Domain adaptation and transfer learning are dominant strategies Lotte et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib45)), yet they primarily address cross-subject distribution shifts while neglecting intra-subject temporal domain drift (ISTDS) Jayaram et al. ([2016](https://arxiv.org/html/2607.01279#bib.bib21)), the phenomenon where a single subject's EEG distribution shifts across experimental sessions due to fatigue, electrode impedance changes, and physiological fluctuations Zanetti et al. ([2021](https://arxiv.org/html/2607.01279#bib.bib22)).

Riemannian Learning for EEG. Covariance matrices of multi-channel EEG signals lie on the SPD manifold $\mathcal{S}_{++}^{C}$, which possesses non-Euclidean geometric structure. Traditional vectorization in Euclidean space destroys this structure, leading to information loss Barachant et al. ([2010](https://arxiv.org/html/2607.01279#bib.bib8)). The affine-invariant Riemannian metric (AIRM) provides robustness to individual differences Barachant et al. ([2011](https://arxiv.org/html/2607.01279#bib.bib32)), while the Log-Euclidean metric offers computational efficiency with geometric consistency Congedo et al. ([2017](https://arxiv.org/html/2607.01279#bib.bib23)). Riemannian methods have demonstrated significant performance improvements in BCI and EEG classification, particularly in cross-subject scenarios Yger et al. ([2016](https://arxiv.org/html/2607.01279#bib.bib9)). However, existing Riemannian approaches typically construct a single covariance matrix from time-domain signals, mixing frequency information and obscuring frequency-specific spatial patterns Congedo et al. ([2017](https://arxiv.org/html/2607.01279#bib.bib23)).

Temporal Aggregation and Attention Modeling. Transformer-based methods Yang et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib6)); Jiang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib7)); Li et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib11)) model EEG sequences at the token level, but often process each slice independently, overlooking both intra-slice spectral dynamics and inter-slice temporal coherence. Although hierarchical and manifold attention mechanisms Xia et al. ([2022](https://arxiv.org/html/2607.01279#bib.bib17)); Pan et al. ([2022](https://arxiv.org/html/2607.01279#bib.bib26)) have been explored for temporal aggregation, they typically assume stable feature distributions across subjects and sessions. In contrast, I2RiMA jointly integrates frequency-aware Riemannian manifold learning with intra-inter slice attention fusion for cross-subject EEG stress detection, addressing this gap Giannakakis et al. ([2019](https://arxiv.org/html/2607.01279#bib.bib20)).

## 3 Problem Formulation

#### Input Definition

Let $\mathcal{D}=\{(\mathbf{X}_u, y_u)\}_{u=1}^{U}$ denote an EEG dataset where $\mathbf{X}_u\in\mathbb{R}^{C\times L}$ is the $u$-th trial with $C$ channels and $L$ samples, and $y_u\in\{1,\ldots,G\}$ is the class label. Each trial is segmented into $m$ non-overlapping slices $\mathcal{S}=\{s_1,\ldots,s_m\}$ with $s_i\in\mathbb{R}^{C\times T}$. The FFT transforms each slice into the frequency domain: $\mathbf{X}_{\text{freq}}\in\mathbb{R}^{B\times m\times C\times F}$, with $B$ the batch size and $F$ the number of frequency points.
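The slicing and FFT step can be sketched for a single trial as follows. This is a minimal NumPy illustration, not the authors' code: the function name `slice_and_fft`, the use of a one-sided FFT (so that $F = T/2 + 1$ for even $T$), the choice of magnitudes as the spectral representation, and the toy trial length are all assumptions.

```python
import numpy as np

def slice_and_fft(trial, m):
    """Split one trial (C, L) into m non-overlapping slices and take FFT magnitudes.

    Returns an array of shape (m, C, F) with F = T // 2 + 1 and T = L // m.
    """
    C, L = trial.shape
    T = L // m  # samples per slice; any trailing remainder is dropped (assumption)
    slices = trial[:, : m * T].reshape(C, m, T).transpose(1, 0, 2)  # (m, C, T)
    return np.abs(np.fft.rfft(slices, axis=-1))                     # (m, C, F)

rng = np.random.default_rng(0)
trial = rng.standard_normal((64, 1000))  # C = 64 channels, L = 1000 samples (toy values)
x_freq = slice_and_fft(trial, m=4)
print(x_freq.shape)  # (4, 64, 126)
```

Stacking the per-trial outputs along a leading axis gives the batched tensor of shape $(B, m, C, F)$ used in the text.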

#### Task Definition

We define the cross-subject stress detection task as learning a classifier $f_{\theta}$ that generalizes to unseen subjects. Formally, given training subjects $\mathcal{P}_{\text{train}}$ and test subjects $\mathcal{P}_{\text{test}}$ with $\mathcal{P}_{\text{train}}\cap\mathcal{P}_{\text{test}}=\emptyset$, the objective is to minimize the expected cross-entropy loss $\ell$ over the test distribution, as defined in Eq. [1](https://arxiv.org/html/2607.01279#S3.E1):

$$\theta^{*}=\arg\min_{\theta}\;\mathbb{E}_{(\mathbf{X},y)\sim\mathcal{D}_{\text{test}}}\left[\ell\left(f_{\theta}(\mathbf{X}),y\right)\right] \tag{1}$$

#### Learning Objective

We seek a representation mapping $\phi:\mathbb{R}^{C\times L}\to\mathbb{R}^{d}$ that satisfies three properties: (1) geometry preservation: $\phi$ retains the spatial structure of EEG signals via Riemannian manifold modeling; (2) temporal coherence: $\phi$ encodes both intra-slice local spectral patterns and inter-slice global dependencies through attention fusion; (3) cross-subject robustness: $\phi$ is invariant to inter-subject distribution shifts.

## 4 Method

### 4.1 Overview of I2RiMA

I2RiMA addresses cross-subject EEG stress detection through the two-module architecture illustrated in Figure [1](https://arxiv.org/html/2607.01279#S4.F1). Given raw multi-channel EEG signals in $\mathbb{R}^{C\times L}$ ($C=64$ channels), each trial is first segmented into $m$ non-overlapping slices and transformed into the frequency domain via FFT, yielding $\mathbf{X}_{\text{freq}}\in\mathbb{R}^{m\times C\times F}$, where $F$ is the number of frequency bins. Standard preprocessing steps, including resampling, ICA artifact removal, bandpass filtering (0.5–50 Hz), and z-score normalization, are applied prior to segmentation.

The Riemannian Manifold Feature Extraction (RMFE) module constructs a covariance matrix $\mathbf{R}_f\in\mathcal{S}_{++}^{C}$ independently at each frequency point $f$, preserving frequency-specific spatial correlations that are critical for stress discrimination. Each SPD matrix is then mapped to the tangent space via the Log-Euclidean operator and vectorized as $\mathbf{h}_f\in\mathbb{R}^{D_0}$, yielding spectral-spatial features $\mathbf{H}_{\text{freq}}\in\mathbb{R}^{m\times F\times D_0}$ with $D_0=C(C+1)/2$.

The Unsupervised Slice Attention Aggregation (USAA) module then processes these features in two stages. First, K-Means clustering groups correlated frequencies into $K$ clusters corresponding to canonical EEG bands; within-cluster weighted aggregation followed by across-cluster concatenation produces compact slice features $\mathbf{H}\in\mathbb{R}^{m\times D}$ with $D=K\times D_0$. Second, an intra-inter slice attention fusion mechanism adaptively weights the $m$ temporal slices: each slice feature is projected through a linear encoding layer to $\mathbf{H}_{\text{enc}}\in\mathbb{R}^{m\times d_{\text{enc}}}$, followed by channel-wise mean pooling and a fully connected softmax layer to compute slice-specific attention weights $\boldsymbol{\alpha}$. The weighted aggregation $\mathbf{H}_{\text{agg}}=\sum_{j}\alpha_j\mathbf{H}_{\text{enc}}(:,j,:)$ produces a fixed-dimensional trial-level representation, which is fed to a fully connected classifier for stress level prediction. All parameters are jointly optimized end-to-end via backpropagation.

![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_1.png)

Figure 1: Overview of I2RiMA. The pipeline comprises four stages: Preprocessing and FFT; USAA Module to perform frequency cluster aggregation via K-Means and intra-inter slice attention fusion; RMFE Module to perform frequency-wise covariance construction and Log-Euclidean tangent-space mapping; Classification for mental stress detection.

### 4.2 Frequency-Aware Riemannian Representation

#### Motivation.

Traditional Riemannian EEG methods construct a single covariance matrix from time-domain signals, mixing frequency information across bands. For stress detection, a task where frequency-domain features have been shown to carry critical discriminative information da Silva ([2013](https://arxiv.org/html/2607.01279#bib.bib4)), this obscures frequency-specific spatial patterns.

#### Frequency-wise covariance construction.

After FFT, the input tensor has dimension $\mathbf{X}_{\text{freq}}\in\mathbb{R}^{B\times m\times C\times F}$. For each frequency point $f\in\{1,\ldots,F\}$, we extract the spectral amplitude vector across all channels, as defined in Eq. [2](https://arxiv.org/html/2607.01279#S4.E2). We then independently construct a covariance matrix from $\mathbf{x}_{b,j,:,f}$, adding $\varepsilon=10^{-6}$ times the identity matrix $\mathbf{I}$ to ensure positive definiteness, as defined in Eq. [3](https://arxiv.org/html/2607.01279#S4.E3). This preserves frequency-specific spatial correlations that single time-domain covariance matrices would conflate.

$$\mathbf{x}_{b,j,:,f}=\mathbf{X}_{\text{freq}}(b,j,:,f)\in\mathbb{R}^{C} \tag{2}$$

$$\mathbf{R}_{b,j,f}=\mathbf{x}_{b,j,:,f}\,\mathbf{x}_{b,j,:,f}^{\top}+\varepsilon\mathbf{I} \tag{3}$$
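Eqs. (2)–(3) can be sketched in NumPy as below. This is an illustrative reading rather than the authors' implementation: the function name `frequency_covariances` and the toy shapes are assumptions, and the batch axis is omitted for brevity. Because each matrix is a rank-one outer product plus $\varepsilon\mathbf{I}$, all but one of its eigenvalues equal $\varepsilon$, which is why the regularizer is needed for positive definiteness.

```python
import numpy as np

def frequency_covariances(x_freq, eps=1e-6):
    """x_freq: (m, C, F) spectral amplitudes -> (m, F, C, C) regularized outer products."""
    x = np.moveaxis(x_freq, -1, 1)              # (m, F, C): one channel vector per frequency
    outer = x[..., :, None] * x[..., None, :]   # x x^T for every (slice, frequency) pair
    return outer + eps * np.eye(x.shape[-1])    # add eps * I as in Eq. (3)

rng = np.random.default_rng(0)
x_freq = np.abs(rng.standard_normal((3, 8, 5)))  # m = 3, C = 8, F = 5 (toy values)
R = frequency_covariances(x_freq)
print(R.shape)  # (3, 5, 8, 8)
```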

#### Log-Euclidean tangent-space mapping.

Each covariance matrix $\mathbf{R}_{b,j,f}\in\mathcal{S}_{++}^{C}$ resides on the SPD manifold, which lacks vector space structure. We employ the Log-Euclidean mapping to project onto the tangent space and extract the tangent-space feature vector, as defined in Eq. [4](https://arxiv.org/html/2607.01279#S4.E4):

$$\mathbf{h}_{b,j,f}=\operatorname{vech}\!\left(\log(\mathbf{R}_{b,j,f})\right)\in\mathbb{R}^{D_0} \tag{4}$$

where $\log(\mathbf{R}_{b,j,f})=\mathbf{U}\log(\boldsymbol{\Lambda})\mathbf{U}^{\top}$ is computed via the eigendecomposition $\mathbf{R}_{b,j,f}=\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top}$ with $\mathbf{U}\in\mathbb{R}^{C\times C}$ and diagonal $\boldsymbol{\Lambda}\in\mathbb{R}^{C\times C}$, $\operatorname{vech}(\cdot)$ extracts the upper triangular portion, and $D_0=C(C+1)/2=2080$ for $C=64$. Collecting all frequency features yields $\mathbf{H}_{\text{freq}}\in\mathbb{R}^{B\times m\times F\times D_0}$, which serves as input to the frequency cluster aggregation module (Section 4.3).
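A minimal NumPy sketch of Eq. (4) follows, assuming the matrix logarithm is taken through a symmetric eigendecomposition as described above. The function name `log_euclidean_vech` is hypothetical, and the upper-triangular entries are read out row by row without any reweighting, which is one plausible reading of $\operatorname{vech}(\cdot)$.

```python
import numpy as np

def log_euclidean_vech(R):
    """R: (..., C, C) SPD matrices -> (..., C*(C+1)//2) upper-triangular entries of log(R)."""
    eigvals, eigvecs = np.linalg.eigh(R)                  # R = U diag(eigvals) U^T
    scaled = eigvecs * np.log(eigvals)[..., None, :]      # U diag(log eigvals)
    log_R = scaled @ np.swapaxes(eigvecs, -1, -2)         # U log(Lambda) U^T
    rows, cols = np.triu_indices(R.shape[-1])             # upper triangle, row-major order
    return log_R[..., rows, cols]

h = log_euclidean_vech(np.diag([1.0, np.e, np.e ** 2]))   # log gives diag(0, 1, 2)
h64 = log_euclidean_vech(np.eye(64))
print(h64.shape)  # (2080,)
```

The function broadcasts over leading axes, so it can be applied directly to a stack of per-frequency matrices of shape $(B, m, F, C, C)$.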

#### Why this preserves spatial structure.

The covariance matrix encodes spatial information: diagonal elements reflect channel power, and off-diagonal elements capture inter-channel functional connectivity Buzsáki et al. ([2012](https://arxiv.org/html/2607.01279#bib.bib12)). The Log-Euclidean mapping preserves the geometric structure of the SPD manifold, whereas traditional vectorization in Euclidean space destroys this structure, leading to information loss that is amplified under cross-subject conditions.

#### Spatial-frequency duality.

A key insight from neuroscience is that low-frequency bands (delta, theta) reflect global brain-state modulation across widespread cortical regions, while high-frequency bands (beta, gamma) capture local, task-specific processing Buzsáki et al. ([2012](https://arxiv.org/html/2607.01279#bib.bib12)). Under stress, high-frequency components exhibit greater sensitivity to state transitions Zhang et al. ([2020](https://arxiv.org/html/2607.01279#bib.bib39)). This spatial-frequency duality (global structure in low frequencies, local structure in high frequencies) motivates our frequency-aware approach: by constructing covariance matrices independently per frequency point rather than collapsing across frequencies, we preserve both global and local spatial-frequency relationships that are critical for cross-subject stress detection.

#### Riemannian invariance for cross-subject robustness.

The Log-Euclidean mapping preserves the structure of the affine-invariant Riemannian metric on $\mathcal{S}_{++}^{C}$, which is invariant to affine transformations $\mathbf{R}_{b,j,f}\mapsto\mathbf{W}\mathbf{R}_{b,j,f}\mathbf{W}^{\top}$ for any invertible $\mathbf{W}$ Barachant et al. ([2011](https://arxiv.org/html/2607.01279#bib.bib32)). Consequently, our tangent-space features $\mathbf{h}_{b,j,f}$ (Eq. [4](https://arxiv.org/html/2607.01279#S4.E4)) are inherently robust to inter-subject variations such as electrode displacement and impedance changes, and the subsequent cluster aggregation and attention fusion operate in a geometry-aware feature space where cross-subject variability is naturally mitigated.

### 4.3 Frequency Cluster Aggregation

#### Motivation.

Constructing $F$ covariance matrices yields high-dimensional features ($F\times D_0$), increasing computational cost. Moreover, individual frequency points carry limited independent physiological significance, as established EEG bands span multiple adjacent frequencies da Silva ([2013](https://arxiv.org/html/2607.01279#bib.bib4)). We propose frequency cluster aggregation not as mere dimensionality reduction, but as a principled feature selection and redundancy elimination mechanism that groups correlated frequency points into physiologically interpretable clusters.

#### Cluster determination and aggregation.

We apply the elbow method to determine the optimal number of clusters $K$ by analyzing the inertia curve across cluster counts. For MIST Control, $K=5$; for MIST Stress, $K=6$. In this section, $k\in\{1,\ldots,K\}$ denotes the cluster index and $K$ the total number of clusters. We construct a frequency cluster assignment matrix $\mathbf{A}\in\mathbb{R}^{K\times F}$, where $A_{k,f}$ indicates the membership weight of frequency $f$ in cluster $k$. The aggregation proceeds in three steps: (1) for each cluster $k$, the cluster feature is computed as the weighted sum of its member frequency features; (2) the $K$ cluster features are concatenated along the feature dimension to form the slice representation $\mathbf{H}_{b,j}$; (3) all $m$ slice representations are collected into the batch-level tensor $\mathbf{H}$, where $j\in\{1,\ldots,m\}$ indexes the temporal slices. The resulting $\mathbf{H}$ serves as input to the intra-inter slice fusion module (Section 4.4). The complete aggregation process is formalized as follows:

$$\mathbf{h}_{b,j,k}=\sum_{f=1}^{F}A_{k,f}\cdot\mathbf{h}_{b,j,f} \tag{5}$$

$$\mathbf{H}_{b,j}=[\mathbf{h}_{b,j,1},\ldots,\mathbf{h}_{b,j,K}]\in\mathbb{R}^{D},\quad D=K\times D_0 \tag{6}$$

$$\mathbf{H}=[\mathbf{H}_{b,1};\ldots;\mathbf{H}_{b,m}]\in\mathbb{R}^{B\times m\times D} \tag{7}$$
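Eqs. (5)–(7) reduce to a weighted sum over the frequency axis followed by a reshape, as the NumPy sketch below shows. The text does not specify how the membership weights $A_{k,f}$ are set, so the helper `hard_assignment` (uniform weights within each cluster, built from integer cluster labels such as those K-Means would return) is purely a hypothetical choice, as are the function names and toy shapes.

```python
import numpy as np

def aggregate_frequency_clusters(h_freq, A):
    """h_freq: (B, m, F, D0), A: (K, F) -> H: (B, m, K * D0)."""
    h_clusters = np.einsum("kf,bmfd->bmkd", A, h_freq)  # Eq. (5): weighted sum over f
    B, m, K, D0 = h_clusters.shape
    return h_clusters.reshape(B, m, K * D0)             # Eqs. (6)-(7): concatenate clusters

def hard_assignment(labels, K):
    """Hypothetical helper: uniform within-cluster weights; assumes no cluster is empty."""
    A = np.zeros((K, len(labels)))
    A[labels, np.arange(len(labels))] = 1.0
    return A / A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
h_freq = rng.standard_normal((2, 4, 6, 10))   # B = 2, m = 4, F = 6, D0 = 10 (toy values)
labels = np.array([0, 0, 1, 1, 1, 2])         # cluster index of each frequency point
H = aggregate_frequency_clusters(h_freq, hard_assignment(labels, K=3))
print(H.shape)  # (2, 4, 30)
```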

#### Physiological interpretability.

The data-driven clustering aligns with canonical EEG bands Britton et al. ([2016](https://arxiv.org/html/2607.01279#bib.bib35)): Cluster 1 (1–3 Hz) corresponds to delta, Cluster 2 (4–12 Hz) spans theta and alpha, Cluster 3 (13–25 Hz) maps to beta, Cluster 4 (26–49 Hz) corresponds to gamma, and Cluster 5 (50 Hz) captures line noise. This alignment validates that EEG frequency structure exhibits intrinsic cluster patterns, and our aggregation leverages this structure for stable, efficient representation. The geometric interpretation showing that frequency cluster aggregation performs principled data-driven grouping in Riemannian geometry is provided in Appendix [F](https://arxiv.org/html/2607.01279#A6).

### 4.4 Intra-Inter Slice Fusion

#### Motivation.

After Riemannian feature extraction, each trial is represented as $m$ slice features $\mathbf{H}\in\mathbb{R}^{B\times m\times D}$ (Eq. [7](https://arxiv.org/html/2607.01279#S4.E7)). Existing methods aggregate slices via simple averaging or max pooling, assuming equal contribution from all slices. However, in stress detection, slices differ significantly in discriminative information: some capture critical state transitions while others contain mostly noise. Equal-weight aggregation dilutes informative signals and accumulates noise.

#### Slice number selection.

The slice number $m$ is a key hyperparameter that controls the granularity of temporal modeling. Increasing $m$ provides finer temporal resolution but incurs additional computation. Inspired by microeconomics, where the optimal consumption level is reached when marginal utility approaches zero, we define the marginal effect $\text{ME}(m)$ as the accuracy gain per unit of additional computation, where $\text{Acc}(m)$ denotes the accuracy with $m$ slices and $\text{FLOPs}(m)$ the corresponding computational cost, as defined in Eq. [8](https://arxiv.org/html/2607.01279#S4.E8):

$$\text{ME}(m)=\frac{\text{Acc}(m)-\text{Acc}(m-1)}{\text{FLOPs}(m)-\text{FLOPs}(m-1)} \tag{8}$$

When $\text{ME}(m)>0$, adding the $m$-th slice still improves accuracy; when $\text{ME}(m)\leq 0$, further slices yield no benefit. The optimal slice number $m^{*}$ is the largest $m$ with positive marginal effect, subject to a minimum accuracy threshold $\text{Acc}_{\min}$ and a computational budget $\text{FLOPs}_{\text{limit}}$, as defined in Eq. [9](https://arxiv.org/html/2607.01279#S4.E9):

$$m^{*}=\max\{m:\text{ME}(m)>0\}\quad\text{s.t.}\quad\text{Acc}(m)\geq\text{Acc}_{\min},\;\text{FLOPs}(m)\leq\text{FLOPs}_{\text{limit}} \tag{9}$$
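The selection rule in Eqs. (8)–(9) can be sketched in plain Python as below. The function name `select_slice_number` is hypothetical, FLOPs are assumed to increase strictly with $m$, and the accuracy and FLOPs values are made-up toy numbers for illustration only, not results from the paper.

```python
def select_slice_number(acc, flops, acc_min, flops_limit):
    """Return the largest m with ME(m) > 0 that meets both constraints, or None.

    acc, flops: dicts mapping slice count m to accuracy and FLOPs (toy inputs).
    """
    best = None
    for m in sorted(acc):
        if m - 1 not in acc:
            continue  # ME(m) needs the (m - 1)-slice configuration as a reference
        me = (acc[m] - acc[m - 1]) / (flops[m] - flops[m - 1])  # Eq. (8)
        if me > 0 and acc[m] >= acc_min and flops[m] <= flops_limit:
            best = m  # keep the largest feasible m, as in Eq. (9)
    return best

# Illustrative values only (not taken from the paper).
acc = {1: 0.70, 2: 0.74, 3: 0.76, 4: 0.755, 5: 0.758}
flops = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0}
print(select_slice_number(acc, flops, acc_min=0.72, flops_limit=45.0))  # 3
```

In this toy run $m=4$ is rejected because accuracy drops, and $m=5$ is rejected only by the FLOPs budget; raising the budget would select $m=5$, since Eq. (9) takes the largest $m$ with positive marginal effect rather than the first plateau.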

#### Intra-slice modeling.

When $m^{*}=1$, the network processes a single slice, and the Riemannian feature extraction module alone provides intra-slice local spectral-spatial modeling. This corresponds to the R-I2RiMA variant in our ablation study.

#### Inter-slice attention fusion.

For $m^{*}>1$, we propose an attention mechanism Vaswani et al. ([2017](https://arxiv.org/html/2607.01279#bib.bib30)) that adaptively weights $m^{*}$ temporal slices to highlight discriminative information. Let $d_{\text{enc}}=128$ denote the encoding dimension. The attention fusion proceeds in four steps: (1) linear encoding with weight $\mathbf{W}\in\mathbb{R}^{D\times d_{\text{enc}}}$ and bias $\mathbf{b}\in\mathbb{R}^{d_{\text{enc}}}$ projects slice features $\mathbf{H}\in\mathbb{R}^{B\times m^{*}\times D}$ to a lower-dimensional space; (2) channel-wise mean pooling compresses each encoded slice to a scalar; (3) a fully connected layer with softmax computes attention weights $\boldsymbol{\alpha}=[\alpha_1,\ldots,\alpha_{m^{*}}]^{\top}$ with $\sum_{j=1}^{m^{*}}\alpha_j=1$; (4) weighted aggregation produces the trial-level representation. Shared weights across the batch enhance generalization. The complete process is formalized as follows:

$$\mathbf{H}_{\text{enc}}=\text{ReLU}(\mathbf{H}\mathbf{W}+\mathbf{b})\in\mathbb{R}^{B\times m^{*}\times d_{\text{enc}}} \tag{10}$$

$$\mathbf{h}_{\text{pool}}(:,j)=\frac{1}{d_{\text{enc}}}\sum_{d=1}^{d_{\text{enc}}}\mathbf{H}_{\text{enc}}(:,j,d) \tag{11}$$

$$\boldsymbol{\alpha}=\text{Softmax}(\text{FC}(\mathbf{h}_{\text{pool}}))\in\mathbb{R}^{B\times m^{*}} \tag{12}$$

$$\mathbf{H}_{\text{agg}}=\sum_{j=1}^{m^{*}}\alpha_j\cdot\mathbf{H}_{\text{enc}}(:,j,:)\in\mathbb{R}^{B\times d_{\text{enc}}} \tag{13}$$
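The four steps of Eqs. (10)–(13) can be traced with the forward-pass sketch below. It is a NumPy illustration under stated assumptions, not the authors' implementation: the FC layer is taken to map the $m^{*}$ pooled scalars to $m^{*}$ logits (the text does not give its shape), parameters are random or zero rather than learned, and the toy dimensions are smaller than those in the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def slice_attention_fusion(H, W, b, W_fc, b_fc):
    """H: (B, m, D), W: (D, d_enc), W_fc: (m, m) -> H_agg (B, d_enc), alpha (B, m)."""
    H_enc = np.maximum(H @ W + b, 0.0)              # Eq. (10): linear encoding + ReLU
    h_pool = H_enc.mean(axis=-1)                    # Eq. (11): one scalar per slice
    alpha = softmax(h_pool @ W_fc + b_fc)           # Eq. (12): weights over the m slices
    H_agg = np.einsum("bm,bmd->bd", alpha, H_enc)   # Eq. (13): weighted sum of slices
    return H_agg, alpha

rng = np.random.default_rng(0)
B, m, D, d_enc = 2, 4, 30, 16                       # toy sizes (the paper uses d_enc = 128)
H = rng.standard_normal((B, m, D))
W, b = 0.1 * rng.standard_normal((D, d_enc)), np.zeros(d_enc)
W_fc, b_fc = rng.standard_normal((m, m)), np.zeros(m)
H_agg, alpha = slice_attention_fusion(H, W, b, W_fc, b_fc)
print(H_agg.shape, alpha.shape)  # (2, 16) (2, 4)
```

With zero FC weights and bias, the attention weights become uniform and the fusion reduces to plain slice averaging, which is the equal-weight baseline the section argues against.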

### 4.5 Classification

The classifier maps the aggregated features $\mathbf{H}_{\text{agg}}\in\mathbb{R}^{B\times d_{\text{enc}}}$ to stress level probabilities via a fully connected layer with weight $\mathbf{W}_c\in\mathbb{R}^{d_{\text{enc}}\times G}$ and bias $\mathbf{b}_c\in\mathbb{R}^{G}$, where $G$ is the number of stress classes. The model is trained by minimizing the cross-entropy loss over all $U$ trials, where $\mathbf{P}_u\in\mathbb{R}^{G}$ denotes the predicted probability vector for the $u$-th trial and $y_u\in\mathbb{R}^{G}$ is the one-hot label. Optimization uses Adam (learning rate $10^{-3}$, weight decay $10^{-2}$) for 300 epochs with batch normalization and dropout. The classification and loss are formalized as follows:

$$
\begin{aligned}
\mathbf{P} &= \text{Softmax}(\mathbf{W}_{c}\mathbf{H}_{\text{agg}} + \mathbf{b}_{c}) \in \mathbb{R}^{B \times G} && (14) \\
\mathcal{L} &= -\sum_{u=1}^{U} y_{u} \log(\mathbf{P}_{u}) && (15)
\end{aligned}
$$
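As a minimal sketch of Eqs. (14) and (15), the snippet below computes class probabilities for one trial and the summed cross-entropy over a few trials. The function names and toy weights are assumptions for illustration. The product is evaluated as $\mathbf{H}_{\text{agg}}\mathbf{W}_{c}$ so that the shapes agree with the stated dimensions ($d_{\text{enc}}$ inputs, $G$ outputs), and integer class indices stand in for the one-hot labels $y_{u}$.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h_agg, W_c, b_c):
    """Eq. 14 for one trial: probabilities over G classes.

    h_agg: length d_enc, W_c: d_enc x G, b_c: length G.
    """
    G = len(b_c)
    logits = [sum(h_agg[d] * W_c[d][g] for d in range(len(h_agg))) + b_c[g]
              for g in range(G)]
    return softmax(logits)

def cross_entropy(probs, labels):
    """Eq. 15: summed cross-entropy; each label is the index of the
    non-zero entry of the one-hot vector y_u."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

# Toy usage with d_enc = 2 and G = 4: zero weights give uniform probabilities.
W_c = [[0.0] * 4 for _ in range(2)]
b_c = [0.0] * 4
P = [classify([0.3, -1.2], W_c, b_c), classify([2.0, 0.5], W_c, b_c)]
loss = cross_entropy(P, [0, 3])
```

With uniform predictions over $G = 4$ classes, each trial contributes $\log 4$ to the loss, which is a convenient sanity check for an untrained classifier.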

## 5 Experimental Setup

Datasets. We evaluate on three datasets: MIST Control (30 subjects, 64 channels, 4-class stress levels), MIST Stress (same subjects under time pressure and negative feedback, 4-class), and SEED (15 subjects, 62 channels, 3-class emotion). We adopt stratified 5-fold cross-validation at the subject level with zero subject overlap. Full dataset details and evaluation metrics are provided in Appendices [B](https://arxiv.org/html/2607.01279#A2) and [C](https://arxiv.org/html/2607.01279#A3).

Baselines and Metrics. We compare against five baselines: EEGNet Lawhern et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib24)), BIOT Yang et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib6)), LaBraM Jiang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib7)), NeuroBOLT Li et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib11)), and CorrAtt Hu et al. ([2025](https://arxiv.org/html/2607.01279#bib.bib25)). All models are trained from scratch under identical data splits. We adopt B.ACC, Precision, Recall, F1, and AUC as evaluation metrics. Baseline descriptions, metric definitions, and implementation details are provided in Appendices [D](https://arxiv.org/html/2607.01279#A4)–[E](https://arxiv.org/html/2607.01279#A5).

### 5.1 Comparison with Existing Networks

To demonstrate the effectiveness of I2RiMA, we compare it against five state-of-the-art (SOTA) baseline networks covering diverse architectures: EEGNet Lawhern et al. ([2018](https://arxiv.org/html/2607.01279#bib.bib24)) (compact CNN), BIOT Yang et al. ([2023](https://arxiv.org/html/2607.01279#bib.bib6)) (biosignal transformer), LaBraM Jiang et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib7)) (large brain model), NeuroBOLT Li et al. ([2024](https://arxiv.org/html/2607.01279#bib.bib11)) (EEG-to-fMRI synthesis model), and CorrAtt Hu et al. ([2025](https://arxiv.org/html/2607.01279#bib.bib25)) (correlation attention). All models are trained from scratch using identical data splits and training configurations.

Table [2](https://arxiv.org/html/2607.01279#S5.T2) summarizes the classification performance across three datasets. I2RiMA consistently achieves the best results across all metrics. On MIST Control, I2RiMA attains 77.59% B.ACC with 95.61% AUC, outperforming the strongest baseline, EEGNet (53.26% B.ACC), by 24.33 percentage points while delivering 83.61% precision and 75.79% F1 score. On MIST Stress, which involves more subtle cognitive distinctions under induced pressure, I2RiMA reaches 75.88% B.ACC and 94.57% AUC, surpassing CorrAtt (46.67% B.ACC) by 29.21 percentage points, the largest margin among all datasets. On SEED, I2RiMA achieves 82.78% B.ACC and 94.45% AUC, exceeding EEGNet by 24.77 percentage points while maintaining 83.74% precision and 82.76% F1. The consistent improvements are particularly pronounced on the MIST datasets, which employ cognitively complex paradigms with stronger low-frequency components Grissmann et al. ([2017](https://arxiv.org/html/2607.01279#bib.bib10)). In such scenarios, signal slicing aggravates information loss, making the preservation of spatial structure and temporal coherence critical; this is what I2RiMA provides through frequency-aware Riemannian modeling and intra-inter slice fusion.

Table 1: Comparison of state-of-the-art networks on three datasets. Best results are highlighted in bold. $^{**}p < 0.01$ indicates significant accuracy differences between I2RiMA and SOTA.

Table 2: Ablation study results on three datasets. Best results are highlighted in bold. R: Riemannian module, I: Inter-slice fusion. $^{**}p < 0.01$ indicates significant accuracy differences between I2RiMA and other networks.

### 5.2 Ablation Study

To assess each component of I2RiMA, we design four ablation variants: a baseline without Riemannian modeling or inter-slice fusion, R-I2RiMA using only the Riemannian module ($m = 1$), I-I2RiMA using only inter-slice fusion, and the full I2RiMA. Results are reported in Table [2](https://arxiv.org/html/2607.01279#S5.T2). The Riemannian module consistently improves over the baseline, increasing B.ACC by 8.23, 9.95, and 5.85 percentage points on MIST Control, MIST Stress, and SEED, respectively. This shows that preserving the intrinsic covariance geometry of EEG channels enhances cross-subject generalization. Inter-slice fusion yields even larger gains, especially on MIST Control with a 37.09-point B.ACC improvement, highlighting the importance of modeling temporal coherence under fragmented slice-wise EEG information. The two modules are also complementary: the full I2RiMA outperforms the stronger single-module variant, I-I2RiMA, by 4.26, 33.59, and 23.94 points of B.ACC on the three datasets. This suggests that Riemannian features provide geometrically meaningful spatial representations that enable attention-based fusion to better exploit cross-slice dependencies. Overall, the full model achieves 77.59%/75.88%/82.78% B.ACC and 95.61%/94.57%/94.45% AUC, confirming the benefit of joint spatial-temporal modeling.

### 5.3 Efficiency and Interpretability for Cross-Subject EEG Stress Detection

Table [3](https://arxiv.org/html/2607.01279#S5.T3) compares computational efficiency. I2RiMA requires only 1.60M parameters and 31.95M FLOPs ($m = 20$), which is 11.9% of EEGNet's FLOPs, 0.4% of BIOT's, and 0.3% of LaBraM's and NeuroBOLT's. Even with $m = 1$, I2RiMA requires only 1.60M FLOPs while maintaining competitive accuracy. This demonstrates that geometry-preserving, attention-based feature integration is far more computationally efficient than scaling up model capacity.

Table 3: Efficiency comparison. Params and FLOPs are reported in millions (M).

![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_6.png)
Figure 2: Model Performance Comparison
To further visualize the performance–efficiency trade-off, Figure [2](https://arxiv.org/html/2607.01279#S5.F2) presents a bubble chart where the x-axis (log scale) denotes FLOPs, the y-axis represents test accuracy on MIST Control, and bubble size encodes parameter count. I2RiMA (magenta bubbles) occupies the upper-left region, the Pareto-optimal zone of low computational cost and high accuracy. Specifically, I2RiMA with $m = 20$ achieves 77.59% accuracy at merely 31.95M FLOPs and 1.60M parameters, which amounts to only 11.9% of EEGNet's FLOPs (268.10M) and less than 0.4% of large-scale baselines such as BIOT (8140.23M), LaBraM (11095.78M), and NeuroBOLT (10504.58M). Even as $m$ increases from 1 to 30, I2RiMA variants remain tightly clustered in the low-FLOPs regime (below 50M) while monotonically improving accuracy, a scalability property that baseline models, which reside in the high-FLOPs / lower-accuracy quadrant, cannot match. This geometry-preserving design thus delivers order-of-magnitude efficiency gains over brute-force capacity scaling, making I2RiMA particularly attractive for resource-constrained deployment scenarios such as wearable brain-computer interfaces.
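The FLOPs ratios quoted above follow directly from the reported figures. The short check below recomputes them; the variable and function names are illustrative only, and the numbers are the ones stated in the text.

```python
# FLOPs in millions, as quoted in the text.
I2RIMA_FLOPS = 31.95
BASELINE_FLOPS = {
    "EEGNet": 268.10,
    "BIOT": 8140.23,
    "LaBraM": 11095.78,
    "NeuroBOLT": 10504.58,
}

def flops_share(model_flops, baseline_flops):
    """Return model FLOPs as a percentage of each baseline's FLOPs."""
    return {name: 100.0 * model_flops / f for name, f in baseline_flops.items()}

shares = flops_share(I2RIMA_FLOPS, BASELINE_FLOPS)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of baseline FLOPs")
```

The results are roughly 11.9% for EEGNet, 0.39% for BIOT, 0.29% for LaBraM, and 0.30% for NeuroBOLT, matching the rounded values in the text.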

### 5.4 Channel Importance Topography Maps

To interpret which EEG channels I2RiMA relies on for classification, we visualize the learned channel attention weights as scalp topographic maps in Figure [3](https://arxiv.org/html/2607.01279#S5.F3). The revealed attention patterns align closely with the cognitive demands of each paradigm.

On MIST Control (Figure [3](https://arxiv.org/html/2607.01279#S5.F3)a), I2RiMA assigns the highest importance to bilateral frontotemporal channels (FT11, FT12; green boxes) and occipitocerebellar channels (CB1, CB2; purple boxes). The frontotemporal emphasis reflects their established role in cognitive control and executive function during sustained mental arithmetic Davidson et al. ([2000](https://arxiv.org/html/2607.01279#bib.bib36)), while the occipital/cerebellar activation is consistent with visual attention maintenance and sensorimotor coordination required by the MIST task.

The MIST Stress condition (Figure [3](https://arxiv.org/html/2607.01279#S5.F3)b) exhibits a distinct reconfiguration. Frontotemporal channels (FT11, FT12; green boxes) remain highly weighted, now subserving stress regulation in addition to cognitive control, while the occipital focus shifts centrally to Oz (purple box) and overall occipital sensitivity intensifies. This pattern is consistent with enhanced visual processing of evaluative feedback (e.g., error messages and time-pressure warnings) that characterizes the stress induction mechanism Ulrich-Lai and Herman ([2009](https://arxiv.org/html/2607.01279#bib.bib37)), suggesting that I2RiMA adaptively recruits visual cortical resources when emotional salience increases.

In contrast, SEED (Figure [3](https://arxiv.org/html/2607.01279#S5.F3)c) engages a qualitatively different network: temporal channels (T7, T8; green boxes) and prefrontal channels (Fp1, Fp2; purple boxes) dominate the attention map. This topology aligns with canonical emotion processing circuits Davidson et al. ([2000](https://arxiv.org/html/2607.01279#bib.bib36)), where the temporal lobes support emotional memory and semantic appraisal and the prefrontal cortex regulates affective responses. This pattern is markedly different from the cognitively driven MIST topographies.

![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_7-1.png) (a) MIST Control dataset
![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_7-2.png) (b) MIST Stress dataset
![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_7-3.png) (c) SEED dataset

Figure 3: Channel importance topographic maps of I2RiMA on the MIST Control, MIST Stress, and SEED datasets

## 6 Discussion and Limitation

#### Why does Riemannian representation benefit cross\-subject generalization?

The affine-invariant Riemannian metric on the SPD manifold possesses inherent invariance to affine transformations Barachant et al. ([2011](https://arxiv.org/html/2607.01279#bib.bib32)), providing robustness to inter-subject variations such as electrode displacement. Our ablation confirms that even the Riemannian-only variant (R-I2RiMA) improves over baseline by 8–10 percentage points, demonstrating that spatial structure preservation is critical for cross-subject transfer. The frequency-wise construction preserves spatial-frequency duality: under stress, high-frequency components (beta, gamma) exhibit greater sensitivity to state transitions Zhang et al. ([2020](https://arxiv.org/html/2607.01279#bib.bib39)), enabling capture of stress-specific local cortical patterns.

#### Why does intra\-inter slice fusion outperform single\-slice modeling?

Stress formation follows a dynamic allostatic process Juster et al. ([2010](https://arxiv.org/html/2607.01279#bib.bib38)) where different temporal windows contribute unequally. Intra-slice modeling captures local spectral patterns, while inter-slice modeling captures global temporal coherence He et al. ([2026](https://arxiv.org/html/2607.01279#bib.bib27)). The ablation shows that I-I2RiMA yields a 37.09-point B.ACC improvement on MIST Control, and the full model further improves by 4.26 points, confirming complementarity. Moderate increases in slice count stabilize low-frequency feature estimation and capture cross-cycle temporal coherence Wu et al. ([2025](https://arxiv.org/html/2607.01279#bib.bib28)).

#### Limitations\.

\(1\) Our evaluation covers stress detection and emotion classification; generalization to other EEG decoding tasks \(motor imagery, seizure detection\) requires further validation\. \(2\) Dataset sizes remain limited \(30 and 15 subjects\); larger\-scale studies are needed\.

## 7 Conclusion

We propose I2RiMA, an Intra\-Inter Riemannian Manifold Attention Network for cross\-subject EEG stress detection\. EEG signal segmentation into temporal slices inevitably causes information loss and insufficient feature extraction\. I2RiMA addresses this challenge via two complementary mechanisms: spatial structure preservation via frequency\-aware Riemannian modeling with cluster\-based feature selection, and temporal coherence recovery via intra\-inter slice attention fusion\. On three datasets, I2RiMA achieves SOTA performance \(B\.ACC: 77\.59%/75\.88%/82\.78%\) with only 1\.60M parameters and 31\.95M FLOPs, demonstrating that geometry\-preserving, attention\-based feature integration is both effective and efficient\. I2RiMA provides a detection foundation for closed\-loop stress regulation systems, with future work targeting online deployment and multi\-modal fusion\.

## ACKNOWLEDGMENTS

This work was supported in part by the Beijing Natural Science Foundation under Grant 4242033 and partially funded by the Deutsche Forschungsgemeinschaft \(DFG, German Research Foundation\) – SFB 1574 – 471687386\.

## References

- H\. Akbari, M\. T\. Sadiq, A\. U\. Rehman, M\. Ghazvini, R\. A\. Naqvi, M\. Payan, H\. Bagheri, and H\. Bagheri \(2021\)Depression recognition based on the reconstruction of phase space of eeg signals and geometrical features\.179,pp\. 108078\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- A wearable eeg instrument for real\-time frontal asymmetry monitoring in worker stress analysis\.IEEE Transactions on Instrumentation and Measurement69\(10\),pp\. 8335–8343\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p1.1),[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- A\. Barachant, S\. Bonnet, M\. Congedo, and C\. Jutten \(2010\)Riemannian geometry applied to bci classification\.InInternational Conference on Latent Variable Analysis and Signal Separation,pp\. 629–636\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p2.1),[§2](https://arxiv.org/html/2607.01279#S2.p3.1)\.
- A\. Barachant, S\. Bonnet, M\. Congedo, and C\. Jutten \(2011\)Multiclass brain–computer interface classification by riemannian geometry\.59\(4\),pp\. 920–928\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p3.1),[§4\.2](https://arxiv.org/html/2607.01279#S4.SS2.SSS0.Px6.p1.4),[§6](https://arxiv.org/html/2607.01279#S6.SS0.SSS0.Px1.p1.1)\.
- J\. W\. Britton, L\. C\. Frey, J\. L\. Hopp, P\. Korb, M\. Z\. Koubeissi, W\. E\. Lievens, E\. M\. Pestana\-Knight, and E\. K\. St Louis \(2016\)Electroencephalography \(eeg\): an introductory text and atlas of normal and abnormal findings in adults, children, and infants\.Cited by:[§4\.3](https://arxiv.org/html/2607.01279#S4.SS3.SSS0.Px3.p1.9)\.
- G\. Buzsáki, C\. A\. Anastassiou, and C\. Koch \(2012\)The origin of extracellular fields and currents—eeg, ecog, lfp and spikes\.13\(6\),pp\. 407–420\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p4.1),[§4\.2](https://arxiv.org/html/2607.01279#S4.SS2.SSS0.Px4.p1.1),[§4\.2](https://arxiv.org/html/2607.01279#S4.SS2.SSS0.Px5.p1.1)\.
- H\. Cai, Y\. Chen, J\. Han, X\. Zhang, and B\. Hu \(2018\)Study on feature selection methods for depression detection using three\-electrode eeg data\.10\(3\),pp\. 558–565\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- M\. Congedo, A\. Barachant, and R\. Bhatia \(2017\)Riemannian geometry for eeg\-based brain\-computer interfaces; a primer and a review\.4\(3\),pp\. 155–174\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p3.1)\.
- F\. L\. da Silva \(2013\)EEG and meg: relevance to neuroscience\.Neuron80\(5\),pp\. 1112–1128\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p1.1),[§4\.2](https://arxiv.org/html/2607.01279#S4.SS2.SSS0.Px1.p1.1),[§4\.3](https://arxiv.org/html/2607.01279#S4.SS3.SSS0.Px1.p1.2)\.
- R\. J\. Davidson, D\. C\. Jackson, and N\. H\. Kalin \(2000\)Emotion, plasticity, context, and regulation: perspectives from affective neuroscience\.\.126\(6\),pp\. 890\.Cited by:[§5\.4](https://arxiv.org/html/2607.01279#S5.SS4.p2.1),[§5\.4](https://arxiv.org/html/2607.01279#S5.SS4.p3.1)\.
- X\. Fu, K\. Zhang, X\. Chen, and Z\. Chen \(2023\)Report on national mental health development in china \(2021–2022\)\.Social Sciences Academic Press,pp\. 70–99\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p1.1)\.
- G\. Giannakakis, D\. Grigoriadis, K\. Giannakaki, O\. Simantiraki, A\. Roniotis, and M\. Tsiknakis \(2019\)Review on psychological stress detection using biosignals\.13\(1\),pp\. 440–460\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1),[§2](https://arxiv.org/html/2607.01279#S2.p4.1)\.
- S\. Grissmann, M\. Spüler, J\. Faller, T\. Krumpe, T\. O\. Zander, A\. Kelava, C\. Scharinger, and P\. Gerjets \(2017\)Context sensitivity of eeg\-based workload classification under different affective valence\.11\(2\),pp\. 327–334\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p3.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p2.1)\.
- C\. He, Z\. Guo, Y\. Xu, X\. Zhao, W\. Yang, X\. Wang, J\. Ding, M\. Ma, J\. Chen, S\. Zhou, and L\. Xia \(2026\)DICHA: dynamic and interpretable convolution via hierarchical attention for robust eeg classification\.120,pp\. 110091\.External Links:ISSN 1746\-8094,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.bspc.2026.110091),[Link](https://www.sciencedirect.com/science/article/pii/S1746809426006452)Cited by:[§6](https://arxiv.org/html/2607.01279#S6.SS0.SSS0.Px2.p1.1)\.
- C\. Hu, R\. Wang, X\. Song, T\. Zhou, X\. Wu, N\. Sebe, and Z\. Chen \(2025\)A correlation manifold self\-attention network for eeg decoding\.InProceedings of the Thirty\-Fourth International Joint Conference on Artificial Intelligence,pp\. 5372–5380\.Cited by:[Appendix D](https://arxiv.org/html/2607.01279#A4.p1.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p1.1),[§5](https://arxiv.org/html/2607.01279#S5.p1.1)\.
- V\. Jayaram, M\. Alamgir, Y\. Altun, B\. Scholkopf, and M\. Grosse\-Wentrup \(2016\)Transfer learning in brain\-computer interfaces\.11\(1\),pp\. 20–31\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p2.1)\.
- W\. Jiang, L\. Zhao, and B\. Lu \(2024\)Large brain model for learning generic representations with tremendous eeg data in bci\.Cited by:[Appendix D](https://arxiv.org/html/2607.01279#A4.p1.1),[§1](https://arxiv.org/html/2607.01279#S1.p3.1),[§2](https://arxiv.org/html/2607.01279#S2.p2.1),[§2](https://arxiv.org/html/2607.01279#S2.p4.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p1.1),[§5](https://arxiv.org/html/2607.01279#S5.p1.1)\.
- R\. Juster, B\. S\. McEwen, and S\. J\. Lupien \(2010\)Allostatic load biomarkers of chronic stress and impact on health and cognition\.35\(1\),pp\. 2–16\.Cited by:[§6](https://arxiv.org/html/2607.01279#S6.SS0.SSS0.Px2.p1.1)\.
- V\. J\. Lawhern, A\. J\. Solon, N\. R\. Waytowich, S\. M\. Gordon, C\. P\. Hung, and B\. J\. Lance \(2018\)EEGNet: a compact convolutional neural network for eeg\-based brain–computer interfaces\.15\(5\),pp\. 056013\.Cited by:[Appendix D](https://arxiv.org/html/2607.01279#A4.p1.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p1.1),[§5](https://arxiv.org/html/2607.01279#S5.p1.1)\.
- Y\. Li, A\. Lou, Z\. Xu, S\. Zhang, S\. Wang, D\. J\. Englot, S\. Kolouri, D\. Moyer, R\. G\. Bayrak, and C\. Chang \(2024\)Neurobolt: resting\-state eeg\-to\-fmri synthesis with multi\-dimensional feature mapping\.37,pp\. 23378–23405\.Cited by:[Appendix D](https://arxiv.org/html/2607.01279#A4.p1.1),[§1](https://arxiv.org/html/2607.01279#S1.p3.1),[§2](https://arxiv.org/html/2607.01279#S2.p2.1),[§2](https://arxiv.org/html/2607.01279#S2.p4.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p1.1),[§5](https://arxiv.org/html/2607.01279#S5.p1.1)\.
- F\. Lotte, L\. Bougrain, A\. Cichocki, M\. Clerc, M\. Congedo, A\. Rakotomamonjy, and F\. Yger \(2018\)A review of classification algorithms for eeg\-based brain–computer interfaces: a 10 year update\.15\(3\),pp\. 031005\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p2.1)\.
- R\. Lv, W\. Chang, G\. Yan, M\. T\. Sadiq, W\. Nie, and L\. Zheng \(2025\)Enhanced classification of motor imagery eeg signals using spatio\-temporal representations\.714,pp\. 122221\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- S\. M\. Meier, L\. Petersen, M\. Mattheisen, O\. Mors, P\. B\. Mortensen, and T\. M\. Laursen \(2015\)Secondary depression in severe anxiety disorders: a population\-based cohort study in denmark\.The Lancet Psychiatry2\(6\),pp\. 515–523\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p1.1)\.
- A\. Mognon, J\. Jovicich, L\. Bruzzone, and M\. Buiatti \(2011\)ADJUST: an automatic eeg artifact detector based on the joint use of spatial and temporal features\.48\(2\),pp\. 229–240\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p2.1)\.
- Y\. Pan, J\. Chou, and C\. Wei \(2022\)MAtt: a manifold attention network for eeg decoding\.35,pp\. 31116–31129\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p4.1)\.
- Z\. Pei, H\. Wang, A\. Bezerianos, and J\. Li \(2020\)EEG\-based multiclass workload identification using feature fusion and selection\.70,pp\. 1–8\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- Z\. Steel, C\. Marnane, C\. Iranpour, T\. Chey, J\. W\. Jackson, V\. Patel, and D\. Silove \(2014\)The global prevalence of common mental disorders: a systematic review and meta\-analysis 1980–2013\.International Journal of Epidemiology43\(2\),pp\. 476–493\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p1.1)\.
- Y\. M\. Ulrich\-Lai and J\. P\. Herman \(2009\)Neural regulation of endocrine and autonomic stress responses\.10\(6\),pp\. 397–409\.Cited by:[§5\.4](https://arxiv.org/html/2607.01279#S5.SS4.p3.1)\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\)Attention is all you need\.30\.Cited by:[§4\.4](https://arxiv.org/html/2607.01279#S4.SS4.SSS0.Px4.p1.8)\.
- R\. Wang, C\. Hu, Z\. Chen, X\. Wu, and X\. Song \(2024\)A grassmannian manifold self\-attention network for signal classification\.InProceedings of the Thirty\-Third International Joint Conference on Artificial Intelligence,pp\. 5099–5107\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p2.1)\.
- H\. Wu, J\. Xu, X\. Lu, Q\. Liu, J\. Chen, and J\. Qi \(2025\)Analysing heart\-brain coupling: a correlation study of bcg, eeg, and ecg signals\.In2025 IEEE International Conference on Bioinformatics and Biomedicine \(BIBM\),pp\. 6133–6140\.Cited by:[§6](https://arxiv.org/html/2607.01279#S6.SS0.SSS0.Px2.p1.1)\.
- L\. Xia, Y\. Feng, Z\. Guo, J\. Ding, Y\. Li, Y\. Li, M\. Ma, G\. Gan, Y\. Xu, J\. Luo,et al\.\(2022\)MuLHiTA: a novel multiclass classification framework with multibranch lstm and hierarchical temporal attention for early detection of mental stress\.34\(12\),pp\. 9657–9670\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1),[§2](https://arxiv.org/html/2607.01279#S2.p4.1)\.
- L\. Xia, A\. S\. Malik, and A\. R\. Subhani \(2018\)A physiological signal\-based method for early mental\-stress detection\.46,pp\. 18–32\.External Links:ISSN 1746\-8094,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.bspc.2018.06.004),[Link](https://www.sciencedirect.com/science/article/pii/S1746809418301563)Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.
- C\. Yang, M\. Westover, and J\. Sun \(2023\)BIOT: biosignal transformer for cross\-data learning in the wild\.36,pp\. 78240–78260\.Cited by:[Appendix D](https://arxiv.org/html/2607.01279#A4.p1.1),[§1](https://arxiv.org/html/2607.01279#S1.p3.1),[§2](https://arxiv.org/html/2607.01279#S2.p2.1),[§2](https://arxiv.org/html/2607.01279#S2.p4.1),[§5\.1](https://arxiv.org/html/2607.01279#S5.SS1.p1.1),[§5](https://arxiv.org/html/2607.01279#S5.p1.1)\.
- F\. Yger, M\. Berar, and F\. Lotte \(2016\)Riemannian approaches in brain\-computer interfaces: a review\.25\(10\),pp\. 1753–1762\.Cited by:[§1](https://arxiv.org/html/2607.01279#S1.p2.1),[§2](https://arxiv.org/html/2607.01279#S2.p3.1)\.
- M\. Zanetti, T\. Mizumoto, L\. Faes, A\. Fornaser, M\. De Cecco, L\. Maule, M\. Valente, and G\. Nollo \(2021\)Multilevel assessment of mental stress via network physiology paradigm using consumer wearable devices\.12\(4\),pp\. 4409–4418\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p2.1)\.
- H\. Zhang, C\. E\. Stevenson, T\. Jung, and L\. Ko \(2020\)Stress\-induced effects in resting eeg spectra predict the performance of ssvep\-based bci\.28\(8\),pp\. 1771–1780\.Cited by:[§4\.2](https://arxiv.org/html/2607.01279#S4.SS2.SSS0.Px5.p1.1),[§6](https://arxiv.org/html/2607.01279#S6.SS0.SSS0.Px1.p1.1)\.
- P\. Zhang, X\. Wang, W\. Zhang, and J\. Chen \(2018\)Learning spatial–spectral–temporal eeg features with recurrent 3d convolutional neural networks for cross\-task mental workload assessment\.27\(1\),pp\. 31–42\.Cited by:[§2](https://arxiv.org/html/2607.01279#S2.p1.1)\.

## Appendix

## Appendix A Overview

This appendix provides supplementary material organized as follows: Section [B](https://arxiv.org/html/2607.01279#A2) describes the datasets and cross-subject protocol; Section [C](https://arxiv.org/html/2607.01279#A3) defines evaluation metrics; Section [D](https://arxiv.org/html/2607.01279#A4) details the baseline methods; Section [E](https://arxiv.org/html/2607.01279#A5) provides implementation details; Section [F](https://arxiv.org/html/2607.01279#A6) presents the geometric interpretation of frequency cluster aggregation; Sections [G](https://arxiv.org/html/2607.01279#A7)–[I](https://arxiv.org/html/2607.01279#A9) contain additional spectral and sensitivity analyses; and Section [J](https://arxiv.org/html/2607.01279#A10) discusses broader societal impacts.

## Appendix B Dataset Details

MIST Dataset. This dataset contains EEG recordings from 30 healthy subjects performing mental arithmetic tasks under two conditions. EEG signals were recorded using 64 channels at 1000 Hz (downsampled to 200 Hz). Each session consists of a 20-minute mental arithmetic task subdivided into four difficulty levels. The two sessions differ in experimental manipulation: MIST Control imposes no time constraints or evaluative feedback, while MIST Stress introduces strict time limits and negative social-evaluative feedback. The two sessions are separated by $\geq 7$ days to minimize learning effects. We treat them as independent datasets due to the systematic differences in cognitive load and emotional stress. Both use 4-class stress labeling based on difficulty levels.

SEED Dataset. This dataset comprises EEG recordings from 15 subjects (8 females, 7 males; mean age $23.27 \pm 2.37$ years) during an emotion-eliciting video-watching task. EEG signals were recorded from 62 electrodes at 1000 Hz and downsampled to 200 Hz. Each subject watched 15 film clips (5 positive, 5 neutral, 5 negative), with 3-class emotion labels.

Cross-subject protocol. We adopt stratified 5-fold cross-validation (CV) at the subject level: subjects are partitioned into 5 disjoint folds preserving label proportions. In each round, 4 folds serve as training subjects and 1 fold as test subjects, with zero subject overlap. Data is split 8:2 into training and test sets; the training set undergoes 5-fold CV for model selection, followed by retraining on the full training set.
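The subject-disjoint part of this protocol can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names and the fixed seed are hypothetical, subjects are assigned to folds round-robin after shuffling, and label stratification is not implemented.

```python
import random

def subject_folds(subject_ids, n_folds=5, seed=0):
    """Partition subjects into n_folds disjoint groups (round-robin after shuffling)."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]

def cross_subject_splits(subject_ids, n_folds=5, seed=0):
    """Yield (train_subjects, test_subjects) pairs with no shared subject."""
    folds = subject_folds(subject_ids, n_folds, seed)
    for k, test in enumerate(folds):
        train = [s for i, fold in enumerate(folds) if i != k for s in fold]
        yield train, test

# Toy usage: 30 hypothetical subject IDs, matching the MIST subject count.
splits = list(cross_subject_splits(range(1, 31)))
```

Because the split is made on subject IDs rather than on trials, every trial of a test subject stays out of training, which is what "zero subject overlap" requires.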

## Appendix C Evaluation Metrics

We adopt five metrics: Balanced Accuracy (B.ACC) averages per-class recall to handle class imbalance, defined as $\text{B.ACC} = \frac{1}{G} \sum_{g=1}^{G} \text{Rec}_{g}$; Precision (Pre) measures the fraction of correct positive predictions; Recall (Rec) measures the fraction of actual positives correctly identified; F1-Score (F1) is the harmonic mean of precision and recall; and AUC computes the area under the receiver operating characteristic curve. All metrics are reported as percentages.
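A small sketch shows why balanced accuracy is preferred under class imbalance. The function below implements the definition above as the mean of per-class recall; the function name and the toy labels are made up for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for g in classes:
        total = sum(1 for t in y_true if t == g)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == g and p == g)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

# Imbalanced toy labels: class 0 has four trials, class 1 has one.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
b_acc = balanced_accuracy(y_true, y_pred)
```

A classifier that always predicts the majority class reaches 80% plain accuracy here but only 50% balanced accuracy, since the recall of the minority class is zero.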

## Appendix D Baseline Reference

We compare against five representative methods: EEGNet \[[19](https://arxiv.org/html/2607.01279#bib.bib24)\] is a compact CNN with depthwise and separable convolutions (0.05M params); BIOT \[[34](https://arxiv.org/html/2607.01279#bib.bib6)\] is a biosignal transformer with channel embedding alignment for cross-subject generalization (3.20M params); LaBraM \[[17](https://arxiv.org/html/2607.01279#bib.bib7)\] is a large-scale pretrained brain model with masked EEG modeling (6.23M params); NeuroBOLT \[[20](https://arxiv.org/html/2607.01279#bib.bib11)\] synthesizes fMRI signals from raw EEG via multi-dimensional representation learning (10.45M params); and CorrAtt \[[15](https://arxiv.org/html/2607.01279#bib.bib25)\] employs correlation-matrix self-attention for EEG classification (0.22M params). All models are trained from scratch under identical data splits and training configurations.

## Appendix E Implementation Details

Preprocessing. The pipeline consists of resampling to 200 Hz, ICA artifact removal, an 8th-order Butterworth bandpass filter (0.5–50 Hz), z-score normalization, 8-second non-overlapping segmentation, and covariance computation. The slice number $m$ is set to 29 for MIST Control, 28 for MIST Stress, and 26 for SEED, determined via marginal effect optimization (Eq. [9](https://arxiv.org/html/2607.01279#S4.E9)).
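Two of the preprocessing steps, z-score normalization and 8-second non-overlapping segmentation at 200 Hz (1600 samples per segment), can be sketched for a single channel as below. The helper names and the synthetic sine input are assumptions for the example; the order of operations and the handling of a trailing remainder are illustrative choices, and filtering, ICA, and covariance computation are omitted.

```python
import math

def zscore(x):
    """Standardize one channel to zero mean and unit variance."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]

def segment(x, fs=200, seconds=8):
    """Cut a channel into non-overlapping windows; a trailing remainder is dropped."""
    n = fs * seconds
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]

# Toy usage: 20 s of a synthetic 10 Hz sine sampled at 200 Hz.
fs = 200
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(20 * fs)]
segments = segment(zscore(signal), fs=fs, seconds=8)
```

A 20-second recording yields two full 8-second segments, and the last 4 seconds are discarded in this sketch.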

Training. The model is optimized with Adam (learning rate $10^{-3}$, weight decay $10^{-2}$) for 300 epochs with batch normalization and dropout.

Hardware and Software. AMD Ryzen 9 3950X CPU, NVIDIA RTX 6000 GPU, 64 GB RAM. Software: Ubuntu 20.04, Python 3.8, PyTorch 1.12, MNE-Python 0.24.

## Appendix F Geometric Interpretation of Frequency Cluster Aggregation

Since the tangent-space features $\mathbf{h}_{b,j,f}$ are obtained via the Log-Euclidean map from the SPD manifold, the Euclidean distance between any two tangent-space vectors $\mathbf{h}_{b,j,f_{1}}, \mathbf{h}_{b,j,f_{2}}$ equals the Log-Euclidean distance between their corresponding SPD matrices on the manifold, i.e., $\|\mathbf{h}_{b,j,f_{1}} - \mathbf{h}_{b,j,f_{2}}\|_{2} = d_{\text{LE}}(\mathbf{R}_{b,j,f_{1}}, \mathbf{R}_{b,j,f_{2}})$. Consequently, minimizing the K-Means objective in tangent space is equivalent to grouping frequency points according to their Riemannian distance on $\mathcal{S}_{++}^{C}$. Moreover, the cluster feature $\mathbf{h}_{b,j,k}$ (Eq. [5](https://arxiv.org/html/2607.01279#S4.E5)) is exactly the tangent-space representation of the Log-Euclidean Fréchet mean of cluster $k$ on $\mathcal{S}_{++}^{C}$. Define the Fréchet mean as $\bar{\mathbf{R}}_{k} = \exp(\sum_{f} A_{k,f} \cdot \log(\mathbf{R}_{b,j,f})) \in \mathcal{S}_{++}^{C}$; then its tangent-space feature satisfies:

$$\operatorname{vech}\!\left(\log(\bar{\mathbf{R}}_k)\right) = \sum_{f=1}^{F} A_{k,f}\cdot\mathbf{h}_{b,j,f} = \mathbf{h}_{b,j,k} \tag{16}$$

where the first equality follows from $\log(\exp(\cdot)) = \cdot$ and the linearity of $\operatorname{vech}(\cdot)$, and the second from Eq. [5](https://arxiv.org/html/2607.01279#S4.E5). This establishes that frequency cluster aggregation performs principled data-driven grouping in Riemannian geometry rather than arbitrary dimensionality reduction, with each cluster feature precisely corresponding to the tangent-space representation of the Log-Euclidean Fréchet mean on $\mathcal{S}_{++}^{C}$.
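Both claims in this appendix can be checked numerically on random SPD matrices. The sketch below is illustrative only: the helper names are hypothetical, the matrices are synthetic, and `vech` is assumed to scale off-diagonal entries by $\sqrt{2}$ so that the Euclidean norm of the vector equals the Frobenius norm of the matrix (the exact definition used in the paper is not shown in this section).

```python
import numpy as np


def spd_log(R):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.log(w)) @ V.T


def spd_exp(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T


def vech(S):
    """Half-vectorization of a symmetric matrix.

    Assumption: off-diagonal entries are scaled by sqrt(2) so that
    ||vech(S)||_2 equals ||S||_F.
    """
    i, j = np.triu_indices(S.shape[0])
    scale = np.where(i == j, 1.0, np.sqrt(2.0))
    return S[i, j] * scale


rng = np.random.default_rng(0)
C, F = 4, 6  # channels and frequency points (toy sizes)
X = rng.standard_normal((F, C, 3 * C))
R = X @ X.transpose(0, 2, 1) / (3 * C) + 1e-3 * np.eye(C)  # (F, C, C), SPD

logs = np.stack([spd_log(Rf) for Rf in R])
h = np.stack([vech(L) for L in logs])  # tangent-space features, one per f

# Claim 1: tangent-space Euclidean distance equals Log-Euclidean distance.
d_tangent = np.linalg.norm(h[0] - h[1])
d_le = np.linalg.norm(logs[0] - logs[1], ord="fro")

# Claim 2 (Eq. 16): vech(log(R_bar_k)) equals the weighted sum of features.
A_k = rng.random(F)
A_k /= A_k.sum()  # one row of the aggregation weights, normalized to sum to 1
R_bar = spd_exp(np.einsum("f,fij->ij", A_k, logs))
lhs = vech(spd_log(R_bar))
rhs = A_k @ h

print(np.isclose(d_tangent, d_le), np.allclose(lhs, rhs))
```

Without the $\sqrt{2}$ scaling, Eq. 16 still holds by linearity, but the distance identity holds only up to a reweighting of the off-diagonal terms.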

## Appendix G Frequency-based Discriminability and Temporal Dynamics

This section investigates the impact of temporal window length on the stability of frequency-domain representations and the discriminative power of EEG features. The analysis is conducted across all datasets, each sampled at 200 Hz. Window lengths vary from 100 to 3200 samples (0.5 s to 16 s), ranging from brief segments that capture fast transient dynamics to windows that span two full 8 s arithmetic cycles of the MIST Stress paradigm. The longest windows therefore support both robust estimation of low-frequency activity and assessment of inter-cycle coherence.

#### Effect of Slice Length on Frequency Information Preservation\.

Spectral analysis in Figure [4](https://arxiv.org/html/2607.01279#A7.F4) (x-axis: frequency in Hz; y-axis: power in dB) reveals that EEG signals are dominated by low-frequency components. The green box highlights regions with strong low-frequency power, which are more susceptible to distortion by short slices. Welch's method and STFT are employed to capture frequency information under different temporal conditions. Welch's method provides stable spectral estimates by averaging periodograms, while STFT retains temporal dynamics in non-stationary signals through a time-frequency representation. The two methods complement each other, supporting robust feature extraction across varying slice lengths. Short slice windows fail to retain low-frequency energy (e.g., they produce unstable spectral peaks), whereas I2RiMA's intra-/interslice modeling preserves both local dynamics and global coherence across varying temporal granularities. This enables superior performance over BIOT on the MIST and SEED datasets, addressing slicing-induced temporal inconsistencies.
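The loss of low-frequency detail in short slices follows from the frequency resolution $f_s / N$ of an $N$-sample window. The sketch below illustrates this with SciPy's `welch` and `stft` on a synthetic signal; the test signal, the helper name `slice_psd`, and the choice of a single full-length Welch segment per slice are assumptions made for illustration, not the settings used in the paper.

```python
import numpy as np
from scipy.signal import stft, welch

FS = 200  # sampling rate (Hz), from the text
rng = np.random.default_rng(0)
t = np.arange(16 * FS) / FS
# Synthetic stand-in for EEG: a 1 Hz and a 10 Hz component plus noise.
x = (np.sin(2 * np.pi * 1.0 * t)
     + 0.5 * np.sin(2 * np.pi * 10.0 * t)
     + 0.1 * rng.standard_normal(t.size))


def slice_psd(signal, n_samples, fs=FS):
    """Welch PSD of the first n_samples, using one segment as long as the
    slice, i.e. the finest frequency resolution the slice allows."""
    return welch(signal[:n_samples], fs=fs, nperseg=n_samples)


f_short, p_short = slice_psd(x, 100)   # 0.5 s slice
f_long, p_long = slice_psd(x, 3200)    # 16 s slice

df_short = f_short[1] - f_short[0]  # 2.0 Hz: the 1 Hz component falls between bins
df_long = f_long[1] - f_long[0]     # 0.0625 Hz: the 1 Hz component is resolved
peak_long = f_long[np.argmax(p_long)]

# STFT keeps a time axis: rows are frequencies, columns are time frames.
f_stft, t_stft, Z = stft(x, fs=FS, nperseg=400)

print(df_short, df_long, peak_long, Z.shape[0])
```

With a 0.5 s slice the bins sit at 0, 2, 4, ... Hz, so delta-band activity cannot be separated from the DC and 2 Hz bins, which is one concrete form of the instability described above.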

![Refer to caption](https://arxiv.org/html/2607.01279v1/x1.png) Figure 4: Topographical Maps of Channel Discriminability Across Temporal Windows (Welch Spectral Features).

## Appendix H Spectral Discriminability Across Various Channels

Figure [5](https://arxiv.org/html/2607.01279#A8.F5) visualizes the class-wise discriminability of EEG channels under different window lengths using Welch power spectral features. For each window size, we compute the average Euclidean distance between classes for each channel. Warmer colors (red) indicate higher discriminability, while cooler colors (blue) denote lower relevance. Green boxes highlight FC2 and C2 in the MIST Control paradigm, where extending the window from 100 to 3200 samples markedly increases inter-class distance in frontal electrodes linked to mental arithmetic load.

Purple boxes mark O1 and O2 in the MIST Stress paradigm, suggesting similar gains in occipital channels that likely reflect enhanced visual processing of error/time-limit feedback under stress. Orange boxes surround C5, C6, FT7 and FT8 in the SEED dataset, where longer window lengths boost separability in temporal electrodes associated with emotion processing.

Across all datasets, larger window lengths yield richer low\-frequency estimates and amplify region\-specific discriminative patterns\. This demonstrates that appropriately extending slice duration is critical for early detection of mental stress and affective states, as it stabilizes low\-frequency features and captures cross\-cycle coherence in channels\.
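The per-channel discriminability score described in this appendix can be sketched as below. This is one plausible reading, stated as an assumption: the distance is taken between class-mean spectra (rather than averaged over all trial pairs), and the function name `channel_discriminability` and the toy feature array are hypothetical.

```python
from itertools import combinations

import numpy as np


def channel_discriminability(features, labels):
    """Mean pairwise Euclidean distance between class-mean spectra per channel.

    features: (n_trials, n_channels, n_freqs) spectral features, e.g. Welch PSD
    labels:   (n_trials,) integer class labels
    returns:  (n_channels,) discriminability scores
    """
    classes = np.unique(labels)
    # Class-mean spectrum per channel: (n_classes, n_channels, n_freqs)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = [
        np.linalg.norm(means[a] - means[b], axis=-1)
        for a, b in combinations(range(len(classes)), 2)
    ]
    return np.mean(dists, axis=0)


# Toy example: 2 classes, 2 channels, 4 frequency bins.
features = np.zeros((4, 2, 4))
labels = np.array([0, 0, 1, 1])
features[labels == 1, 0, :] = 1.0  # only channel 0 differs between classes
scores = channel_discriminability(features, labels)
print(scores)  # [2. 0.]
```

Scores computed this way for each window length could then be interpolated onto a scalp layout to obtain topographical maps of the kind shown in the figures.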

![Refer to caption](https://arxiv.org/html/2607.01279v1/x2.png) Figure 5: Comparison across Datasets and Methods in Spectrum domain.

#### Intraslice and Interslice Information.

To further quantify the effect of window length on temporal stability and spectral dynamics (as visualized in Figure [6](https://arxiv.org/html/2607.01279#A8.F6)), we introduce two metrics: *Intraslice Information* and *Interslice Information*.

Intraslice Information measures the Euclidean distance between average spectral representations of different classes within the same temporal slice, reflecting inter\-class separability and intra\-class consistency; higher values indicate more distinguishable class\-specific patterns\.

Interslice Information captures local temporal dynamics by computing the mean spectral distance between adjacent slices of the same class; smaller values suggest more stable temporal evolution, while larger values indicate higher dynamism. Formally, let $\bar{\mathbf{h}}_{c,j} = \frac{1}{N_c}\sum_{i\in\mathcal{S}_c}\mathbf{h}_{i,j}$ denote the mean spectral representation of class $c$ in slice $j$, where $N_c$ is the number of samples of class $c$, $C$ is the number of classes and $m$ is the number of slices. The two metrics are defined as:

$$I_{\text{intra}}(j) = \frac{1}{C(C-1)}\sum_{c_1=1}^{C}\sum_{c_2>c_1}^{C}\left\|\bar{\mathbf{h}}_{c_1,j}-\bar{\mathbf{h}}_{c_2,j}\right\|_2 \tag{17}$$

$$I_{\text{inter}}(c) = \frac{1}{m-1}\sum_{j=1}^{m-1}\left\|\bar{\mathbf{h}}_{c,j}-\bar{\mathbf{h}}_{c,j+1}\right\|_2 \tag{18}$$

As window size increases, Intraslice Information increases and Interslice Information decreases across all datasets, indicating that longer windows enhance class separability and yield more stable temporal evolution.
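Eqs. 17 and 18 translate directly into a few lines of NumPy. The sketch below follows the formulas as written, including the $1/(C(C-1))$ normalization in Eq. 17 applied to a sum over unordered class pairs; the function names and the toy class-mean array are hypothetical.

```python
import numpy as np


def class_slice_means(h, labels):
    """h: (n_samples, m, d) spectral features per slice; returns (C, m, d)."""
    classes = np.unique(labels)
    return np.stack([h[labels == c].mean(axis=0) for c in classes])


def intraslice_information(h_bar):
    """Eq. 17: between-class distance within each slice. Returns (m,)."""
    n_classes = h_bar.shape[0]
    total = np.zeros(h_bar.shape[1])
    for c1 in range(n_classes):
        for c2 in range(c1 + 1, n_classes):
            total += np.linalg.norm(h_bar[c1] - h_bar[c2], axis=-1)
    # Normalization as written in Eq. 17 (sum over pairs with c2 > c1).
    return total / (n_classes * (n_classes - 1))


def interslice_information(h_bar):
    """Eq. 18: mean distance between adjacent slices per class. Returns (C,)."""
    diffs = np.linalg.norm(h_bar[:, 1:] - h_bar[:, :-1], axis=-1)  # (C, m-1)
    return diffs.mean(axis=1)


# Toy class means: C=2 classes, m=3 slices, d=2 features.
h_bar = np.zeros((2, 3, 2))
h_bar[1, 0] = [3.0, 4.0]
h_bar[1, 1] = [3.0, 4.0]
i_intra = intraslice_information(h_bar)
i_inter = interslice_information(h_bar)
print(i_intra, i_inter)  # [2.5 2.5 0. ] [0.  2.5]
```

In practice `h_bar` would come from `class_slice_means` applied to the spectral features of each window length, giving one pair of curves per dataset.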

![Refer to caption](https://arxiv.org/html/2607.01279v1/x3.png) Figure 6: Comparison across Datasets and Methods in Spectrum domain.

## Appendix I Sensitivity to Slice Number

The slice number $m$ is a key hyperparameter controlling the granularity of temporal modeling. We sweep $m \in \{1,\ldots,32\}$ and quantify the computational efficiency of each setting via the marginal effect criterion defined in Eq. [8](https://arxiv.org/html/2607.01279#S4.E8), which measures accuracy gain per unit of additional FLOPs. Figure [7](https://arxiv.org/html/2607.01279#A9.F7) plots the marginal effect curves for all three datasets, where the shaded regions denote variance across cross-validation folds. All three curves exhibit a consistent pattern: marginal effect rises initially as additional slices provide richer temporal information, peaks at an intermediate range, then declines and eventually turns negative as redundant slices introduce noise without commensurate accuracy gains.

Applying the constrained optimization formulation in Eq. [9](https://arxiv.org/html/2607.01279#S4.E9), which maximizes marginal effect subject to minimum accuracy requirements, yields $m^{*}=29$ for MIST Control, $m^{*}=28$ for MIST Stress, and $m^{*}=26$ for SEED, as annotated in Figure [7](https://arxiv.org/html/2607.01279#A9.F7). These optimal points reside near where each curve crosses zero from above, indicating the sweet spot before further slices degrade cost-effectiveness. Notably, MIST datasets tolerate larger $m$ values than SEED, reflecting their longer task durations and richer temporal dynamics that benefit from finer slicing.
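The selection procedure can be sketched under stated assumptions. Eqs. 8 and 9 are not reproduced in this appendix, so the code below assumes a finite-difference form for the marginal effect (accuracy gain divided by FLOPs increase between consecutive settings) and a simple accuracy floor as the constraint. The function names, the threshold `min_acc`, and all numbers are hypothetical toy values, not results from the paper.

```python
import numpy as np


def marginal_effect(acc, flops):
    """Assumed finite-difference form: accuracy gain per unit of extra FLOPs
    between consecutive slice numbers. Returns one value per step."""
    return np.diff(acc) / np.diff(flops)


def select_slice_number(m_values, acc, flops, min_acc):
    """Pick the slice number with the largest marginal effect among settings
    whose accuracy meets the floor min_acc (assumed form of the constraint)."""
    me = marginal_effect(acc, flops)
    candidates = m_values[1:]        # the first setting has no predecessor
    feasible = acc[1:] >= min_acc
    if not feasible.any():
        raise ValueError("no slice number satisfies the accuracy constraint")
    best = np.argmax(np.where(feasible, me, -np.inf))
    return int(candidates[best])


# Hypothetical toy sweep (illustrative values only).
m_values = np.array([1, 2, 3, 4, 5])
acc = np.array([0.60, 0.70, 0.78, 0.80, 0.79])
flops = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

me = marginal_effect(acc, flops)
m_star = select_slice_number(m_values, acc, flops, min_acc=0.75)
print(m_star)  # 3
```

In the toy sweep the last step has a negative marginal effect, mirroring the qualitative shape described above, where extra slices eventually cost FLOPs without improving accuracy.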

![Refer to caption](https://arxiv.org/html/2607.01279v1/Figure_5.png) Figure 7: Marginal Effect Comparison across Three Datasets

## Appendix J Broader Impact

This work has both positive and negative societal implications. On the positive side, I2RiMA enables objective, portable mental stress monitoring that could benefit early intervention for stress-related disorders and support closed-loop stress regulation systems, particularly in resource-constrained settings where its low computational cost (1.60M parameters) enables deployment on wearable devices. On the negative side, EEG-based stress detection could be misused for workplace surveillance or discriminatory screening without informed consent. Cross-subject generalization limitations may also lead to biased assessments for populations that are underrepresented in the training data. We strongly advocate that any real-world deployment of I2RiMA be accompanied by informed consent, regulatory oversight, and fairness auditing across demographic groups.