Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models

arXiv cs.LG Papers

Summary

This paper identifies and explains a spectral bias in reconstruction-based EEG foundation models, where embeddings over-represent aperiodic and low-frequency components while under-representing oscillatory components, especially at higher frequencies, leading to poor performance in low-resource settings.

arXiv:2605.26434v1 Announce Type: new Abstract: EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail to outperform significantly smaller, fully supervised models in low-resource settings. We provide a mechanistic account of this shortcoming, attributing it to a fundamental mismatch between reconstruction-based pretext tasks and the idiosyncratic spectral structure of EEG signals, which decompose into distinct high-power aperiodic and low-power oscillatory components. Using controlled, synthetically-generated EEG inputs, we demonstrate that EEG foundation model embeddings are biased to capture the aperiodic components of the EEG signal while under-representing oscillatory components, particularly at higher frequencies. Additionally, linear probe evaluations on real-world BCI datasets reveal that embeddings encode subject identity more strongly than task-relevant information, consistent with a low-frequency and aperiodic component bias in foundation model embeddings trained primarily on reconstruction-based objectives. Together, these findings elucidate a failure mode in reconstruction-based EEG foundation models and motivate future work to incorporate auxiliary losses explicitly targeting high-frequency oscillatory structure as a path toward more capable and generalizable EEG representations.

Cached at: 05/27/26, 09:10 AM

# Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models
Source: [https://arxiv.org/html/2605.26434](https://arxiv.org/html/2605.26434)
Aditya Kommineni¹ (akommine@usc.edu), Emily Zhou² (emilyzho@usc.edu), Kleanthis Avramidis² (avramidi@usc.edu), Simon Bock Segaard³ (ssegaa21@student.aau.dk), Jeppe Roden Münster³ (jmunst21@student.aau.dk), Andreas Peter Juhl Hansen³ (apjh21@student.aau.dk), Takfarinas Medani¹ (medani@usc.edu), Tiantian Feng¹ (tiantiaf@usc.edu), Richard Leahy¹ (leahy@usc.edu), Shrikanth Narayanan¹,² (shri@usc.edu)

¹ Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California; ² Thomas Lord Department of Computer Science, University of Southern California; ³ Department of Mathematical Sciences, Aalborg University

###### Abstract

EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail to outperform significantly smaller, fully supervised models in low-resource settings. We provide a mechanistic account of this shortcoming, attributing it to a fundamental mismatch between reconstruction-based pretext tasks and the idiosyncratic spectral structure of EEG signals, which decompose into distinct high-power aperiodic and low-power oscillatory components. Using controlled, synthetically-generated EEG inputs, we demonstrate that EEG foundation model embeddings are biased to capture the aperiodic components of the EEG signal while under-representing oscillatory components, particularly at higher frequencies. Additionally, linear probe evaluations on real-world BCI datasets reveal that embeddings encode subject identity more strongly than task-relevant information, consistent with a low-frequency and aperiodic component bias in foundation model embeddings trained primarily on reconstruction-based objectives. Together, these findings elucidate a failure mode in reconstruction-based EEG foundation models and motivate future work to incorporate auxiliary losses explicitly targeting high-frequency oscillatory structure as a path toward more capable and generalizable EEG representations.

## 1 Introduction

Foundation models in language (Brown et al., [2020](https://arxiv.org/html/2605.26434#bib.bib24); Achiam et al., [2023](https://arxiv.org/html/2605.26434#bib.bib25); Team et al., [2023](https://arxiv.org/html/2605.26434#bib.bib26); Comanici et al., [2025](https://arxiv.org/html/2605.26434#bib.bib27)), speech (Radford et al., [2023](https://arxiv.org/html/2605.26434#bib.bib28)) and vision (Radford et al., [2021](https://arxiv.org/html/2605.26434#bib.bib29)) modalities have resulted in impressive advances in the respective domains over the past few years. These foundation models provide generalized representations that yield noticeable performance improvements over their fully supervised counterparts in low-resource settings, where large-scale data collection is not feasible. These developments have inspired numerous efforts to build generalized representations and foundation models for brain signals, especially electroencephalography (EEG) (Kostas et al., [2021](https://arxiv.org/html/2605.26434#bib.bib12); Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16); Jiang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib15); Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17); Liu et al., [2026a](https://arxiv.org/html/2605.26434#bib.bib20); Avramidis et al., [2025](https://arxiv.org/html/2605.26434#bib.bib11); Kommineni et al., [2024](https://arxiv.org/html/2605.26434#bib.bib9); Wang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib14); Chien et al., [2022](https://arxiv.org/html/2605.26434#bib.bib10); Cui et al., [2024](https://arxiv.org/html/2605.26434#bib.bib45)).

Despite being pre-trained with large-scale unlabelled EEG data, the performance improvements provided by EEG foundation models over fully supervised, domain-specific models remain limited. While some prior works (Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16); Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17)) have hypothesized that these limitations could be alleviated through increased data scale and model size, recent work has not found strong effects from scaling model size in EEG foundation models (Liu et al., [2026b](https://arxiv.org/html/2605.26434#bib.bib54)). Additionally, we argue that this view is fundamentally constrained by the data acquisition challenges inherent to EEG. Unlike text, audio, and images, domains where foundation models benefit from web-scale data, EEG recordings require specialized hardware and trained personnel during acquisition, making large-scale data collection extremely resource-intensive and scaling laws difficult to realize in practice. In this work, we attempt to provide an explanation from the pretext task perspective. This view is further motivated by recent findings showing that EEG foundation models, when evaluated under resource-constrained settings, whether through limitations on training samples or model parameters, frequently underperform their fully supervised counterparts (Yang et al., [2026](https://arxiv.org/html/2605.26434#bib.bib8); Liu et al., [2026b](https://arxiv.org/html/2605.26434#bib.bib54); Kuruppu et al., [2026](https://arxiv.org/html/2605.26434#bib.bib30)), warranting deeper investigation into the representations learned by these models. Identifying the factors that lead to this phenomenon could provide insights to inform improved EEG foundation model design and construction. Current EEG foundation models borrow pre-training objectives from multimedia modalities (images, speech and text), which differ fundamentally from biological brain signals like EEG.
In particular, reconstruction-based objectives have been predominantly adopted as the primary means of training EEG foundation models (Jiang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib15); Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16); Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17); Ouahidi et al., [2025](https://arxiv.org/html/2605.26434#bib.bib18)). The present work hence focuses on this formulation.

Furthermore, EEG signals have unique characteristics that can impact the observed behavior of current foundation models. Notably, prior neuroscience work has consistently shown that spontaneous brain activity dominates measured signals in both energy and variance (Raichle, [2006](https://arxiv.org/html/2605.26434#bib.bib53), [2010](https://arxiv.org/html/2605.26434#bib.bib52)), with task-evoked responses constituting relatively small perturbations (Gibson et al., [2022](https://arxiv.org/html/2605.26434#bib.bib22)). In EEG, this is reflected in the prominence of scale-free (1/f) aperiodic activity (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)), which has been linked to subject-specific physiological properties such as age and cognitive state (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)). In this work, we hypothesize that the sub-optimal task classification observed in linear probe experiments for EEG foundation models can be attributed to the combination of reconstruction-based objectives and the aforementioned spectral dominance of aperiodic components in EEG.

Through controlled experiments on synthetically generated EEG signals and empirical results on real-world EEG-BCI tasks, we show that reconstruction-based EEG foundation models exhibit a spectral bias towards encoding aperiodic and low-frequency information. Consequently, model representations form subject-centric clusters rather than task-centric clusters, as shown through empirical experiments on BCI datasets. These observations motivate designing domain-specific pretext tasks for EEG foundation models that could include auxiliary losses to capture high-frequency oscillatory components. The contributions of this work are as follows:

- Spectral bias in reconstruction-based EEG foundation models. We diagnose a key limitation of existing reconstruction-based EEG foundation models: their internal representations preferentially encode aperiodic, low-frequency components over high-frequency oscillatory information. We attribute this to a mismatch between standard reconstruction-based self-supervised objectives and the spectral structure of EEG signals.
- Synthetic validation via controlled EEG simulations. Using a controlled single-channel EEG simulation that independently varies aperiodic and oscillatory components, we show that model embeddings preferentially encode aperiodic structure. Specifically, aperiodic components are linearly decodable, while oscillatory components are only weakly decodable, and primarily at low frequencies, demonstrating a bias in representations.
- Empirical evidence from BCI tasks. Foundation model embeddings achieve significantly higher performance on subject identification than on task decoding. Given that the evaluated BCI tasks rely predominantly on oscillatory activity, this gap supports our hypothesis that models exhibit a spectral bias toward low-frequency, aperiodic features.

## 2 Related Work

### 2.1 EEG Foundation Models

Early attempts to build EEG foundation models drew on architectures and pretext tasks from the image and speech domains, such as contrastive predictive coding in BENDR (Kostas et al., [2021](https://arxiv.org/html/2605.26434#bib.bib12)), contrastive learning through signal augmentations in BIOT (Yang et al., [2023](https://arxiv.org/html/2605.26434#bib.bib13)) and masked reconstruction objectives in EEGPT (Wang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib14)). Following findings by Chien et al. ([2022](https://arxiv.org/html/2605.26434#bib.bib10)) that masked reconstruction objectives perform better than contrastive objectives on EEG signals and train more stably, reconstruction-based models have become the predominant method for EEG foundation model training. Within reconstruction-based models, two distinct paths have been explored: one uses a discrete neural tokenizer combined with masked token modelling (Jiang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib15); Avramidis et al., [2025](https://arxiv.org/html/2605.26434#bib.bib11)), while the other performs masked reconstruction on raw EEG signals (Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16); Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17); Ouahidi et al., [2025](https://arxiv.org/html/2605.26434#bib.bib18); Wang et al., [2026](https://arxiv.org/html/2605.26434#bib.bib43); Cui et al., [2024](https://arxiv.org/html/2605.26434#bib.bib45); Döner et al., [2025](https://arxiv.org/html/2605.26434#bib.bib19); Chen et al., [2026](https://arxiv.org/html/2605.26434#bib.bib47)).

### 2.2 Aperiodic and Oscillatory Components in EEG

![Refer to caption](https://arxiv.org/html/2605.26434v1/x1.png)

Figure 1: Simulated and real-world EEG experiments. (A) Pipeline for obtaining embeddings from multi-channel EEG signals. Embeddings are extracted from the last layer of the encoder for three foundation models (LaBraM, CBraMod and CSBrain). (B) Pipeline for computing linear decodability through controlled synthetic single-channel EEG generation. In the figure, the aperiodic exponent $\beta$ is sampled between $[\theta_{min}, \theta_{max}]$ to create N samples. Embeddings are extracted for the samples, followed by linear regression to compute linear decodability. (C) Linear probing experiments on four real-world EEG datasets for classification of subject identity and task. Better subject identity results would indicate that models capture more subject-specific information in their internal representations.

The spectral information in EEG signals is composed of an aperiodic broadband signal and oscillatory information (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35); Brake et al., [2024](https://arxiv.org/html/2605.26434#bib.bib38)). The aperiodic signals exhibit a 1/f power spectrum and are scale-free. Additionally, the aperiodic components constitute a significant portion of the spontaneous electrical field potentials of EEG recordings (Raichle, [2006](https://arxiv.org/html/2605.26434#bib.bib53), [2010](https://arxiv.org/html/2605.26434#bib.bib52)). The aperiodic signals have been hypothesized to depend on global changes in the excitation/inhibition (E/I) balance (Gao et al., [2017](https://arxiv.org/html/2605.26434#bib.bib39)), while oscillatory components are related to population asynchrony and facilitate the dynamic temporal and spatial organization of neuronal activity (Donoghue et al., [2022](https://arxiv.org/html/2605.26434#bib.bib40); Voytek and Knight, [2015](https://arxiv.org/html/2605.26434#bib.bib41)).
The aperiodic components of EEG have been found to vary according to age, cognitive states and task demands (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)), and have been shown to act as potential biomarkers for neurological disorders such as depression, ADHD and Parkinson's disease (Pani et al., [2022](https://arxiv.org/html/2605.26434#bib.bib42)). While prior works have established that aperiodic components correlate with subject-specific signatures and oscillatory components primarily with task-specific signatures, the relation between these observations and their impact on EEG foundation models has been under-explored.

## 3 Spectral Bias in Reconstruction Based EEG SSL Models

Masked reconstruction has been the primary pretext task for training SSL models on EEG signals, applied either to patches of raw EEG data (Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16); Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17)) or through discrete tokenization (Wang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib14); Jiang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib15)). While masked reconstruction pretext tasks have proven effective at learning generalized representations in the image and speech modalities, it is not clear whether reconstruction-only objectives lead to generalized representations in EEG that consistently outperform supervised baselines.

#### Problem Setup

Let $\mathbf{X} \in \mathbb{R}^{C \times T}$ be an EEG epoch over $C$ channels and $T$ timesteps, vectorized as $\mathbf{x} = \mathrm{vec}(\mathbf{X}) \in \mathbb{R}^{CT}$. We model the signal as:

$$\mathbf{x} = (\mathbf{A} \otimes \mathbf{I}_{T})(\mathbf{z}_{\text{ap}} + \mathbf{z}_{\text{osc}}) + \boldsymbol{\varepsilon} \tag{1}$$

where $\mathbf{A} \in \mathbb{R}^{C \times N_s}$ is the leadfield (forward model) matrix that linearly maps $N_s$ neural sources into the $C$-dimensional EEG sensor space, $\mathbf{z}_{\text{ap}}$ and $\mathbf{z}_{\text{osc}}$ are the vectorized aperiodic and oscillatory source signals respectively, and $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\varepsilon})$.
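The generative model in Eq. (1) can be sketched numerically as follows. This is a minimal illustration and not the authors' code: the function name `simulate_epoch`, the toy dimensions, the random stand-in sources, and the isotropic noise (a special case of $\Sigma_{\varepsilon}$) are all assumptions made here. The sketch uses a row-major vectorization, under which multiplying by $\mathbf{A} \otimes \mathbf{I}_T$ is equivalent to applying $\mathbf{A}$ to the source time courses at every timestep.

```python
import numpy as np

def simulate_epoch(A, z_ap, z_osc, noise_std=0.0, rng=None):
    """Vectorized sensor-space epoch x = (A kron I_T)(z_ap + z_osc) + eps.

    A     : (C, N_s) leadfield matrix
    z_ap  : (N_s, T) aperiodic source time courses
    z_osc : (N_s, T) oscillatory source time courses
    """
    rng = np.random.default_rng() if rng is None else rng
    n_times = z_ap.shape[1]
    mixing = np.kron(A, np.eye(n_times))        # shape (C*T, N_s*T)
    z = (z_ap + z_osc).reshape(-1)              # row-major vec of the sources
    # Assumption: isotropic Gaussian noise instead of a general covariance.
    eps = noise_std * rng.standard_normal(mixing.shape[0])
    return mixing @ z + eps

# Toy dimensions and random sources, for illustration only.
rng = np.random.default_rng(0)
C, N_s, T = 3, 4, 50
A = rng.standard_normal((C, N_s))
z_ap = rng.standard_normal((N_s, T))            # stand-in for the high-power part
z_osc = 0.1 * rng.standard_normal((N_s, T))     # stand-in for the low-power part
x = simulate_epoch(A, z_ap, z_osc, noise_std=0.0, rng=rng)
```

With the noise switched off, reshaping `x` back to `(C, T)` recovers `A @ (z_ap + z_osc)`, which is a quick way to check the Kronecker bookkeeping.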

Let $\mathbf{m} \in \{0,1\}^{CT}$ be a binary masking vector where each entry is drawn independently, with $m_i = 0$ indicating that index $i$ is masked. The masked observation is defined as $\tilde{\mathbf{x}} = \mathbf{m} \odot \mathbf{x}$, where $\odot$ denotes the Hadamard product. Let $f_{\theta}: \mathbb{R}^{CT} \to \mathbb{R}^{CT}$ and $g_{\phi}: \mathbb{R}^{CT} \to \mathbb{R}^{CT}$ be encoder and decoder modules respectively, both being overparameterized. The reconstruction objective is:

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{\mathbf{x}, \mathbf{m}}\bigl[\|(1 - \mathbf{m}) \odot (\mathbf{x} - g_{\phi}(f_{\theta}(\tilde{\mathbf{x}})))\|^{2}\bigr] \tag{2}$$

where the loss is computed only over the masked indices. Assuming $\mathbf{z}_{\text{ap}}$ and $\mathbf{z}_{\text{osc}}$ are mutually independent, the covariance matrix of $\mathbf{x}$ decomposes as:
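A single-sample version of the masked objective in Eq. (2) can be written in a few lines. This is a hedged sketch: `masked_reconstruction_loss` is a name chosen here, the expectation over $\mathbf{x}$ and $\mathbf{m}$ is replaced by one example, and the "model" is a hypothetical stand-in that simply echoes the visible input rather than a trained encoder and decoder.

```python
import numpy as np

def masked_reconstruction_loss(x, x_hat, m):
    """Squared error restricted to masked entries (m_i = 0), for one sample."""
    masked = 1.0 - m                          # 1 where the input was hidden
    return float(np.sum((masked * (x - x_hat)) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
m = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)   # three masked entries
x_tilde = m * x                                        # masked observation

# Hypothetical stand-in for g_phi(f_theta(x_tilde)): echo the visible input,
# which predicts zero at every masked position.
x_hat = x_tilde.copy()
loss = masked_reconstruction_loss(x, x_hat, m)
```

Because the error is multiplied by `1 - m`, mistakes at visible positions do not contribute, so the loss here equals the energy of `x` at the three masked positions.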

$$\Sigma_{\mathbf{x}} = (\mathbf{A} \otimes \mathbf{I}_{T})\bigl(\Sigma_{\text{ap}} + \Sigma_{\text{osc}}\bigr)(\mathbf{A} \otimes \mathbf{I}_{T})^{\top} + \Sigma_{\varepsilon} \tag{3}$$

where $\Sigma_{\text{ap}} = \mathrm{Cov}(\mathbf{z}_{\text{ap}})$ and $\Sigma_{\text{osc}} = \mathrm{Cov}(\mathbf{z}_{\text{osc}})$. A well-established empirical observation in EEG is that the aperiodic component dominates the signal power (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)), i.e.:

$$\mathrm{tr}(\Sigma_{\text{ap}}) \gg \mathrm{tr}(\Sigma_{\text{osc}}) \tag{4}$$

so that $\Sigma_{\mathbf{x}}$ is dominated by the aperiodic term $(\mathbf{A} \otimes \mathbf{I}_{T})\Sigma_{\text{ap}}(\mathbf{A} \otimes \mathbf{I}_{T})^{\top}$. Furthermore, the aperiodic component is characterized by a $1/f^{\beta}$ power spectral density (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)), concentrating its energy predominantly in lower frequencies. In contrast, the oscillatory component $\mathbf{z}_{\text{osc}}$ carries task-relevant neural information that is spatially localized and contributes less to signal variance. Alongside these idiosyncratic characteristics of EEG, prior works have identified a spectral bias of reconstruction-based neural networks, which learn lower-frequency components at a faster rate than higher-frequency ones (Rahaman et al., [2019](https://arxiv.org/html/2605.26434#bib.bib23); Xu et al., [2020](https://arxiv.org/html/2605.26434#bib.bib36)). This frequency prioritization has also been observed in masked reconstruction frameworks, where the model preferentially captures low-frequency structure to minimize the dominant terms of the reconstruction loss (Zhang et al., [2022](https://arxiv.org/html/2605.26434#bib.bib37)). Since $\Sigma_{\mathbf{x}}$ is dominated by the low-frequency aperiodic component, the gradients of the model parameters will be overwhelmingly determined by the aperiodic variance. The oscillatory signals, being spatially localized and of comparatively low power, would contribute only weakly to the reconstruction loss.
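As a worked illustration of this argument, consider an idealized reconstruction that recovers one component exactly and ignores the other. The following is a sketch under added assumptions that are not part of the setup above: masking is ignored, the sources and noise are zero-mean, and the noise is independent of the sources. The shorthand $\mathbf{B} = \mathbf{A} \otimes \mathbf{I}_{T}$ is introduced here only for brevity.

$$
\begin{aligned}
\mathbb{E}\,\|\mathbf{x} - \mathbf{B}\mathbf{z}_{\text{ap}}\|^{2} &= \mathbb{E}\,\|\mathbf{B}\mathbf{z}_{\text{osc}} + \boldsymbol{\varepsilon}\|^{2} = \mathrm{tr}(\mathbf{B}\Sigma_{\text{osc}}\mathbf{B}^{\top}) + \mathrm{tr}(\Sigma_{\varepsilon}), \\
\mathbb{E}\,\|\mathbf{x} - \mathbf{B}\mathbf{z}_{\text{osc}}\|^{2} &= \mathbb{E}\,\|\mathbf{B}\mathbf{z}_{\text{ap}} + \boldsymbol{\varepsilon}\|^{2} = \mathrm{tr}(\mathbf{B}\Sigma_{\text{ap}}\mathbf{B}^{\top}) + \mathrm{tr}(\Sigma_{\varepsilon}).
\end{aligned}
$$

Under these assumptions, the penalty for ignoring a component equals its sensor-space variance. When the aperiodic variance is much larger than the oscillatory variance, a model therefore reduces the loss far more by fitting the aperiodic component than by fitting the oscillatory one.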

#### Hypothesis

Masked reconstruction models trained on EEG will predominantly learn representations of low\-frequency aperiodic structure, and will fail to reliably capture the spatially localized, higher\-frequency oscillatory components that are most informative for BCI downstream tasks\.

To validate the stated hypothesis, we adopt a two-fold approach. In Section [5](https://arxiv.org/html/2605.26434#S5), using controlled synthetic EEG data generation (as shown in Fig. [1](https://arxiv.org/html/2605.26434#S2.F1)B), we examine how well both aperiodic and oscillatory variables can be linearly decoded from foundation model embeddings. Following this, Section [6](https://arxiv.org/html/2605.26434#S6) provides empirical results (linear probe experiments, as shown in Fig. [1](https://arxiv.org/html/2605.26434#S2.F1)C) on representative real-world EEG BCI tasks, assessing the ability of the model embeddings to decode both trait-like (subject-specific) and state-like (task-specific) information.

## 4 Models

To analyze reconstruction-based EEG foundation models on the synthetic and real-world EEG datasets, we pick three representative models that have demonstrated robust downstream performance across different tasks: LaBraM (Jiang et al., [2024](https://arxiv.org/html/2605.26434#bib.bib15)), CBraMod (Wang et al., [2025](https://arxiv.org/html/2605.26434#bib.bib16)) and CSBrain (Zhou et al., [2025](https://arxiv.org/html/2605.26434#bib.bib17)).

- CBraMod is a foundation model trained on 25,000 hours of pre-training data that aims to model temporal and spatial characteristics through distinct (criss-cross) attention mechanisms. It employs a patch-based masked reconstruction scheme for pre-training.
- CSBrain is an attention-based foundation model for EEG decoding with novel cross-scale spatiotemporal tokenization and structured sparse attention. It is pre-trained using a masked reconstruction objective.
- LaBraM is an EEG foundation model with a pre-training objective based on masked token prediction. An underlying neural tokenizer is trained on large-scale EEG data by patching the EEG time series into tokens.

## 5 Synthetic Single Channel EEG Analysis

### 5.1 Synthetic Data Generation

As stated earlier, EEG signals are composed of aperiodic (1/f) and oscillatory components. To study whether EEG foundation model representations capture each of these components, we generate single-channel EEG signals with controlled variation in both components. The power spectra of the aperiodic and oscillatory components were modelled as:

$$S_{ap}(f) = \frac{10^{A_{ap}}}{f^{\beta}}; \quad S_{osc}(f) = 10^{A_{osc}} e^{\frac{-(f - f_{osc})^{2}}{2w^{2}}}; \quad S(f) = S_{ap}(f) + S_{osc}(f) \tag{5}$$

where $A_{ap}$ is the aperiodic offset, $\beta$ is the aperiodic exponent, $A_{osc}$ is the power of the oscillatory component above the aperiodic component, $f_{osc}$ is the frequency of the oscillatory component and $w$ corresponds to its bandwidth. To obtain the time series from the spectrum, a uniform random phase is combined with the amplitude spectrum, followed by an inverse Fourier transform. Fig. [1](https://arxiv.org/html/2605.26434#S2.F1)B illustrates the signal generation pipeline. To study the impact of the aperiodic and oscillatory components individually, all other parameters were fixed and multiple samples were generated by varying the parameter of interest. For each parameter, 1000 samples were generated through a linear sweep of its value. The ranges over which each parameter was varied are reported in Table [1](https://arxiv.org/html/2605.26434#S5.T1). All samples were generated at a 200 Hz sampling rate with a signal length of 5 s. Refer to Appendix [C](https://arxiv.org/html/2605.26434#A3) for the sample generation pseudocode and example time series.
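The generation procedure described above can be sketched with NumPy as follows. This is an illustrative sketch and not the authors' implementation (their pseudocode is in Appendix C): the function name `synth_eeg`, the parameter values in the usage lines, the choice to zero the DC bin, and the absolute scaling of the spectrum are all assumptions made here.

```python
import numpy as np

def synth_eeg(beta, a_ap, a_osc, f_osc, w, fs=200.0, duration=5.0, rng=None):
    """Return (signal, freqs, power) for one synthetic channel following Eq. (5)."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(fs * duration))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # 0 Hz up to the Nyquist frequency

    power = np.zeros_like(freqs)
    f = freqs[1:]                               # skip DC, where 1/f^beta diverges
    s_ap = 10.0 ** a_ap / f ** beta             # aperiodic term
    s_osc = 10.0 ** a_osc * np.exp(-((f - f_osc) ** 2) / (2.0 * w ** 2))  # Gaussian peak
    power[1:] = s_ap + s_osc

    amplitude = np.sqrt(power)                  # power spectrum -> amplitude spectrum
    phase = rng.uniform(0.0, 2.0 * np.pi, size=freqs.shape)
    spectrum = amplitude * np.exp(1j * phase)   # attach uniform random phases
    signal = np.fft.irfft(spectrum, n=n)        # inverse FFT gives a real time series
    return signal, freqs, power

# Placeholder parameter values for illustration only (not taken from Table 1).
signal, freqs, power = synth_eeg(beta=1.5, a_ap=0.0, a_osc=0.5, f_osc=10.0, w=1.0,
                                 rng=np.random.default_rng(0))
```

A parameter sweep in the spirit of the text would call `synth_eeg` in a loop over `np.linspace(low, high, 1000)` for the parameter of interest while holding the others fixed, with `low` and `high` taken from the ranges in Table 1.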

| Model | $\beta$ | $A_{\text{ap}}$ | $f_{\text{osc}}$ |
| --- | --- | --- | --- |
| CBraMod | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_CZ_linear_decodability_ap_exponent.png) (a) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_CZ_linear_decodability_ap_offset.png) (b) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_CZ_linear_decodability_osc_freq.png) (c) |
| CSBrain | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_CZ_linear_decodability_ap_exponent.png) (d) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_CZ_linear_decodability_ap_offset.png) (e) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_CZ_linear_decodability_osc_freq.png) (f) |
| LaBraM | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_CZ_linear_decodability_ap_exponent.png) (g) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_CZ_linear_decodability_ap_offset.png) (h) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_CZ_linear_decodability_osc_freq.png) (i) |

Figure 2: Linear decodability $R^{2}$ values for the Cz channel across three foundation models (CBraMod, CSBrain, LaBraM) for aperiodic exponent ($\beta$), aperiodic offset ($A_{\text{ap}}$) and oscillation frequency ($f_{\text{osc}}$). The linear decodability is computed through synthetic single channel EEG with the respective variables varied across the ranges mentioned in Table [1](https://arxiv.org/html/2605.26434#S5.T1). Model representations encode information of aperiodic components whereas the oscillatory frequency is not well captured.

Table 1: Range of parameter sweep for synthetic data generation

### 5.2 Linear Decodability of Aperiodic and Oscillatory Components

Using the generated synthetic samples, embeddings were extracted from the last layer of the encoder for three EEG foundation models (LaBraM, CBraMod and CSBrain, as shown in Fig. [1](https://arxiv.org/html/2605.26434#S2.F1)(A)). Each generated EEG sample was passed as the input corresponding to a particular EEG channel (e.g. Cz, Fz, Pz or Oz). These embeddings, along with the true variable values used to generate the corresponding signals, are used to train linear regression models using a 5-fold nested cross-validation approach. The objective of each linear regression model is to predict the value of the underlying variable (e.g. aperiodic exponent $\beta$ or oscillation frequency $f_{osc}$) using the embeddings as inputs. The $R^{2}$ value of each linear regression model measures how well the underlying variable can be linearly decoded from the model embeddings. Linear decodability for the three EEG foundation models and three components (aperiodic exponent, aperiodic offset and oscillation frequency) is plotted in Fig. [2](https://arxiv.org/html/2605.26434#S5.F2).
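The linear decodability measure can be sketched as a cross-validated linear probe. The code below is a simplified illustration, not the paper's pipeline: `cv_r2` is a name chosen here, it uses plain 5-fold cross-validation with a fixed ridge penalty rather than the nested scheme described above, and the "embeddings" are random stand-ins rather than foundation model outputs.

```python
import numpy as np

def cv_r2(embeddings, targets, n_folds=5, ridge=1e-3, seed=0):
    """Cross-validated R^2 of a linear (ridge) probe from embeddings to targets."""
    rng = np.random.default_rng(seed)
    n = len(targets)
    folds = np.array_split(rng.permutation(n), n_folds)
    preds = np.empty(n, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = embeddings[train_mask], targets[train_mask]
        # Standardize with training statistics only, then solve ridge in closed form.
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-12
        Z_tr = (X_tr - mu) / sd
        y_mean = y_tr.mean()
        gram = Z_tr.T @ Z_tr + ridge * np.eye(Z_tr.shape[1])
        weights = np.linalg.solve(gram, Z_tr.T @ (y_tr - y_mean))
        preds[test_idx] = ((embeddings[test_idx] - mu) / sd) @ weights + y_mean
    ss_res = np.sum((targets - preds) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Stand-in data: one embedding set encodes the swept variable linearly, one does not.
rng = np.random.default_rng(0)
beta = np.linspace(0.5, 2.5, 200)                 # hypothetical sweep range
proj = rng.standard_normal(16)
emb_linear = np.outer(beta, proj) + 0.05 * rng.standard_normal((200, 16))
emb_noise = rng.standard_normal((200, 16))
r2_encoded = cv_r2(emb_linear, beta)
r2_absent = cv_r2(emb_noise, beta)
```

In this toy setting `r2_encoded` is close to 1 while `r2_absent` is near or below zero, which mirrors how a high or low $R^{2}$ is read in the decodability plots.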

| Model | 10 Hz | 20 Hz | 30 Hz | 40 Hz | 50 Hz |
| --- | --- | --- | --- | --- | --- |
| CBraMod | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_cbramod_classifier_CZ_linear_decodability_oscfreqpower_10.png) (a) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_cbramod_classifier_CZ_linear_decodability_oscfreqpower_20.png) (b) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_cbramod_classifier_CZ_linear_decodability_oscfreqpower_30.png) (c) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_cbramod_classifier_CZ_linear_decodability_oscfreqpower_40.png) (d) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_cbramod_classifier_CZ_linear_decodability_oscfreqpower_50.png) (e) |
| CSBrain | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_CZ_linear_decodability_oscfreqpower_10.png) (f) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_CZ_linear_decodability_oscfreqpower_20.png) (g) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_CZ_linear_decodability_oscfreqpower_30.png) (h) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_CZ_linear_decodability_oscfreqpower_40.png) (i) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_CZ_linear_decodability_oscfreqpower_50.png) (j) |
| LaBraM | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_CZ_linear_decodability_oscfreqpower_10.png) (k) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_CZ_linear_decodability_oscfreqpower_20.png) (l) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_CZ_linear_decodability_oscfreqpower_30.png) (m) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_CZ_linear_decodability_oscfreqpower_40.png) (n) | ![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_CZ_linear_decodability_oscfreqpower_50.png) (o) |

Figure 3: Linear decodability comparison of the Cz channel across oscillatory frequencies ($f_{\text{osc}}$) with varying power of the oscillation ($A_{\text{osc}}$) for the CSBrain, LaBraM and CBraMod models. Model sensitivity to predict the oscillatory power value decreases as the oscillatory frequency increases. This indicates that reconstruction based foundation models are less sensitive to modulation in higher frequencies.

#### EEG Foundation model embeddings decode aperiodic components

As seen from Fig. [2](https://arxiv.org/html/2605.26434#S5.F2), there is a consistent trend across the three foundation models wherein the $R^{2}$ value is high for the aperiodic exponent ($\beta$) and offset ($A_{ap}$). This indicates that, across different foundation models, reconstruction-based objectives yield embeddings from which the aperiodic components of the signals are linearly decodable. For the oscillation frequency, however, the linear regression models are not able to provide robust predictions, indicating that the oscillation frequency is not linearly encoded in the embeddings.

#### EEG Foundation models exhibit a low\-frequency oscillatory bias

While the foundation model embeddings did not provide linear decodability of the oscillation frequency values, they do provide robust decoding of oscillatory power ($A_{osc}$) at low frequencies, i.e. 10 Hz, as shown in Fig. [3](https://arxiv.org/html/2605.26434#S5.F3). As the frequency of the oscillatory component increases, the models are no longer able to decode the corresponding power of the oscillatory component. This could indicate that the embeddings primarily encode lower-frequency, higher-power components, which contribute most to lowering the reconstruction loss owing to the spectral characteristics of EEG signals. This inability of foundation model embeddings to encode modulation of higher-frequency oscillatory magnitude could partially explain the sub-optimal performance of EEG foundation models in the linear probe setting on BCI tasks, which require encoding beta (13-30 Hz) and gamma ($\geq$30 Hz) frequencies. While the results are shown for channel Cz, results for other electrodes are consistent with the stated observations. Results for other channels (Fz, Pz, Oz) are included in Appendix [D](https://arxiv.org/html/2605.26434#A4).
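The link between the 1/f spectrum and the reconstruction loss can be made concrete with a small worked example. By Parseval's theorem, a time-domain mean-squared error equals the summed squared error across frequencies, so the share of signal power in a band bounds how much that band can contribute to the loss. The sketch below assumes a purely aperiodic spectrum $P(f) \propto f^{-\beta}$, an illustrative range of 1 to 50 Hz, and $\beta = 1.5$; the helper names and band edges are assumptions chosen for illustration.

```python
import math

def band_power(f_lo, f_hi, beta):
    """Integral of f**(-beta) from f_lo to f_hi (arbitrary units)."""
    if math.isclose(beta, 1.0):
        return math.log(f_hi / f_lo)
    return (f_lo ** (1.0 - beta) - f_hi ** (1.0 - beta)) / (beta - 1.0)

def band_fractions(beta, bands, f_min=1.0, f_max=50.0):
    """Fraction of total aperiodic power in [f_min, f_max] that falls in each band."""
    total = band_power(f_min, f_max, beta)
    return {name: band_power(lo, hi, beta) / total for name, (lo, hi) in bands.items()}

# Band edges follow the beta (13-30 Hz) and gamma (>= 30 Hz) ranges in the text.
bands = {
    "below beta (1-13 Hz)": (1.0, 13.0),
    "beta (13-30 Hz)": (13.0, 30.0),
    "gamma (30-50 Hz)": (30.0, 50.0),
}
fractions = band_fractions(beta=1.5, bands=bands)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

Under these assumptions, more than 80% of the aperiodic power lies below 13 Hz, so an encoder that captures only the low-frequency content already removes most of a mean-squared reconstruction error.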

## 6 BCI Datasets Linear Probe Analysis

### 6.1 Datasets

#### BCIC-IV 2A (Brunner et al., [2008](https://arxiv.org/html/2605.26434#bib.bib49))

Motor imagery classification dataset with 9 participants across two sessions recorded on different days. Data were collected using 22 EEG channels at 250 Hz for four imagined motor movements (left hand, right hand, both feet and tongue). Trials are segmented into 4-second chunks, and a single fold is used, with the training and evaluation sets split according to recording day.

#### Physionet-MI (Schalk et al., [2004a](https://arxiv.org/html/2605.26434#bib.bib2), [b](https://arxiv.org/html/2605.26434#bib.bib48))

Contains 1500 one- to two-minute EEG recordings from 109 participants, for four real and imagined motor tasks (opening and closing the left fist, right fist, both fists and both feet). EEG recordings consist of 64 channels sampled at 160 Hz. Each trial was segmented into 4-second windows, and 5-fold experiments were performed with 20 randomly sampled subjects per fold. For each subject, the first eight blocks are used for training and the last four blocks as the evaluation set.

#### Kaggle-ERN (Mattout et al., [2014](https://arxiv.org/html/2605.26434#bib.bib50))

Includes EEG recordings from 26 participants performing tasks with an online P300 speller interface; it is primarily used to study event-related potentials related to erroneous responses. The EEG data were collected using 56 EEG electrodes and downsampled to 200 Hz. The classification task is to detect when the selected item is not the intended one. The training set consists of the first four blocks of all subjects and the test set comprises the fifth block.

#### Sleep-EDF (Kemp et al., [2000](https://arxiv.org/html/2605.26434#bib.bib51))

Contains 197 whole-night polysomnographic sleep recordings with EEG, EOG and chin EMG recorded at 100 Hz and annotated for sleep stages every 30 s. In this work, EEG electrodes Fpz-Cz and Pz-Oz are considered for analysis. The sleep stages are annotated as Wake, REM, Movement, NREM-1, NREM-2, NREM-3 and NREM-4. For comparability with prior studies, NREM-3 and NREM-4 are combined into a single class, yielding 5 classes. For evaluation, 20 subjects are randomly sampled in a 5-fold cross-validation manner, with the training set comprising the first day's data and the test set containing the second day's sleep data.

### 6.2 Results

#### EEG foundation models classify subjects better than tasks in BCI datasets

The impact of the identified aperiodic low-frequency spectral bias on downstream tasks is demonstrated through linear probe experiments. Prior work (Donoghue et al., [2020](https://arxiv.org/html/2605.26434#bib.bib35)) has observed that aperiodic components are correlated with subject identity, whereas modulation in oscillatory components is primarily associated with tasks. To study whether the spectral bias in foundation models leads to embeddings capturing subject identity rather than task-specific constructs, two distinct linear probe models are trained: one to predict the task for a given trial and the other to predict the subject identity from the trial. Appendix [B](https://arxiv.org/html/2605.26434#A2) includes hyperparameters and linear probe training details. We hypothesize that, under the linear probe setting, EEG foundation model embeddings preserve subject-specific information, resulting in better subject identification performance compared to task classification.

Three foundation models (LaBraM, CBraMod and CSBrain) are tested on three BCI datasets (BCIC-IV 2A, Kaggle-ERN and Physionet-MI) and one sleep classification dataset (Sleep-EDF). The Sleep-EDF dataset was selected as an exemplary task in which the task-related power is much larger in magnitude than in BCI tasks and the frequencies of interest are lower. For the dataset split, the training and testing sets described in Sec. [6.1](https://arxiv.org/html/2605.26434#S6.SS1) were used for both task classification and subject classification. To provide a normalized metric for subject identification and task classification, Cohen's kappa is reported.
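Cohen's kappa corrects observed agreement $p_o$ for chance agreement $p_e$ via $\kappa = (p_o - p_e)/(1 - p_e)$, which is what makes scores comparable across problems with different numbers of classes. The sketch below is a plain-Python illustration with toy labels; the helper `toy_predictions` and the class counts (4 and 10) are hypothetical and are not taken from the datasets above.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

def toy_predictions(n_classes, per_class=4, n_correct=2):
    """Balanced toy labels where half of each class is predicted as the next class."""
    y_true, y_pred = [], []
    for c in range(n_classes):
        for i in range(per_class):
            y_true.append(c)
            y_pred.append(c if i < n_correct else (c + 1) % n_classes)
    return y_true, y_pred

# Same 50% accuracy, different number of classes.
kappa_4 = cohens_kappa(*toy_predictions(4))
kappa_10 = cohens_kappa(*toy_predictions(10))
print(f"4 classes: {kappa_4:.3f}, 10 classes: {kappa_10:.3f}")
```

With identical accuracy, the 10-class problem receives the higher kappa because chance agreement is lower, which is why a chance-corrected metric is useful when comparing task classification against subject identification.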

As seen from the results in Table [2](https://arxiv.org/html/2605.26434#S6.T2), for all three BCI tasks, performance on subject identification is noticeably better than task performance. This observation holds despite the number of classes in task classification being noticeably lower than in subject identification. This indicates that the EEG foundation models capture session and subject information to a larger extent than task-specific information. This in turn leads to the sub-optimal performance observed in linear-probe experiments in the between-subjects setting, as the models are unable to generalize to unseen subjects (Yang et al., [2026](https://arxiv.org/html/2605.26434#bib.bib8); Kuruppu et al., [2026](https://arxiv.org/html/2605.26434#bib.bib30)). Additionally, the observed discrepancy between subject identification and task classification is not solely due to session variables, as the BCIC-IV 2A dataset consists of training and testing data from distinct days, indicating that the models capture subject-specific information over session variables.

#### EEG foundation models form task\-centric representations in Sleep\-EDF

While foundation models show a subject-centric bias in BCI tasks, results on Sleep-EDF exhibit the opposite trend: the models provide better task performance than subject identification. First, unlike BCI tasks, sleep EEG recordings have large low-frequency task-relevant components, which help the foundation models better capture task-specific information. Second, the observed differences between BCI datasets and Sleep-EDF are further influenced by the EEG montage used. BCI datasets employ high-density montages (22–64 channels), enabling models to capture rich spatial patterns and cross-channel covariance structures that are strongly subject-specific (Kumar et al., [2021](https://arxiv.org/html/2605.26434#bib.bib31)). In contrast, Sleep-EDF uses only two bipolar channels (Fpz-Cz, Pz-Oz), which substantially limits spatial resolution and suppresses global signal components through differential referencing. As a result, subject-specific spatial signatures are less pronounced, reducing the model's ability to encode subject identity. Additionally, recent work suggests that sleep stages could differ in aperiodic exponent, which could further explain the improved task performance (Ameen et al., [2025](https://arxiv.org/html/2605.26434#bib.bib34)).

Table 2: Linear probe performance of EEG foundation models on task and subject classification for the BCIC-IV 2A, Sleep-EDF, Kaggle-ERN and Physionet-MI datasets. Cohen's kappa is reported as the evaluation metric. For the BCI tasks, foundation models encode subject identity over task-specific information. For BCIC-IV 2A and Kaggle-ERN, since the linear probe is run on a single fold, standard deviations over five different random seeds are shown, whereas for Physionet-MI and Sleep-EDF five folds are used owing to the larger number of subjects in these datasets.

![Refer to caption](https://arxiv.org/html/2605.26434v1/x2.png)

Figure 4: Euclidean distance for task- and subject-based clusters. As can be seen in the plots, for all the BCI datasets, common subject ($d_{\text{CS}}$) clusters have a much smaller average distance than common task ($d_{\text{CT}}$) clusters, indicating that the embeddings form tight clusters based on subject identity.

![Refer to caption](https://arxiv.org/html/2605.26434v1/x3.png)

Figure 5: t-SNE embeddings of pre-trained LaBraM, CBraMod, and CSBrain models across the BCIC IV 2a and Sleep-EDF datasets. t-SNE plots indicate that models generally learn representations that tend to cluster by subject identity rather than by task label. Embeddings from a maximum of 15 subjects are shown for clarity.

#### Representation cluster distances align with linear probe results

To analyze the geometric structure of learned embeddings, we define two distance-based metrics over class centroids in embedding space. Let $\mathbf{c}_{s,t} = \frac{1}{|\mathcal{D}_{s,t}|}\sum_{i\in\mathcal{D}_{s,t}}\mathbf{z}_{i}$ denote the centroid of embeddings $\mathbf{z}_{i}\in\mathbb{R}^{d}$ for subject $s$ and task $t$. Common Subject ($d_{\text{CS}}$) and Common Task ($d_{\text{CT}}$) distances are then defined as:

$$
d_{\mathrm{CS}} = \frac{1}{|\mathcal{S}|}\sum_{s\in\mathcal{S}} \frac{\sum_{t_{1}\neq t_{2}} \|\mathbf{c}_{s,t_{1}} - \mathbf{c}_{s,t_{2}}\|_{2}}{|\mathcal{T}_{s}|(|\mathcal{T}_{s}|-1)}, \quad d_{\mathrm{CT}} = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} \frac{\sum_{s_{1}\neq s_{2}} \|\mathbf{c}_{s_{1},t} - \mathbf{c}_{s_{2},t}\|_{2}}{|\mathcal{S}_{t}|(|\mathcal{S}_{t}|-1)} \tag{6}
$$

where $\mathcal{S}$, $\mathcal{T}$ are the sets of subjects and tasks, and $\mathcal{T}_{s}$, $\mathcal{S}_{t}$ denote the tasks for subject $s$ and the subjects for task $t$, respectively. As shown in Fig. [4](https://arxiv.org/html/2605.26434#S6.F4), $d_{\mathrm{CS}} \ll d_{\mathrm{CT}}$ for BCI tasks, indicating that embeddings cluster more tightly by subject than by task and suggesting that subject-specific characteristics dominate the representation space. For Sleep-EDF, the trend is reversed, aligning with the linear probe results and reflecting a weaker subject-centric effect and stronger task-related structure. Further analysis of model representations through t-SNE plots with both subject and task labels indicates clusters primarily based on subject identity rather than task in BCI datasets, as seen in Fig. [5](https://arxiv.org/html/2605.26434#S6.F5). This further reinforces the initial hypothesis that the model embeddings capture low-frequency and aperiodic components rather than oscillatory information. Additional t-SNE plots are provided in Appendix [E](https://arxiv.org/html/2605.26434#A5).
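The two centroid distances in Eq. (6) can be sketched in a few lines. This is an illustrative plain-Python version, assuming embeddings are given as lists of floats with one subject label and one task label per trial, and that every subject has at least two tasks and every task at least two subjects (otherwise the denominators in Eq. (6) vanish). The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_pairwise(points):
    """Average Euclidean distance over ordered pairs of distinct points."""
    pairs = list(permutations(points, 2))  # n * (n - 1) ordered pairs
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def cluster_distances(embeddings, subjects, tasks):
    """Return (d_CS, d_CT) computed over (subject, task) centroids."""
    groups = defaultdict(list)
    for z, s, t in zip(embeddings, subjects, tasks):
        groups[(s, t)].append(z)
    by_subject, by_task = defaultdict(list), defaultdict(list)
    for (s, t), vectors in groups.items():
        c = centroid(vectors)
        by_subject[s].append(c)  # same subject, different tasks
        by_task[t].append(c)     # same task, different subjects
    d_cs = sum(mean_pairwise(c) for c in by_subject.values()) / len(by_subject)
    d_ct = sum(mean_pairwise(c) for c in by_task.values()) / len(by_task)
    return d_cs, d_ct

# Toy 2-D embeddings: subjects differ by 10 along x, tasks differ by 1 along y.
embeddings, subjects, tasks = [], [], []
for s in (0, 1):
    for t in (0, 1):
        for jitter in (-0.1, 0.1):
            embeddings.append([10.0 * s + jitter, float(t)])
            subjects.append(s)
            tasks.append(t)

d_cs, d_ct = cluster_distances(embeddings, subjects, tasks)
print(f"d_CS = {d_cs:.2f}, d_CT = {d_ct:.2f}")
```

In this toy layout the subject offset dominates the geometry, so the common-subject distance is much smaller than the common-task distance, mirroring the $d_{\mathrm{CS}} \ll d_{\mathrm{CT}}$ pattern described for the BCI datasets.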

## 7 Limitations and Future Work

While this work identifies aperiodic and low-frequency spectral bias as an impediment to EEG foundation models' ability to provide generalized representations for robust downstream performance, mitigation strategies based on auxiliary losses that explicitly enable encoding of high-frequency oscillatory information must be explored in future work. Some directions that have been explored in the context of reconstruction models include augmenting knowledge-guided objectives (Kommineni et al., [2024](https://arxiv.org/html/2605.26434#bib.bib9); Wang et al., [2026](https://arxiv.org/html/2605.26434#bib.bib43)) and distinct subspaces for subject- and task-specific representations (Mishra et al., [2025](https://arxiv.org/html/2605.26434#bib.bib44)). While the controlled synthetic EEG experiments help identify the sensitivity of representations to aperiodic and oscillatory component effects, the analysis is limited to stationary single-channel signals. Further analysis including multi-channel non-stationary signals could provide deeper insights into the representation topology of EEG foundation models. The primary focus of this work was to characterize the spectral bias in reconstruction-based EEG foundation models and examine its downstream impact on BCI performance. While this bias negatively affects tasks that rely on high-frequency neural signatures, it may conversely benefit tasks driven by low-frequency, high-amplitude components, as suggested by the improved performance observed on the sleep staging task in this work. Additionally, our analysis is scoped to reconstruction-based pre-training objectives, which have seen increasing adoption in the EEG foundation model literature. Alternative objectives such as contrastive or predictive approaches may exhibit different inductive biases, and understanding how these compare to the spectral bias identified here remains an important direction for future work.

## 8 Conclusion

Our work provides a mechanistic explanation for sub-optimal linear probe results in EEG foundation models by identifying a spectral bias in these models towards aperiodic and low-frequency components. This spectral bias is attributed to reconstruction-based objectives, wherein the large spectral power in the aperiodic components, with their 1/f spectral nature, incentivizes the model to learn lower-frequency components. This bias towards aperiodic components is empirically demonstrated in foundation models through controlled synthetic EEG samples, for which the model representations linearly decode the aperiodic exponent and offset but not oscillatory information. Further, we demonstrate that this spectral bias results in representations that encode subject-specific information in real-world EEG datasets, leading to much higher subject identification performance than task classification performance. These observations motivate employing modified pretext tasks that explicitly model high-frequency components in EEG.

## References

- J\. Achiam, S\. Adler, S\. Agarwal, L\. Ahmad, I\. Akkaya, F\. L\. Aleman, D\. Almeida, J\. Altenschmidt, S\. Altman, S\. Anadkat,et al\.\(2023\)Gpt\-4 technical report\.arXiv preprint arXiv:2303\.08774\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- Ameen et al. (2025) Temporally resolved analyses of aperiodic features track neural dynamics during sleep. Communications Psychology 3(1), pp. 160. Cited by: [§6.2](https://arxiv.org/html/2605.26434#S6.SS2.SSS0.Px1.p1.1).
- K\. Avramidis, T\. Feng, W\. Jeong, J\. Lee, W\. Cui, R\. M\. Leahy, and S\. Narayanan \(2025\)Neural codecs as biosignal tokenizers\.arXiv preprint arXiv:2510\.09095\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- N\. Brake, F\. Duc, A\. Rokos, F\. Arseneau, S\. Shahiri, A\. Khadra, and G\. Plourde \(2024\)A neurophysiological basis for aperiodic eeg and the background spectral trend\.Nature communications15\(1\),pp\. 1514\.Cited by:[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- T\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. D\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell, S\. Agarwal, A\. Herbert\-Voss, G\. Krueger, T\. Henighan, R\. Child, A\. Ramesh, D\. Ziegler, J\. Wu, C\. Winter, C\. Hesse, M\. Chen, E\. Sigler, M\. Litwin, S\. Gray, B\. Chess, J\. Clark, C\. Berner, S\. McCandlish, A\. Radford, I\. Sutskever, and D\. Amodei \(2020\)Language models are few\-shot learners\.pp\. 1877–1901\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- C\. Brunner, R\. Leeb, G\. Müller\-Putz, A\. Schlögl, and G\. Pfurtscheller \(2008\)BCI competition 2008–graz data set a\.Institute for knowledge discovery \(laboratory of brain\-computer interfaces\), Graz University of Technology16\(1\-6\),pp\. 1\.Cited by:[§6\.1](https://arxiv.org/html/2605.26434#S6.SS1.SSS0.Px1)\.
- Z\. Chen, Y\. Zhang, Q\. Lan, T\. Liu, H\. Wang, Y\. Ding, Z\. Jia, R\. Chen, K\. Wang, and X\. Zhou \(2026\)Uni\-NTFM: a unified foundation model for EEG signal representation learning\.External Links:[Link](https://openreview.net/forum?id=oUMiuYHW21)Cited by:[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- H\. S\. Chien, H\. Goh, C\. M\. Sandino, and J\. Y\. Cheng \(2022\)MAEEG: masked auto\-encoder for EEG representation learning\.InNeurIPS 2022 Workshop on Learning from Time Series for Health,External Links:[Link](https://openreview.net/forum?id=kttuLV59ZuJ)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- G\. Comanici, E\. Bieber, M\. Schaekermann, I\. Pasupat, N\. Sachdeva, I\. Dhillon, M\. Blistein, O\. Ram, D\. Zhang, E\. Rosen,et al\.\(2025\)Gemini 2\.5: pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities\.arXiv preprint arXiv:2507\.06261\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- W\. Cui, W\. Jeong, P\. Thölke, T\. Medani, K\. Jerbi, A\. A\. Joshi, and R\. M\. Leahy \(2024\)Neuro\-gpt: towards a foundation model for eeg\.pp\. 1–5\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- B\. Döner, T\. M\. Ingolfsson, L\. Benini, and Y\. Li \(2025\)LUNA: efficient and topology\-agnostic foundation model for EEG signal analysis\.InThe Thirty\-ninth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=uazfjnFL0G)Cited by:[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- T\. Donoghue, M\. Haller, E\. J\. Peterson, P\. Varma, P\. Sebastian, R\. Gao, T\. Noto, A\. H\. Lara, J\. D\. Wallis, R\. T\. Knight,et al\.\(2020\)Parameterizing neural power spectra into periodic and aperiodic components\.Nature neuroscience23\(12\),pp\. 1655–1665\.Cited by:[Appendix C](https://arxiv.org/html/2605.26434#A3.p1.1),[§1](https://arxiv.org/html/2605.26434#S1.p3.1),[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1),[§3](https://arxiv.org/html/2605.26434#S3.SS0.SSS0.Px1.p2.12),[§3](https://arxiv.org/html/2605.26434#S3.SS0.SSS0.Px1.p2.17),[§6\.2](https://arxiv.org/html/2605.26434#S6.SS2.p1.1)\.
- T\. Donoghue, N\. Schaworonkow, and B\. Voytek \(2022\)Methodological considerations for studying neural oscillations\.European journal of neuroscience55\(11\-12\),pp\. 3502–3527\.Cited by:[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- R\. Gao, E\. J\. Peterson, and B\. Voytek \(2017\)Inferring synaptic excitation/inhibition balance from field potentials\.Neuroimage158,pp\. 70–78\.Cited by:[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- E\. Gibson, N\. J\. Lobaugh, S\. Joordens, and A\. R\. McIntosh \(2022\)EEG variability: task\-driven or subject\-driven signal of interest?\.NeuroImage252,pp\. 119034\.External Links:ISSN 1053\-8119,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.neuroimage.2022.119034),[Link](https://www.sciencedirect.com/science/article/pii/S105381192200163X)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p3.1)\.
- W\. Jiang, L\. Zhao, and B\. Lu \(2024\)Large brain model for learning generic representations with tremendous EEG data in BCI\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=QzTpTRVtrP)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1),[§3](https://arxiv.org/html/2605.26434#S3.p1.1),[§4](https://arxiv.org/html/2605.26434#S4.p1.1)\.
- B\. Kemp, A\. H\. Zwinderman, B\. Tuk, H\. A\. Kamphuisen, and J\. J\. Oberye \(2000\)Analysis of a sleep\-dependent neuronal feedback loop: the slow\-wave microcontinuity of the eeg\.IEEE Transactions on Biomedical Engineering47\(9\),pp\. 1185–1194\.Cited by:[§6\.1](https://arxiv.org/html/2605.26434#S6.SS1.SSS0.Px4)\.
- A\. Kommineni, K\. Avramidis, R\. Leahy, and S\. Narayanan \(2024\)Knowledge\-guided eeg representation learning\.In2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society \(EMBC\),Vol\.,pp\. 1–6\.External Links:[Document](https://dx.doi.org/10.1109/EMBC53108.2024.10782310)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§7](https://arxiv.org/html/2605.26434#S7.p1.1)\.
- D\. Kostas, S\. Aroca\-Ouellette, and F\. Rudzicz \(2021\)BENDR: using transformers and a contrastive self\-supervised learning task to learn from massive amounts of eeg data\.Frontiers in Human Neuroscience15,pp\. 653659\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- M\. G\. Kumar, S\. Narayanan, M\. Sur, and H\. A\. Murthy \(2021\)Evidence of task\-independent person\-specific signatures in eeg using subspace techniques\.IEEE Transactions on Information Forensics and Security16\(\),pp\. 2856–2871\.External Links:[Document](https://dx.doi.org/10.1109/TIFS.2021.3067998)Cited by:[§6\.2](https://arxiv.org/html/2605.26434#S6.SS2.SSS0.Px1.p1.1)\.
- G\. Kuruppu, N\. Wagh, V\. Kremen, and Y\. Varatharajah \(2026\)EEG foundation models: a critical review of current progress and future directions\.Journal of neural engineering23\(2\),pp\. 021001\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§6\.2](https://arxiv.org/html/2605.26434#S6.SS2.p3.1)\.
- C\. Liu, Y\. Deng, T\. Liu, J\. Zhou, X\. Zhou, Z\. Jia, and Y\. Ding \(2026a\)ECHO: toward contextual seq2seq paradigms in large EEG models\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=ClLQ6cLkoR)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- D\. Liu, Y\. Chen, Z\. Chen, Z\. Cui, Y\. Wen, J\. An, J\. Luo, and D\. Wu \(2026b\)EEG foundation models: progresses, benchmarking, and open problems\.arXiv preprint arXiv:2601\.17883\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p2.1)\.
- J\. Mattout, Manu, maucle, and W\. Kan \(2014\)BCI challenge @ ner 2015\.Note:[https://kaggle\.com/competitions/inria\-bci\-challenge](https://kaggle.com/competitions/inria-bci-challenge)KaggleCited by:[§6\.1](https://arxiv.org/html/2605.26434#S6.SS1.SSS0.Px3)\.
- A\. Mishra, A\. M\. Samin, A\. Etemad, and J\. Hashemi \(2025\)Subject representation learning from eeg using graph convolutional variational autoencoders\.pp\. 1–5\.Cited by:[§7](https://arxiv.org/html/2605.26434#S7.p1.1)\.
- Y\. E\. Ouahidi, J\. Lys, P\. Thölke, N\. Farrugia, B\. Pasdeloup, V\. Gripon, K\. Jerbi, and G\. Lioi \(2025\)REVE: a foundation model for EEG \- adapting to any setup with large\-scale pretraining on 25,000 subjects\.InThe Thirty\-ninth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=ZeFMtRBy4Z)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- S\. M\. Pani, L\. Saba, and M\. Fraschini \(2022\)Clinical applications of eeg power spectra aperiodic component analysis: a mini\-review\.Clinical Neurophysiology143,pp\. 1–13\.Cited by:[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- A\. Radford, J\. W\. Kim, C\. Hallacy, A\. Ramesh, G\. Goh, S\. Agarwal, G\. Sastry, A\. Askell, P\. Mishkin, J\. Clark, G\. Krueger, and I\. Sutskever \(2021\)Learning transferable visual models from natural language supervision\.pp\. 8748–8763\.External Links:[Link](https://proceedings.mlr.press/v139/radford21a.html)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- A\. Radford, J\. W\. Kim, T\. Xu, G\. Brockman, C\. Mcleavey, and I\. Sutskever \(2023\)Robust speech recognition via large\-scale weak supervision\.pp\. 28492–28518\.External Links:[Link](https://proceedings.mlr.press/v202/radford23a.html)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville (2019) On the spectral bias of neural networks. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 5301–5310. External Links: [Link](https://proceedings.mlr.press/v97/rahaman19a.html) Cited by: [§3](https://arxiv.org/html/2605.26434#S3.SS0.SSS0.Px1.p2.17).
- M\. E\. Raichle \(2006\)The brain’s dark energy\.Science314\(5803\),pp\. 1249–1250\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p3.1),[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- M\. E\. Raichle \(2010\)Two views of brain function\.Trends in cognitive sciences14\(4\),pp\. 180–190\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p3.1),[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- G\. Schalk, D\.J\. McFarland, T\. Hinterberger, N\. Birbaumer, and J\.R\. Wolpaw \(2004a\)BCI2000: a general\-purpose brain\-computer interface \(bci\) system\.IEEE Transactions on Biomedical Engineering51\(6\),pp\. 1034–1043\.External Links:[Document](https://dx.doi.org/10.1109/TBME.2004.827072)Cited by:[§6\.1](https://arxiv.org/html/2605.26434#S6.SS1.SSS0.Px2)\.
- G\. Schalk, D\. J\. McFarland, T\. Hinterberger, N\. Birbaumer, and J\. R\. Wolpaw \(2004b\)BCI2000: a general\-purpose brain\-computer interface \(bci\) system\.IEEE Transactions on biomedical engineering51\(6\),pp\. 1034–1043\.Cited by:[§6\.1](https://arxiv.org/html/2605.26434#S6.SS1.SSS0.Px2)\.
- G\. Team, R\. Anil, S\. Borgeaud, J\. Alayrac, J\. Yu, R\. Soricut, J\. Schalkwyk, A\. M\. Dai, A\. Hauth, K\. Millican,et al\.\(2023\)Gemini: a family of highly capable multimodal models\.arXiv preprint arXiv:2312\.11805\.Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1)\.
- B\. Voytek and R\. T\. Knight \(2015\)Dynamic network communication as a unifying neural basis for cognition, development, aging, and disease\.Biological psychiatry77\(12\),pp\. 1089–1097\.Cited by:[§2\.2](https://arxiv.org/html/2605.26434#S2.SS2.p1.1)\.
- G\. Wang, W\. Liu, Y\. He, C\. Xu, L\. Ma, and H\. Li \(2024\)EEGPT: pretrained transformer for universal and reliable representation of EEG signals\.InThe Thirty\-eighth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=lvS2b8CjG5)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1),[§3](https://arxiv.org/html/2605.26434#S3.p1.1)\.
- J\. Wang, S\. Zhao, Z\. Luo, Y\. Zhou, H\. Jiang, S\. Li, T\. Li, and G\. Pan \(2025\)CBramod: a criss\-cross brain foundation model for EEG decoding\.InThe Thirteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=NPNUHgHF2w)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1),[§3](https://arxiv.org/html/2605.26434#S3.p1.1),[§4](https://arxiv.org/html/2605.26434#S4.p1.1)\.
- J\. Wang, S\. Zhao, Y\. Zhou, Y\. Kang, S\. Li, and G\. Pan \(2026\)DeeperBrain: a neuro\-grounded eeg foundation model towards universal bci\.arXiv preprint arXiv:2601\.06134\.Cited by:[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1),[§7](https://arxiv.org/html/2605.26434#S7.p1.1)\.
- Z\. J\. Xu, Y\. Zhang, T\. Luo, Y\. Xiao, and Z\. Ma \(2020\)Frequency principle: fourier analysis sheds light on deep neural networks\.Communications in Computational Physics28\(5\),pp\. 1746–1767\.Cited by:[§3](https://arxiv.org/html/2605.26434#S3.SS0.SSS0.Px1.p2.17)\.
- C\. Yang, M\. Westover, and J\. Sun \(2023\)BIOT: biosignal transformer for cross\-data learning in the wild\.InAdvances in Neural Information Processing Systems,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),Vol\.36,pp\. 78240–78260\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2023/file/f6b30f3e2dd9cb53bbf2024402d02295-Paper-Conference.pdf)Cited by:[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1)\.
- L\. Yang, Q\. Sun, A\. Li, and M\. M\. V\. Hulle \(2026\)Are EEG foundation models worth it? comparative evaluation with traditional decoders in diverse BCI tasks\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=5Xwm8e6vbh)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§6\.2](https://arxiv.org/html/2605.26434#S6.SS2.p3.1)\.
- Q\. Zhang, Y\. Wang, and Y\. Wang \(2022\)How mask matters: towards theoretical understandings of masked autoencoders\.Advances in Neural Information Processing Systems35,pp\. 27127–27139\.Cited by:[§3](https://arxiv.org/html/2605.26434#S3.SS0.SSS0.Px1.p2.17)\.
- Y\. Zhou, J\. Wu, Z\. Ren, Z\. Yao, W\. Lu, K\. Peng, Q\. Zheng, C\. Song, W\. Ouyang, and C\. Gou \(2025\)CSBrain: a cross\-scale spatiotemporal brain foundation model for EEG decoding\.InThe Thirty\-ninth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=agcXjEHmyW)Cited by:[§1](https://arxiv.org/html/2605.26434#S1.p1.1),[§1](https://arxiv.org/html/2605.26434#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.26434#S2.SS1.p1.1),[§3](https://arxiv.org/html/2605.26434#S3.p1.1),[§4](https://arxiv.org/html/2605.26434#S4.p1.1)\.

## Appendix A Generative AI Use Disclosure

Large Language Models were employed for refining the quality of writing in this manuscript\. However, all content generated through Large Language Models was verified by the authors and modified accordingly\. Large Language Models were used to generate code to run experiments and format tables after the generated code was verified by the authors for its veracity\.

## Appendix B Training Details & Hyperparameters

All linear probe models were trained on a single NVIDIA RTX A6000 GPU. For Sleep-EDF, a learning rate of 1e-2 was used; for the BCI datasets, CSBrain and CBraMod use 1e-3 and LaBraM uses 5e-4. For the linear probe experiments, models were initialized with pre-trained checkpoints. Embeddings were extracted from the last layer of the encoder. Code for synthetic EEG generation and linear probe training has been uploaded to [https://anonymous.4open.science/r/spectralbiasaperiodic/](https://anonymous.4open.science/r/spectralbiasaperiodic/). Table [3](https://arxiv.org/html/2605.26434#A2.T3) reports the hyperparameters for the experiments.

Table 3: Hyperparameters for linear probe experiments

## Appendix C Synthetic EEG Single Channel Examples

Pseudocode for generating a single synthetic EEG sample is provided in Pseudocode [1](https://arxiv.org/html/2605.26434#alg1), and the sweep over a variable of interest is provided in Pseudocode [2](https://arxiv.org/html/2605.26434#alg2)\. The FOOOF Python package was used to generate the frequency spectrum of single\-channel EEG \[Donoghue et al\., [2020](https://arxiv.org/html/2605.26434#bib.bib35)\]\. Fig\. [6](https://arxiv.org/html/2605.26434#A3.F6) illustrates examples generated by sweeping the exponent from 1\.0 to 2\.0\.
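For orientation, the parameterization that FOOOF uses for a power spectrum without a knee can be written with the symbols of Pseudocode 1 as shown here. This is a sketch under two assumptions: that $A_{osc}$ plays the role of FOOOF's peak height in $\log_{10}$ power, and that the peak has a bandwidth $\sigma$, a parameter that is not named in this appendix. A single peak is shown, matching the optional $(f_{osc}, A_{osc})$ input.

$$
\log_{10} P(f) = A_{ap} - \beta \log_{10} f + A_{osc} \exp\left(-\frac{(f - f_{osc})^{2}}{2\sigma^{2}}\right)
$$

Under this reading, $A_{ap}$ shifts the whole spectrum up or down, $\beta$ sets its slope in log\-log coordinates, and the Gaussian term adds a localized bump centered at $f_{osc}$.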

Algorithm 1 (Pseudocode 1): Synthetic Time Series Generation from Parameters

Input: Frequency range $[f_{\min}, f_{\max}]$, sampling rate $f_s$, duration $T$, aperiodic parameters $(\beta, A_{ap})$, optional oscillation information $f_{osc}$, $A_{osc}$

Output: Time series $x \in \mathbb{R}^{L}$

1. $L \leftarrow f_s \cdot T$
2. $(\mathbf{f}, \mathbf{P}) \leftarrow \texttt{gen\_power\_spectrum}([f_{\min}, f_{\max}], [\beta, A_{ap}], [f_{osc}, A_{osc}])$
3. $\mathbf{f}_{FFT} \leftarrow \texttt{rfftfreq}(L, 1/f_s)$
4. $\mathbf{P}_{interp} \leftarrow \texttt{interp}(\mathbf{f}_{FFT}, \mathbf{f}, \mathbf{P})$
5. $\bm{\phi} \sim \mathcal{U}(0, 2\pi)$
6. $\mathbf{A} \leftarrow \sqrt{\mathbf{P}_{interp}}$
7. $\mathbf{S} \leftarrow \mathbf{A} \odot e^{i\bm{\phi}}$
8. $x \leftarrow \texttt{irfft}(\mathbf{S}, L)$
9. return $x$
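The steps of Pseudocode 1 can be sketched in NumPy as follows. This is a minimal illustration, not the released code: `toy_power_spectrum` is a hypothetical stand\-in for the FOOOF generator named in step 2, and its `bandwidth` and `freq_res` arguments are assumed values that do not appear in the text.

```python
import numpy as np


def toy_power_spectrum(freq_range, beta, a_ap, f_osc=None, a_osc=0.0,
                       bandwidth=1.0, freq_res=0.5):
    """Hypothetical stand-in for gen_power_spectrum (requires f_min > 0).

    Builds a 1/f^beta background with offset a_ap (log10 units) and an
    optional Gaussian peak of height a_osc (log10 units) at f_osc.
    """
    f_min, f_max = freq_range
    n_points = int(round((f_max - f_min) / freq_res)) + 1
    freqs = np.linspace(f_min, f_max, n_points)
    log_power = a_ap - beta * np.log10(freqs)
    if f_osc is not None:
        log_power = log_power + a_osc * np.exp(
            -((freqs - f_osc) ** 2) / (2.0 * bandwidth ** 2)
        )
    return freqs, 10.0 ** log_power


def generate_time_series(freq_range, fs, duration, beta, a_ap,
                         f_osc=None, a_osc=0.0, rng=None):
    """Follow steps 1-9 of Pseudocode 1 with random phases."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples = int(round(fs * duration))                          # step 1
    freqs, power = toy_power_spectrum(freq_range, beta, a_ap,
                                      f_osc=f_osc, a_osc=a_osc)    # step 2
    fft_freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)             # step 3
    # np.interp holds the edge values for bins outside [f_min, f_max].
    power_interp = np.interp(fft_freqs, freqs, power)              # step 4
    phases = rng.uniform(0.0, 2.0 * np.pi, size=fft_freqs.shape)   # step 5
    amplitude = np.sqrt(power_interp)                              # step 6
    spectrum = amplitude * np.exp(1j * phases)                     # step 7
    return np.fft.irfft(spectrum, n=n_samples)                     # steps 8-9
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the random phases reproducible, so two samples that differ in one parameter share the same phase draw.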

Algorithm 2 (Pseudocode 2): Parameter Sweep for Dataset Generation

Input: Parameter name $p$, range $[\theta_{\min}, \theta_{\max}]$, number of samples $N$, base parameters $\Psi$

Output: Dataset $\mathcal{D} = \{x^{(i)}\}_{i=1}^{N}$, parameter values $\bm{\theta}$

1. $\bm{\theta} \leftarrow \texttt{linspace}(\theta_{\min}, \theta_{\max}, N)$
2. $\mathcal{D} \leftarrow \emptyset$
3. for $i = 1$ to $N$ do
4. $\Psi[p] \leftarrow \bm{\theta}_i$
5. $x^{(i)} \leftarrow \texttt{generate\_time\_series}(\Psi)$
6. $\mathcal{D} \leftarrow \mathcal{D} \cup \{x^{(i)}\}$
7. end for
8. return $\mathcal{D}, \bm{\theta}$
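A sweep in the spirit of Pseudocode 2 might look like the sketch below. The generator is passed in as a callable so the block stands alone; `toy_generator` is a hypothetical placeholder for the time series generator of Pseudocode 1, and its `n` and `seed` arguments are assumptions.

```python
import numpy as np


def sweep_parameter(name, theta_min, theta_max, n_samples, base_params,
                    generate_fn):
    """Generate one signal per value of a single swept parameter."""
    thetas = np.linspace(theta_min, theta_max, n_samples)
    dataset = []
    for theta in thetas:
        params = dict(base_params)   # copy, so the caller's dict is untouched
        params[name] = float(theta)
        dataset.append(generate_fn(**params))
    return np.stack(dataset), thetas


def toy_generator(beta, n=256, seed=0):
    """Hypothetical placeholder generator: 1/f^beta noise with random phases."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)   # leave the DC bin at zero
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.shape)
    return np.fft.irfft(amplitude * np.exp(1j * phases), n=n)
```

Unlike the pseudocode, which overwrites $\Psi[p]$ in place, this sketch copies the base parameters on each iteration; the generated dataset is the same, but the caller's dictionary is left unchanged.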

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/exponent_sweep_plot.png)

Figure 6: \(Top\) Time series of generated EEG signals for $\beta \in \{1.0, 1.5, 2.0\}$\. \(Bottom\) Frequency spectra of EEG signals generated by sweeping the exponent from 1\.0 to 2\.0\.

## Appendix D Linear Decodability Results

Additional linear decodability plots are included for channels over the frontal \(Fz\) \[Fig\. [7](https://arxiv.org/html/2605.26434#A4.F7), Fig\. [8](https://arxiv.org/html/2605.26434#A4.F8)\], parietal \(Pz\) \[Fig\. [9](https://arxiv.org/html/2605.26434#A4.F9), Fig\. [10](https://arxiv.org/html/2605.26434#A4.F10)\], and occipital \(Oz\) \[Fig\. [11](https://arxiv.org/html/2605.26434#A4.F11), Fig\. [12](https://arxiv.org/html/2605.26434#A4.F12)\] lobes\. The linear decodability values for CBraMod are identical across channels owing to the convolution layers used to compute the channel embeddings in the model architecture: because the synthetic experiments use single\-channel inputs, CBraMod produces the same embeddings regardless of which channel is used\.

Columns \(left to right\): $\beta$, $A_{\text{ap}}$, $f_{\text{osc}}$

CBraMod

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_FZ_linear_decodability_ap_exponent.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_FZ_linear_decodability_ap_offset.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_FZ_linear_decodability_osc_freq.png)\(c\)
CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_FZ_linear_decodability_ap_exponent.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_FZ_linear_decodability_ap_offset.png)\(e\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_FZ_linear_decodability_osc_freq.png)\(f\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_FZ_linear_decodability_ap_exponent.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_FZ_linear_decodability_ap_offset.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_FZ_linear_decodability_osc_freq.png)\(i\)

Figure 7: Linear decodability for the Fz channel across three foundation models \(CBraMod, CSBrain, LaBraM\) for aperiodic exponent \($\beta$\), aperiodic offset \($A_{\text{ap}}$\), and oscillation frequency \($f_{\text{osc}}$\)\.

Columns \(left to right\): 10 Hz, 20 Hz, 30 Hz, 40 Hz, 50 Hz

CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_FZ_linear_decodability_oscfreqpower_10.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_FZ_linear_decodability_oscfreqpower_20.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_FZ_linear_decodability_oscfreqpower_30.png)\(c\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_FZ_linear_decodability_oscfreqpower_40.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_FZ_linear_decodability_oscfreqpower_50.png)\(e\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_FZ_linear_decodability_oscfreqpower_10.png)\(f\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_FZ_linear_decodability_oscfreqpower_20.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_FZ_linear_decodability_oscfreqpower_30.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_FZ_linear_decodability_oscfreqpower_40.png)\(i\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_FZ_linear_decodability_oscfreqpower_50.png)\(j\)

Figure 8: Linear decodability comparison for the Fz channel across oscillation frequencies \($f_{\text{osc}}$\) with varying oscillation power \($A_{\text{osc}}$\) for the CSBrain and LaBraM models\.

Columns \(left to right\): $\beta$, $A_{\text{ap}}$, $f_{\text{osc}}$

CBraMod

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_PZ_linear_decodability_ap_exponent.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_PZ_linear_decodability_ap_offset.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_PZ_linear_decodability_osc_freq.png)\(c\)
CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_PZ_linear_decodability_ap_exponent.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_PZ_linear_decodability_ap_offset.png)\(e\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_PZ_linear_decodability_osc_freq.png)\(f\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_PZ_linear_decodability_ap_exponent.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_PZ_linear_decodability_ap_offset.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_PZ_linear_decodability_osc_freq.png)\(i\)

Figure 9: Linear decodability for the Pz channel across three foundation models \(CBraMod, CSBrain, LaBraM\) for aperiodic exponent \($\beta$\), aperiodic offset \($A_{\text{ap}}$\), and oscillation frequency \($f_{\text{osc}}$\)\.

Columns \(left to right\): 10 Hz, 20 Hz, 30 Hz, 40 Hz, 50 Hz

CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_PZ_linear_decodability_oscfreqpower_10.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_PZ_linear_decodability_oscfreqpower_20.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_PZ_linear_decodability_oscfreqpower_30.png)\(c\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_PZ_linear_decodability_oscfreqpower_40.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_PZ_linear_decodability_oscfreqpower_50.png)\(e\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_PZ_linear_decodability_oscfreqpower_10.png)\(f\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_PZ_linear_decodability_oscfreqpower_20.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_PZ_linear_decodability_oscfreqpower_30.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_PZ_linear_decodability_oscfreqpower_40.png)\(i\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_PZ_linear_decodability_oscfreqpower_50.png)\(j\)

Figure 10: Linear decodability comparison for the Pz channel across oscillation frequencies \($f_{\text{osc}}$\) with varying oscillation power \($A_{\text{osc}}$\) for the CSBrain and LaBraM models\.

Columns \(left to right\): $\beta$, $A_{\text{ap}}$, $f_{\text{osc}}$

CBraMod

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_OZ_linear_decodability_ap_exponent.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_OZ_linear_decodability_ap_offset.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_cbramod_classifier_OZ_linear_decodability_osc_freq.png)\(c\)
CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_OZ_linear_decodability_ap_exponent.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_OZ_linear_decodability_ap_offset.png)\(e\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_csbrain_OZ_linear_decodability_osc_freq.png)\(f\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_OZ_linear_decodability_ap_exponent.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_OZ_linear_decodability_ap_offset.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_offset_oscfreq/sim_dataset_labram_base_OZ_linear_decodability_osc_freq.png)\(i\)

Figure 11: Linear decodability for the Oz channel across three foundation models \(CBraMod, CSBrain, LaBraM\) for aperiodic exponent \($\beta$\), aperiodic offset \($A_{\text{ap}}$\), and oscillation frequency \($f_{\text{osc}}$\)\.

Columns \(left to right\): 10 Hz, 20 Hz, 30 Hz, 40 Hz, 50 Hz

CSBrain

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_OZ_linear_decodability_oscfreqpower_10.png)\(a\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_OZ_linear_decodability_oscfreqpower_20.png)\(b\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_OZ_linear_decodability_oscfreqpower_30.png)\(c\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_OZ_linear_decodability_oscfreqpower_40.png)\(d\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_csbrain_OZ_linear_decodability_oscfreqpower_50.png)\(e\)
LaBraM

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_OZ_linear_decodability_oscfreqpower_10.png)\(f\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_OZ_linear_decodability_oscfreqpower_20.png)\(g\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_OZ_linear_decodability_oscfreqpower_30.png)\(h\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_OZ_linear_decodability_oscfreqpower_40.png)\(i\)
![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/sim_plots_exp_oscfreqpower/sim_dataset_labram_base_OZ_linear_decodability_oscfreqpower_50.png)\(j\)

Figure 12: Linear decodability comparison for the Oz channel across oscillation frequencies \($f_{\text{osc}}$\) with varying oscillation power \($A_{\text{osc}}$\) for the CSBrain and LaBraM models\.

## Appendix E Additional t\-SNE Plots

![Refer to caption](https://arxiv.org/html/2605.26434v1/figures/tsne_appendix.png)

Figure 13: t\-SNE embeddings of pre\-trained LaBraM, CBraMod, and CSBrain models across the PhysioNet\-MI and Kaggle\-ERN datasets\.
