The Identity Trap in EEG Foundation Models: A Diagnostic Audit

arXiv cs.LG 06/08/26, 04:00 AM Papers
Summary
This paper identifies and diagnoses the 'Identity Trap' in EEG foundation models, where high accuracy may stem from subject-identity features rather than genuine clinical biomarkers. It proposes FMScope, a frozen-representation protocol to disentangle these signals, and demonstrates that subject-identity confounding is universal across three models and removable with linear methods.
arXiv:2606.06647v1 Announce Type: new Abstract: Objective. EEG foundation models (FMs) report strong accuracy on clinical resting-state EEG. However, high accuracy under subject-disjoint cross-validation remains ambiguous: it can reflect a genuine clinical biomarker, or subject-identity features that correlate with the label. We name this the Identity Trap and ask whether it can be diagnosed at the representation level before fine-tuning. Approach. We propose FMScope, a frozen-representation protocol packaging five diagnostics: variance decomposition, subject-axis erasure, aperiodic 1/f ablation, layer-wise label probing, and within-subject direction consistency. We apply it to three pretrained FMs (LaBraM, CBraMod, REVE) across four datasets in a 2x2 layout: subject relation of label x presence of a consensus cross-subject EEG marker. Main results. (i) The Identity Trap is universal: frozen subject-variance is 13-89x a random null in 12/12 pairs, rising in all 12 under fine-tuning (+10 to +63 pp). This dominance is a removable linear axis: erasing it improves label decoding where the label varies within subject (+6 to +12 pp in primary cells; +4 to +27 pp across external cohorts). (ii) Aperiodic 1/f is one subject carrier: removing it drops the subject probe by 9-19 pp on LaBraM and CBraMod. REVE saturates subject identity without measurable aperiodic dependence. (iii) Fine-tuning amplifies label-variance only in cells with a literature-established cross-subject marker. Significance. The Identity Trap is a physically-grounded instance of shortcut learning: the preferred cue has a measurable physiological component, and subject-disjoint splitting alone cannot rule it out. FMScope separates gains reflecting a biological marker from those reflecting subject identity.
# The Identity Trap in EEG Foundation Models: A Diagnostic Audit
Source: [https://arxiv.org/html/2606.06647](https://arxiv.org/html/2606.06647)
Jun-You Lin, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan

Ying Choon Wu, Swartz Center for Computational Neuroscience, University of California, San Diego, La Jolla, CA 92037, USA

Tzyy-Ping Jung, Swartz Center for Computational Neuroscience, University of California, San Diego, La Jolla, CA 92037, USA. ORCID: 0000-0002-8377-2166.

###### Abstract

Objective. EEG foundation models (FMs) report strong headline accuracy on clinical resting-state EEG. However, high accuracy under subject-disjoint cross-validation remains ambiguous: it can reflect a genuine clinical biomarker, or subject-identity features that correlate with the label in this cohort. We name this ambiguity the *Identity Trap* and ask whether it can be diagnosed at the representation level before fine-tuning.

Approach. We propose FMScope, a frozen-representation pre-flight protocol packaging five diagnostics: variance decomposition, subject-axis erasure, aperiodic $1/f$ ablation, layer-wise label probing, and within-subject direction consistency. We apply it to three pretrained transformer FMs (LaBraM, CBraMod, REVE) across four public resting-state datasets (mental arithmetic, sleep deprivation, Alzheimer's and frontotemporal dementia, trait stress) in an *a priori* $2\times 2$ layout: subject relation of label × presence of a consensus cross-subject EEG marker.

Main results. (i) The Identity Trap is universal across all three FMs: the frozen subject-variance fraction is 13–89× a random-Gaussian null in 12 of 12 pairs, rising in all 12 under fine-tuning (+10 to +63 pp). This dominance lies on a removable linear axis: erasing it significantly improves label decoding where the label varies within subject (+6 to +12 pp in primary cells; +4 to +27 pp across four external consensus-marker cohorts; one-sided sign test $p<10^{-3}$). (ii) Aperiodic $1/f$ is one identifiable subject carrier: removing it drops the subject probe by 9–19 pp uniformly on LaBraM and CBraMod. REVE saturates subject identity with no measurable aperiodic dependence and a nonlinearly decodable residual after linear erasure: the Identity Trap is universal, but its carrier and linear removability are model-specific. (iii) Fine-tuning amplifies label-variance only in cells with a literature-established cross-subject EEG marker (+0.6 to +8.4 pp). In no-consensus cells the change spans zero, indicating no label signal for fine-tuning to amplify.

Significance. The Identity Trap is a physically grounded instance of shortcut learning: the preferred cue includes a measurable physiological component of the input rather than being a pure statistical artifact, and subject-disjoint splitting alone cannot rule it out. FMScope thus separates gains that reflect a biological marker from those that reflect subject identity.

Keywords: EEG foundation models; subject-identity confounding; shortcut learning; representation analysis; clinical biomarkers; resting-state EEG.

## 1 Introduction

EEG foundation models (FMs) are pretrained on large unlabeled EEG corpora with self-supervised masked-modeling objectives and have been proposed as a general substrate for clinical electroencephalography (Jiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib1); Wang and others, [2025](https://arxiv.org/html/2606.06647#bib.bib2); El Ouahidi et al., [2025](https://arxiv.org/html/2606.06647#bib.bib3)). On downstream tasks with established within-subject neural contrasts (motor imagery, event-related potentials, sleep staging), reported performance is strong (Xiong et al., [2025](https://arxiv.org/html/2606.06647#bib.bib8); Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9); Kastrati et al., [2025](https://arxiv.org/html/2606.06647#bib.bib16)), building on a decade of cross-subject classification protocols (Lotte et al., [2018](https://arxiv.org/html/2606.06647#bib.bib45)). The picture fragments on small-$N$ resting-state EEG (rsEEG), the setting that matters most for psychiatric and neurodegenerative biomarkers. On cohorts of comparable size and recording quality, FM performance varies widely across clinical labels (Shen et al., [2026](https://arxiv.org/html/2606.06647#bib.bib7)), and what controls this variation has not been addressed at the representation level. Consider the self-reported chronic-stress dataset of Komarov et al. ([2020](https://arxiv.org/html/2606.06647#bib.bib5)). A prior FM evaluation on this dataset reports a peak balanced accuracy of 0.9047 using a fixed 80/10/10 train/val/test split in which the same subjects appear in more than one split (best of four data-splitting seeds; the worst seed reaches 0.67) (Wang et al., [2025](https://arxiv.org/html/2606.06647#bib.bib4)). On the same dataset, under subject-disjoint cross-validation, we observe 0.43–0.50 across three FMs and five classical baselines (Sec. [4.1](https://arxiv.org/html/2606.06647#S4.SS1)).
Both numbers can be correct under their own protocols\. Neither tells us what the FM has actually learned about the label\.

A single accuracy number cannot resolve this ambiguity. High balanced accuracy under subject-disjoint cross-validation is consistent with at least three readings: (i) the FM has captured a genuine cross-subject EEG marker of the clinical label; (ii) the FM has captured stable physiological subject traits that happen to co-vary with the label in this cohort; or (iii) the FM has captured an entanglement of the two that does not separate at the read-out. Existing benchmarks enumerate scores (Shen et al., [2026](https://arxiv.org/html/2606.06647#bib.bib7); Xiong et al., [2025](https://arxiv.org/html/2606.06647#bib.bib8); Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9); Kastrati et al., [2025](https://arxiv.org/html/2606.06647#bib.bib16)); protocol critiques identify subject-identity leakage under trial-level cross-validation as one inflation source (Brookshire and others, [2024](https://arxiv.org/html/2606.06647#bib.bib15)). The underlying tension is not EEG-specific. Resting-state fMRI fingerprinting shows that stable between-individual differences in brain activity are sufficient to identify subjects (Finn and others, [2015](https://arxiv.org/html/2606.06647#bib.bib10)), and that the connections that identify a subject and the connections that predict behavior occupy different functional systems of the connectome (Mantwill et al., [2022](https://arxiv.org/html/2606.06647#bib.bib44)). Together, these results suggest a recurring competition, across brain-imaging modalities, between stable subject-identifying structure and the task-related signal that biomarkers depend on. Both lines of work establish that the problem exists. Neither tells us, for a specific cohort × FM pair, which of the three readings is operative at the representation level.

We approach this question through one well-characterized physiological component of the EEG spectrum: the aperiodic $1/f$ background. The standard FOOOF decomposition separates the EEG power spectrum into two parts: a broadband $1/f^{\chi}$ component (the aperiodic background itself) and narrow periodic peaks at canonical bands such as theta and alpha (Donoghue et al., [2020](https://arxiv.org/html/2606.06647#bib.bib26)). Periodic peaks carry transient, task-related state information of the kind clinical biomarkers typically index. The aperiodic background reflects more stable properties of the recording: cortical excitation–inhibition balance and vigilance state (Gao et al., [2017](https://arxiv.org/html/2606.06647#bib.bib27)), and electrode-level features that vary between individuals and persist across sessions (Kopčanová et al., [2024](https://arxiv.org/html/2606.06647#bib.bib31)). In ordinary EEG analysis, researchers treat the aperiodic background as a per-subject nuisance and remove it by fitting a parametric model before they analyze the periodic peaks. FMs are pretrained without any subject-aware objective, so it is unclear whether their representations remove the aperiodic component in the same way or instead keep it as an axis that encodes subject identity. This raises a specific, correlational question: do EEG FM representations on small-$N$ rsEEG *co-encode* the aperiodic $1/f$ background with subject identity along the same representational directions? We can test it on LaBraM and CBraMod; on REVE the test is inconclusive, because REVE differs from the other two FMs along five design axes at once (Sec. [5.5](https://arxiv.org/html/2606.06647#S5.SS5)). Independently of this carrier question, fine-tuning shows a cell-conditional pattern: it amplifies subject-related variance in every cell, but amplifies label-related variance only in cells where the literature has already established a cross-subject neural marker.
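For reference, the FOOOF model of Donoghue et al. (2020) fits the log power spectrum as an aperiodic term plus a sum of Gaussian peaks. In its standard form, $b$ is the offset, $k$ the knee (set to 0 in the knee-free "fixed" mode), $\chi$ the aperiodic exponent, and each peak $n$ has height $a_n$, center frequency $c_n$, and width $w_n$:

$$
\log_{10} P(f) = L(f) + \sum_{n} G_n(f), \qquad
L(f) = b - \log_{10}\!\left(k + f^{\chi}\right), \qquad
G_n(f) = a_n \exp\!\left(-\frac{(f - c_n)^2}{2 w_n^2}\right)
$$

With $k = 0$ the aperiodic term reduces to $L(f) = b - \chi \log_{10} f$, a straight line in log–log coordinates. This is the $1/f^{\chi}$ background described above, and $b$ and $\chi$ are the aperiodic offset and slope parameters referred to throughout.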

We test this hypothesis on four public small-$N$ clinical rsEEG datasets chosen *a priori* to populate a $2\times 2$ sampling layout (subject relation of the label × presence of a consensus cross-subject EEG marker) across three pretrained transformer FMs (LaBraM, CBraMod, REVE). We package the diagnostics required for the test as FMScope (Fig. [1](https://arxiv.org/html/2606.06647#S1.F1)), a frozen-representation framework with explicit scope conditions per tool. We make four contributions.

First, an empirical finding we term the *Identity Trap*: across 12 (cell × FM) frozen pairs, the subject-variance fraction is 13–89× a matched random-Gaussian null; under fine-tuning, the subject-variance fraction rises in all 12 pairs by +10 to +63 percentage points (pp). This dominance is confined to a removable linear axis: closed-form subject-axis erasure drives a linear subject probe to chance in all 12 pairs, and where the label varies within subject, erasing identity significantly improves label decoding (+6 to +12 pp in the primary cells; +4 to +27 pp across four external consensus-marker cohorts; one-sided sign test $p<10^{-3}$).

Second, a representational correlate of the $1/f$–subject co-encoding hypothesis: removing the aperiodic $1/f$ component drops a linear subject probe by 9 to 19 pp uniformly across all four cells on LaBraM and CBraMod. REVE shows no measurable aperiodic dependence; the LaBraM-and-CBraMod group differs from REVE along five concurrent design axes (Sec. [5.5](https://arxiv.org/html/2606.06647#S5.SS5)), so we report this two-versus-one pattern descriptively rather than as a mechanism claim.

Third, a cell-conditional outcome map: fine-tuning amplifies label-variance only in cells with a consensus cross-subject EEG marker (Mann–Whitney $U$, one-sided $p=0.0022$, $n=12$), and the layer-wise label probe descends monotonically toward chance in the no-consensus trait cell on all three FMs.

Fourth, the FMScope diagnostic framework itself, including its per-tool scope conditions, plus the clinical/protocol guidelines that follow from the three findings above: recording trait-cell labels as within-subject contrasts where the state allows, and seeking external physiological validation where it does not; verifying that within-subject classifier directions agree across subjects before any BCI calibration claim; and a frozen-feature pre-flight that returns a per-cell verdict (Tab. [4](https://arxiv.org/html/2606.06647#S5.T4)) before any fine-tuning compute is spent.
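A within-subject direction-consistency check of the kind the fourth contribution recommends can be sketched as follows. The definition is an assumption for illustration: each subject's direction is the difference between its class-1 and class-0 mean embeddings, and consistency is the mean pairwise cosine similarity across subjects. The paper's own statistic may differ, and all names and simulated data are hypothetical. The check applies only to within-subject paired cells, because trait subjects carry a single label.

```python
import numpy as np

def within_subject_directions(emb, labels, subjects):
    """Unit vector from each subject's class-0 mean embedding to its class-1 mean."""
    dirs = []
    for s in np.unique(subjects):
        mask = subjects == s
        x, y = emb[mask], labels[mask]
        if np.unique(y).size < 2:
            continue  # trait subjects carry a single label: no within-subject contrast
        d = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
        dirs.append(d / np.linalg.norm(d))
    return np.stack(dirs)

def direction_consistency(dirs):
    """Mean cosine similarity over all pairs of distinct subjects."""
    k = dirs.shape[0]
    sims = dirs @ dirs.T                      # diagonal entries are 1 (unit vectors)
    return float((sims.sum() - k) / (k * (k - 1)))

rng = np.random.default_rng(0)
n_subj, n_win, dim = 12, 20, 32

def simulate(shift_for_subject):
    """Hypothetical paired cell: subject offset + label shift + window noise."""
    emb, labels, subjects = [], [], []
    for s in range(n_subj):
        offset = 5.0 * rng.standard_normal(dim)   # subject-identity offset; cancels in d
        shift = shift_for_subject(s)
        for y in (0, 1):
            emb.append(offset + y * shift + rng.standard_normal((n_win, dim)))
            labels += [y] * n_win
            subjects += [s] * n_win
    return np.vstack(emb), np.array(labels), np.array(subjects)

def random_shift(scale=3.0):
    v = rng.standard_normal(dim)
    return scale * v / np.linalg.norm(v)

shared = random_shift()
shared_score = direction_consistency(within_subject_directions(*simulate(lambda s: shared)))
random_score = direction_consistency(within_subject_directions(*simulate(lambda s: random_shift())))
```

A label direction shared across subjects gives a score well above zero, while subject-specific directions give a score near zero, even though each subject's own contrast has the same magnitude in both simulations.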

![Figure 1: FMScope overview](https://arxiv.org/html/2606.06647v1/x1.png)

Figure 1: FMScope overview. Five frozen-representation diagnostics applied to embeddings from a pretrained transformer EEG-FM. Two of the five establish the Identity Trap: variance decomposition and subject-axis erasure (LEACE). The other three characterize its origin and structure: aperiodic input ablation, layer-wise subject/label probing, and within-subject direction consistency. Center: subject identity forms the dominant axis of the frozen representation, and the clinical label a weaker one; the inset shows closed-form removal of the linear subject components (LEACE). Colors and shapes in the embedding feature space schematically represent variation contributed by cognitive labels and individual subjects, respectively. Per-tool scope conditions and details are given in Sec. [3](https://arxiv.org/html/2606.06647#S3); results are in Sec. [4](https://arxiv.org/html/2606.06647#S4).
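The variance-decomposition diagnostic can be sketched as follows, assuming the subject-variance fraction is the between-subject sum of squares divided by the total sum of squares, pooled over embedding dimensions, and that the random-Gaussian null is the same statistic computed on i.i.d. standard-normal embeddings of identical shape with the same subject labels. Both definitions and the function names are hypothetical stand-ins, not necessarily the paper's exact formulas.

```python
import numpy as np

def subject_variance_fraction(emb: np.ndarray, subjects: np.ndarray) -> float:
    """Between-subject sum of squares over total sum of squares, pooled across dimensions."""
    centered = emb - emb.mean(axis=0)
    total_ss = np.sum(centered ** 2)
    between_ss = 0.0
    for s in np.unique(subjects):
        members = centered[subjects == s]
        # every window of subject s is summarized by that subject's mean embedding
        between_ss += members.shape[0] * np.sum(members.mean(axis=0) ** 2)
    return float(between_ss / total_ss)

def random_gaussian_null(subjects: np.ndarray, dim: int, n_draws: int = 20, seed: int = 0) -> float:
    """Mean fraction over i.i.d. Gaussian embeddings with the same shape and subject labels."""
    rng = np.random.default_rng(seed)
    draws = [
        subject_variance_fraction(rng.standard_normal((len(subjects), dim)), subjects)
        for _ in range(n_draws)
    ]
    return float(np.mean(draws))

# Hypothetical example: 10 subjects x 30 windows, 64-dim embeddings with per-subject offsets
rng = np.random.default_rng(1)
subjects = np.repeat(np.arange(10), 30)
emb = 2.0 * rng.standard_normal((10, 64))[subjects] + rng.standard_normal((300, 64))
observed = subject_variance_fraction(emb, subjects)
null = random_gaussian_null(subjects, dim=64)
ratio = observed / null
```

Under this null the expected fraction is roughly $(S-1)/(n-1)$ for $S$ subjects and $n$ windows, so a ratio well above 1 means subject means explain far more variance than label-free noise with the same grouping would.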
## 2 Related Work

##### EEG foundation models\.

We anchor on three open-weight backbones. LaBraM (Jiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib1)) feeds raw EEG patches through a temporal CNN into a transformer encoder. The pretraining target is a discrete vector-quantized code for each patch; a separate decoder maps each code back to the patch's Fourier amplitude and phase, so the encoder must learn features that support Fourier reconstruction. It is pretrained on ∼2,500 hours of mixed EEG. CBraMod (Wang and others, [2025](https://arxiv.org/html/2606.06647#bib.bib2)) uses a criss-cross transformer that factorizes spatial and temporal attention into two parallel mechanisms. Its patch embedding adds two branches: a temporal CNN and an FFT-derived energy vector. Pretraining reconstructs raw EEG patches under an MSE loss. It is pretrained on a cleaned ∼9,000-hour subset of the Temple University EEG Corpus (TUEG total: ∼15,000 subjects, ∼27,000 hours). REVE (El Ouahidi et al., [2025](https://arxiv.org/html/2606.06647#bib.bib3)) is a spatio-temporal transformer that uses 4D Fourier sinusoidal positional encoding and a linear patch embedding on the raw signal. Pretraining reconstructs the raw EEG under an L1 loss, with an additional attention-pooling secondary task. It is pretrained on ∼60,000 hours of EEG from 92 datasets covering 25,000 subjects. All three use masked-modeling self-supervision but differ along at least five design axes (target representation, patch embedding, reconstruction loss, positional encoding, pretraining corpus diversity), so we report cross-FM contrasts descriptively rather than as mechanism claims.
All three show strong downstream performance on event-related EEG benchmarks (Xiong et al., [2025](https://arxiv.org/html/2606.06647#bib.bib8); Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9)), but none has been characterized at the representation level on small-$N$ clinical resting-state cohorts.

##### Subject leakage and evaluation protocols\.

Prior critiques have established that trial-level cross-validation in clinical EEG inflates accuracy through subject leakage at the train–test boundary (Brookshire and others, [2024](https://arxiv.org/html/2606.06647#bib.bib15)), and recent benchmarks (Xiong et al., [2025](https://arxiv.org/html/2606.06647#bib.bib8); Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9); Kastrati et al., [2025](https://arxiv.org/html/2606.06647#bib.bib16); Shen et al., [2026](https://arxiv.org/html/2606.06647#bib.bib7)) broadly adopt subject-disjoint splitting (with task-specific exceptions, e.g. subject-dependent splits retained for emotion recognition in EEG-FM-Bench). These works document the inflation; they do not characterize what the FM has learned in place of the leaked subject signal once subject-disjoint splitting is enforced. Our starting point is the ambiguity that remains even after subject-disjoint splitting: accuracy could still reflect subject-correlated features that happen to co-vary with the clinical label in a particular cohort. We ask, at the representation level, what those features actually are. A concurrent study (Tang et al., [2026](https://arxiv.org/html/2606.06647#bib.bib49)) asks the same of EEG-FMs and also uses LEACE-style erasure as a diagnostic, but applies it to a lexicon of hand-crafted neuro-features (band power, connectivity, entropy). It reports that removing those features degrades decoding and does not examine the subject-identity axis. We erase the subject axis instead and find that its removal can improve consensus-marker decoding, the complementary direction.
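LEACE (Belrose et al., 2023) removes every linearly available trace of a concept in closed form. A minimal NumPy sketch of that closed form applied to one-hot subject IDs, in the spirit of FMScope's subject-axis erasure, is below. The function name, tolerance, and toy data are illustrative assumptions, and the paper's implementation may differ (for example in regularization, or in fitting the eraser on training folds only).

```python
import numpy as np

def leace_erase(X: np.ndarray, z: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Closed-form linear erasure of discrete labels z (e.g. subject IDs) from X.

    Computes r(x) = x - W^+ P W (x - mean), where W is the (pseudo-)whitening
    matrix of X and P projects onto the column space of W @ Sigma_xz.
    """
    n = X.shape[0]
    classes = np.unique(z)
    Z = (z[:, None] == classes[None, :]).astype(float)   # one-hot subject indicators
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n
    sigma_xz = Xc.T @ Zc / n
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()                     # drop null-space directions
    V, lam = evecs[:, keep], evals[keep]
    W = V @ np.diag(lam ** -0.5) @ V.T                   # whitening (ZCA form)
    W_pinv = V @ np.diag(lam ** 0.5) @ V.T               # pseudo-inverse of W
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol * s.max()]                          # orthonormal basis of col(W @ Sigma_xz)
    P = U @ U.T
    return X - Xc @ (W_pinv @ P @ W).T

# Hypothetical example: 8 subjects x 25 windows, 16-dim embeddings with strong subject offsets
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(8), 25)
X = 3.0 * rng.standard_normal((8, 16))[subjects] + rng.standard_normal((200, 16))
X_erased = leace_erase(X, subjects)
subject_means = np.stack([X_erased[subjects == s].mean(axis=0) for s in range(8)])
```

After erasure every subject's mean embedding coincides with the grand mean, so no linear read-out can separate subjects on these data; nonlinear structure, such as the residual the paper reports for REVE, is outside this guarantee.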

##### Prior cross\-subject EEG markers for clinical labels\.

Prior research documents resting-state or task-related EEG features that have been used as cross-subject biomarkers. For mental arithmetic, a substantial corpus links frontal-midline theta and parieto-occipital alpha modulation to cognitive task load (including mental arithmetic) (Klimesch, [1999](https://arxiv.org/html/2606.06647#bib.bib12); Gevins et al., [1997](https://arxiv.org/html/2606.06647#bib.bib29); Klimesch, [2012](https://arxiv.org/html/2606.06647#bib.bib13)). For Alzheimer's and frontotemporal dementia, an expert-panel consensus (Babiloni et al., [2021](https://arxiv.org/html/2606.06647#bib.bib30)) cites posterior alpha peak-frequency and power reduction as a candidate clinical biomarker; a recent two-cohort replication (Kopčanová et al., [2024](https://arxiv.org/html/2606.06647#bib.bib31)) reports that the AD-versus-HC spectral signature is “purely oscillatory” and that aperiodic features do not differ between groups. The aperiodic $1/f$ slope therefore remains a contested sub-component for AD. Comparable expert-consensus markers do not exist for our other two labels: frontal alpha asymmetry as a stress or depression marker remains contested (Reznik and Allen, [2018](https://arxiv.org/html/2606.06647#bib.bib32); van der Vinne et al., [2017](https://arxiv.org/html/2606.06647#bib.bib33)), and the sleep-deprivation dataset paper (Xiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib28)) tabulates individual differences in sleep quality and traits alongside the rested-versus-deprived recordings without identifying a cross-subject EEG marker. This asymmetry across the four labels is what motivates our *a priori* sampling layout (Sec. [3.1](https://arxiv.org/html/2606.06647#S3.SS1)). None of these works ask whether a pretrained FM's representation has actually learned the marker; that is the question we take up in Sec. [4.3](https://arxiv.org/html/2606.06647#S4.SS3) via aperiodic ablation of the input.

##### Aperiodic $1/f$ background as a subject feature in EEG.

The aperiodic $1/f$ slope and offset are parametrized by FOOOF (Donoghue et al., [2020](https://arxiv.org/html/2606.06647#bib.bib26)). Demuru and Fraschini ([2020](https://arxiv.org/html/2606.06647#bib.bib37)) demonstrate that these aperiodic features are stable within an individual across sessions and distinguish subjects independently of task-evoked oscillations, in parallel to functional-connectivity fingerprinting on fMRI (Finn and others, [2015](https://arxiv.org/html/2606.06647#bib.bib10)) and earlier EEG biometric work (Campisi and La Rocca, [2014](https://arxiv.org/html/2606.06647#bib.bib19); Marcel and Millán, [2007](https://arxiv.org/html/2606.06647#bib.bib20)). Demuru and Fraschini ([2020](https://arxiv.org/html/2606.06647#bib.bib37)) specifically report that handcrafted aperiodic features identify subjects with higher accuracy than canonical band-power features and remain consistent across eyes-open and eyes-closed conditions. This makes the aperiodic $1/f$ background one identifiable subject-specific carrier in classical EEG features. The fMRI fingerprinting work cited above documents the same between-subject stability for that modality, without specifying which spectral component carries it. A follow-on connectome analysis further reports that the edges discriminating individuals show no single-edge overlap with the edges predicting behavior across cognitive, language, and motor variables, indicating that subject-identifying and label-relevant signals can dissociate at the network-feature level (Mantwill et al., [2022](https://arxiv.org/html/2606.06647#bib.bib44)). To our knowledge, no prior work has asked which spectral component carries that subject identity inside a pretrained EEG-FM, nor whether it survives self-supervised pretraining on cross-subject corpora. Sec. [4.3](https://arxiv.org/html/2606.06647#S4.SS3) addresses this question via FOOOF-aperiodic input ablation as a correlational intervention on the frozen representation.
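A simplified version of aperiodic input ablation can be sketched as spectral division: fit a straight line to each channel's log power versus log frequency (FOOOF's knee-free form, here without excluding periodic peaks), divide the Fourier amplitudes by the square root of the fitted power, and keep the phases. This is an assumed stand-in for illustration, not necessarily the paper's ablation procedure; the function names and the 1–45 Hz fit range (borrowed from the band-pass in Sec. 3.2) are assumptions.

```python
import numpy as np

FS = 200.0  # Hz, sampling rate after resampling

def _band(n_samples, fs, fmin, fmax):
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    return freqs, (freqs >= fmin) & (freqs <= fmax)

def remove_aperiodic(x, fs=FS, fmin=1.0, fmax=45.0):
    """Divide out a per-channel log-log line fit of the power spectrum, keeping phase.

    x: (channels, samples). Simplified FOOOF stand-in: fixed mode (no knee) and
    no exclusion of periodic peaks before fitting.
    """
    spec = np.fft.rfft(x, axis=-1)
    freqs, band = _band(x.shape[-1], fs, fmin, fmax)
    log_f = np.log10(freqs[band])
    out = spec.copy()
    for ch in range(x.shape[0]):
        log_p = np.log10(np.abs(spec[ch, band]) ** 2)
        slope, offset = np.polyfit(log_f, log_p, deg=1)
        gain = 10.0 ** (-(slope * log_f + offset) / 2.0)        # 1 / sqrt(fitted power)
        out[ch, band] = spec[ch, band] * gain / np.median(gain)  # median in-band gain set to 1
    return np.fft.irfft(out, n=x.shape[-1], axis=-1)

def loglog_slope(x, fs=FS, fmin=1.0, fmax=45.0):
    """Per-channel slope of log10 power vs log10 frequency (minus the exponent chi)."""
    spec = np.fft.rfft(x, axis=-1)
    freqs, band = _band(x.shape[-1], fs, fmin, fmax)
    log_p = np.log10(np.abs(spec[:, band]) ** 2)
    return np.array([np.polyfit(np.log10(freqs[band]), row, deg=1)[0] for row in log_p])

# Hypothetical example: four channels of synthetic 1/f^1.5 noise, one 5 s epoch at 200 Hz
rng = np.random.default_rng(0)
n = 1000
freqs = np.fft.rfftfreq(n, d=1.0 / FS)
amplitude = np.zeros_like(freqs)
amplitude[1:] = freqs[1:] ** -0.75            # amplitude ~ f^-0.75, so power ~ f^-1.5
x = np.fft.irfft(np.fft.rfft(rng.standard_normal((4, n)), axis=-1) * amplitude, n=n, axis=-1)
before = loglog_slope(x)                      # near -1.5, up to periodogram noise
after = loglog_slope(remove_aperiodic(x))     # near 0
```

Because the fit is ordinary least squares, the flattened spectrum has zero fitted log–log slope by construction, while any periodic peaks survive as deviations from the line. A subject probe trained on embeddings of the ablated input can then be compared with one trained on the original input.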

##### Mechanism hypotheses for FM representation bias\.

Three lines of prior ML work motivate, but do not establish, the mechanisms by which a self-supervised FM might preferentially encode stable physiological subject features over transient task-related features. Geirhos et al. ([2020](https://arxiv.org/html/2606.06647#bib.bib35)) document that neural networks tend to exploit dataset-stable cues that correlate with labels (*shortcut features*) over invariant discriminative cues. Under an MSE (squared-error) reconstruction loss, masked-modeling objectives put most of their loss on the high-variance components of the input, and the model is widely conjectured to spend proportionally more capacity on those same components. In EEG, the broadband aperiodic $1/f$ baseline carries most of the low-frequency spectral variance by construction. El Ouahidi et al. ([2025](https://arxiv.org/html/2606.06647#bib.bib3)) explicitly motivate their L1 (rather than MSE) reconstruction loss as a counter to the sensitivity of L2 losses to noise and outliers in EEG. This is one concrete design choice on which our three FMs differ. Linear-network analyses (Saxe et al., [2019](https://arxiv.org/html/2606.06647#bib.bib36)) additionally show that gradient descent on a linear network learns the directions with the largest singular values of the input–output correlation first, and we conjecture that an analogous direction ordering operates under small-$N$ supervised fine-tuning of the FM head. Each of these is a candidate contributing factor for our findings; we do not isolate any of them experimentally.
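To make "by construction" concrete, consider an idealized pure power-law spectrum $P(f)\propto f^{-\chi}$ on the 1–45 Hz band used in Sec. 3.2 (an assumption; real spectra also carry periodic peaks). The share of band power below 8 Hz is:

$$
\frac{\int_{1}^{8} f^{-\chi}\,df}{\int_{1}^{45} f^{-\chi}\,df}
=
\begin{cases}
\dfrac{\ln 8}{\ln 45} \approx 0.55, & \chi = 1,\\[1.5ex]
\dfrac{1 - 1/8}{1 - 1/45} \approx 0.89, & \chi = 2.
\end{cases}
$$

By Parseval's theorem, a reconstruction that omits the 1–8 Hz band entirely incurs a squared error equal to that band's power, i.e. roughly 55% to 89% of the squared error of predicting zero everywhere, so under MSE the low-frequency aperiodic background dominates the loss unless the model reproduces it.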

##### Multi\-axis taxonomies in EEG\-FM benchmarking\.

Existing aggregate benchmarks organize EEG-FM evaluation either as a single-dataset case study or as a leaderboard that aggregates across many tasks and datasets. EEG-FM-Bench (Xiong et al., [2025](https://arxiv.org/html/2606.06647#bib.bib8)), AdaBrain-Bench (Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9)), EEG-Bench (Kastrati et al., [2025](https://arxiv.org/html/2606.06647#bib.bib16)), and Brain4FMs (Shen et al., [2026](https://arxiv.org/html/2606.06647#bib.bib7)) report dozens of task–dataset cells and broadly converge on the observation that FMs are not uniformly superior across clinical cross-subject tasks (Aristimunha et al., [2025](https://arxiv.org/html/2606.06647#bib.bib18)): AdaBrain-Bench reports that on clinical monitoring tasks “foundation models perform comparable or even worse than traditional models” (Wu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib9)). An independent review of ten early EEG-FMs reaches a parallel conclusion at the methodology level: evaluation strategies across the field remain heterogeneous and limited, and standardized, scaled evaluations are a prerequisite for assessing practical off-the-shelf utility (Kuruppu et al., [2025](https://arxiv.org/html/2606.06647#bib.bib17)). Their taxonomies are organized along axes orthogonal to ours (task category, pretraining objective, or fine-tuning strategy), and none asks which physiological component of the EEG spectrum the learned representation aligns with. Our three FMs and four datasets, selected *a priori* to span complementary labeling outcomes, instead support representation-level diagnostics that read each FM–dataset pair against the literature-fixed properties of its cell.

## 3 Methods

This section covers the sampling layout, feature extraction and evaluation protocol, the fine-tuning recipe, and the five FMScope diagnostics.

### 3.1 The $2\times 2$ sampling layout

We assign the four datasets *a priori* to a $2\times 2$ layout cross-classifying (a) subject relation of the label and (b) consensus cross-subject marker. *Axis A* is read from dataset structure: *within-subject paired* when each subject contributes recordings under both classes (EEGMAT, SleepDep) versus *subject-label trait* when the label is fixed per subject (ADFTD; Stress under per-recording DASS-21 binarization (Lovibond and Lovibond, [1995](https://arxiv.org/html/2606.06647#bib.bib14))). For Stress, 14 of 17 subjects carry a single label across all their recordings; the 3 mixed-cutoff subjects are kept for the headline benchmark and dropped from any mechanistic diagnostic that requires a subject-level label. *Axis B* is a literature-anchored *a priori* expectation. The *consensus* column comprises labels for which prior peer-reviewed EEG work has established a replicable cross-subject signature: frontal-midline $\theta$ and parieto-occipital $\alpha$ modulation for mental arithmetic (EEGMAT; Klimesch, [1999](https://arxiv.org/html/2606.06647#bib.bib12); Gevins et al., [1997](https://arxiv.org/html/2606.06647#bib.bib29); Klimesch, [2012](https://arxiv.org/html/2606.06647#bib.bib13)), and posterior $\alpha$ peak-frequency reduction for AD/FTD (ADFTD; Babiloni et al., [2021](https://arxiv.org/html/2606.06647#bib.bib30), expert-panel consensus). The *no-consensus* column comprises labels for which no such signature has been replicated: chronic stress (Stress; frontal alpha asymmetry contested, Reznik and Allen, [2018](https://arxiv.org/html/2606.06647#bib.bib32); van der Vinne et al., [2017](https://arxiv.org/html/2606.06647#bib.bib33)) and sleep deprivation (SleepDep; the dataset paper reports no candidate cross-subject EEG marker, Xiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib28)).
The $2\times 2$ layout is fixed before any FM is trained, and we state its falsifiable consequence in advance: fine-tuning should amplify label-variance preferentially in consensus cells (tested in Sec. [4.4](https://arxiv.org/html/2606.06647#S4.SS4)). Tab. [1](https://arxiv.org/html/2606.06647#S3.T1) summarizes the assignment.

Table 1: Dataset assignment to the $2\times 2$ sampling layout. Rows: subject relation of the label (Axis A, read from dataset structure). Columns: consensus cross-subject marker (Axis B, fixed *a priori* from peer-reviewed EEG literature under the criterion in Sec. [3.1](https://arxiv.org/html/2606.06647#S3.SS1)). Each cell gives the dataset, its literature anchor, and recordings / subjects. FMT: frontal-midline theta; FAA: frontal alpha asymmetry.

| Subject relation (Axis A) | Consensus marker | No-consensus marker |
|---|---|---|
| Within-subject paired | EEGMAT: FMT $\theta$ + occipital $\alpha$ (Klimesch, [1999](https://arxiv.org/html/2606.06647#bib.bib12); Gevins et al., [1997](https://arxiv.org/html/2606.06647#bib.bib29)); 72 rec / 36 subj | SleepDep: no replicated marker (Xiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib28)); 72 / 36 |
| Subject-label trait | ADFTD: posterior $\alpha$ peak / power (Babiloni et al., [2021](https://arxiv.org/html/2606.06647#bib.bib30)); 65 / 65 | Stress (DASS): FAA contested (Reznik and Allen, [2018](https://arxiv.org/html/2606.06647#bib.bib32); van der Vinne et al., [2017](https://arxiv.org/html/2606.06647#bib.bib33)); 70 / 17 |
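The a priori prediction stated above is tested with a one-sided Mann–Whitney $U$ test, and the erasure gains in the first contribution with a one-sided sign test. At these sample sizes both can be computed exactly. The stdlib sketch below assumes the $n=12$ values split into 6 consensus and 6 no-consensus pairs (two cells per column times three FMs); all input values are hypothetical placeholders, not the paper's data.

```python
from itertools import combinations
from math import comb

def mann_whitney_u(a, b):
    """U statistic of sample a: number of pairs with a_i > b_j, ties counting 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_mwu_greater(a, b):
    """Exact one-sided p-value for 'a tends to exceed b', enumerating every relabeling."""
    pooled = list(a) + list(b)
    observed = mann_whitney_u(a, b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        hits += mann_whitney_u(group_a, group_b) >= observed
        total += 1
    return hits / total

def sign_test_greater(diffs):
    """Exact one-sided sign test for 'differences tend to be positive'; zeros are dropped."""
    nonzero = [d for d in diffs if d != 0]
    n, k = len(nonzero), sum(d > 0 for d in nonzero)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical label-variance changes (pp) for 6 consensus and 6 no-consensus cell x FM pairs
consensus = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
no_consensus = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5]
p_mwu = exact_mwu_greater(consensus, no_consensus)   # complete separation: 1 / C(12, 6)
p_sign = sign_test_greater([0.5] * 10)                # ten hypothetical positive erasure gains
```

With complete separation of two groups of six, no relabeling beats the observed $U$, so the exact p-value is $1/\binom{12}{6} = 1/924$, the smallest value this design can produce.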
### 3.2 Per-dataset specifications

All four datasets are small-$N$ ($N\leq 65$) resting-state (or rest-plus-task) EEG recordings with 19–30 channels, resampled per dataset to 200 Hz. We standardize preprocessing across cells: per-channel mean subtraction, a 1–45 Hz zero-phase Butterworth band-pass, and 5 s non-overlapping epochs.
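The shared preprocessing chain can be sketched with SciPy as follows. The filter order (4, applied forward and backward by `sosfiltfilt` for zero phase) and dropping a trailing partial epoch are assumptions not stated in the text; `preprocess` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200      # Hz, after per-dataset resampling
EPOCH_S = 5   # non-overlapping epoch length in seconds

def preprocess(raw, fs=FS, band=(1.0, 45.0), order=4):
    """raw: (channels, samples) at fs Hz -> (n_epochs, channels, fs * EPOCH_S)."""
    x = raw - raw.mean(axis=1, keepdims=True)                 # per-channel mean subtraction
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x, axis=1)                            # forward-backward pass: zero phase
    win = fs * EPOCH_S
    n_epochs = x.shape[1] // win                               # drop a trailing partial epoch
    epochs = x[:, : n_epochs * win].reshape(x.shape[0], n_epochs, win)
    return epochs.transpose(1, 0, 2)

# Hypothetical 62 s, 19-channel recording with a DC offset
rng = np.random.default_rng(0)
raw = 50.0 + rng.standard_normal((19, 62 * FS))
epochs = preprocess(raw)   # 12 full 5 s epochs of 1000 samples each
```

Second-order sections (`output="sos"`) are used because they are numerically more stable than transfer-function coefficients for band-pass designs with a low cutoff relative to the sampling rate.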

EEGMAT: 36 subjects, 72 recordings (3-min eyes-closed rest vs. 1-min serial-subtraction arithmetic); 19-channel 10–20 montage (Zyma et al., [2019](https://arxiv.org/html/2606.06647#bib.bib24)); within-subject paired, consensus marker.

ADFTD: 65 subjects (AD = 36, HC = 29), one recording per subject (Miltiadous and others, [2023](https://arxiv.org/html/2606.06647#bib.bib6)); restricted to AD vs. HC for Axis B alignment with the AD-specific prior (Babiloni et al., [2021](https://arxiv.org/html/2606.06647#bib.bib30)); 19-channel 10–20 montage; subject-label trait, consensus marker.

SleepDep: the dataset of Xiang et al. ([2024](https://arxiv.org/html/2606.06647#bib.bib28)) has 71 participants, 38 of whom contributed an eyes-closed recording. We use that eyes-closed subset and exclude 2 subjects whose session files are corrupted, leaving 36 subjects and 72 recordings (baseline vs. 24 h sleep deprivation); 19-channel; within-subject paired, no-consensus marker.

Stress-DASS: 17 subjects, 70 recordings with per-recording DASS-21 self-report (Komarov et al., [2020](https://arxiv.org/html/2606.06647#bib.bib5)); 30-channel; subject-label trait, no-consensus marker.

### 3.3 Foundation model feature extraction and per-FM input normalization

We evaluate three open-weight EEG foundation models spanning complementary pretraining objectives and tokenization strategies: LaBraM (Jiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib1)), CBraMod (Wang and others, [2025](https://arxiv.org/html/2606.06647#bib.bib2)), and REVE (El Ouahidi et al., [2025](https://arxiv.org/html/2606.06647#bib.bib3)). Each backbone receives a 5 s, $C$-channel input at 200 Hz and emits a pooled embedding: a single fixed-length vector summarizing that 5 s window, of dimension $d = 200$ (LaBraM), $200$ (CBraMod), or $512$ (REVE). The backbone determines the pooling rule, and we do not re-pool its output. LaBraM and CBraMod both produce their embedding by mean pooling over encoder output tokens: LaBraM averages patch tokens after stripping a prepended classification (CLS) token (matching the release-default head described in Jiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib1), §2.1), and CBraMod applies 2D adaptive average pooling over the (channel, patch) grid (one of the head variants provided in the CBraMod release). REVE returns the attention-pooling secondary-task token (a learnable query that attends over all channel–patch tokens, per El Ouahidi et al., [2025](https://arxiv.org/html/2606.06647#bib.bib3), §2.4).
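The three pooling rules can be sketched in NumPy as below. The token layouts and counts are hypothetical, and `attention_pool` is a generic single-query softmax attention used only to illustrate the idea; it is not REVE's released head.

```python
import numpy as np

def labram_style_pool(tokens):
    """(batch, 1 + n_patches, d) with a leading CLS token -> (batch, d): drop CLS, average patches."""
    return tokens[:, 1:, :].mean(axis=1)

def cbramod_style_pool(tokens):
    """(batch, channels, patches, d) -> (batch, d); adaptive average pooling to a 1 x 1 grid is a plain mean."""
    return tokens.mean(axis=(1, 2))

def attention_pool(tokens, query, w_k, w_v):
    """(batch, n_tokens, d) -> (batch, d_v): a single query attends over all tokens."""
    keys, values = tokens @ w_k, tokens @ w_v
    scores = keys @ query / np.sqrt(query.shape[0])               # (batch, n_tokens)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))   # numerically stable softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("bn,bnd->bd", weights, values)

# Hypothetical token tensors for a batch of two 5 s windows
rng = np.random.default_rng(0)
e_labram = labram_style_pool(rng.standard_normal((2, 1 + 76, 200)))
e_cbramod = cbramod_style_pool(rng.standard_normal((2, 19, 5, 200)))
d = 512
reve_tokens = rng.standard_normal((2, 95, d))
e_reve = attention_pool(reve_tokens, rng.standard_normal(d), np.eye(d), np.eye(d))
```

With an all-zero query the attention weights are uniform, so attention pooling reduces to mean pooling; a learned query lets the model weight tokens unevenly.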

##### Input normalization\.

All three backbones receive raw microvolt\-scale input and each applies its release\-default internal rescaling before its patch embedding\. We hold the input contract fixed across cells and do not sweep it\.

##### Feature extraction modes\.

The frozen linear probe (LP) freezes the backbone (no gradient). It passes the pooled embedding through three fixed stages, all held at canonical values:

- a percentile-based clip, fit per dimension on each fold's training data;
- a standard scaler;
- an $L_2$-penalized logistic regression with balanced class weights.

Per-window posterior probabilities are mean-pooled within a recording and thresholded at 0.5 to produce the recording-level decision; the threshold is fixed by design and not tuned. Full fine-tuning (FT) unfreezes the backbone and adds a single linear classification head trained end-to-end under the recipe in Sec. [3.5](https://arxiv.org/html/2606.06647#S3.SS5). All three backbones expose the same interface: they take a multi-channel EEG epoch as input and return a pooled embedding, so the training loop is backbone-agnostic.
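A minimal sketch of the fold-local clip-and-scale step follows. It assumes NumPy and hypothetical 1st/99th percentile bounds, since the canonical values are not stated here. `PercentileClipScaler` is an illustrative name, not the paper's code. In scikit-learn, the remaining step would be a `LogisticRegression(class_weight="balanced")` (L2 is its default penalty) fit on the transformed training windows.

```python
import numpy as np

class PercentileClipScaler:
    """Clip each dimension to training-set percentiles, then standardize.

    Hypothetical sketch: the lo/hi percentiles are assumptions. Fit on one
    fold's training windows only, then apply to that fold's test windows.
    """

    def __init__(self, lo=1.0, hi=99.0):
        self.lo, self.hi = lo, hi

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # per-dimension clip bounds, each of shape (d,)
        self.lower_, self.upper_ = np.percentile(X, [self.lo, self.hi], axis=0)
        Xc = np.clip(X, self.lower_, self.upper_)
        self.mean_ = Xc.mean(axis=0)
        self.scale_ = Xc.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0   # guard against constant dimensions
        return self

    def transform(self, X):
        Xc = np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)
        return (Xc - self.mean_) / self.scale_

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
X_test = rng.normal(size=(50, 8))
X_test[0, 0] = 1e6                            # an extreme outlier window
prep = PercentileClipScaler().fit(X_train)
Z_test = prep.transform(X_test)
# the outlier is capped at the training 99th percentile before scaling
print(Z_test[0, 0] == prep.transform(prep.upper_[None, :])[0, 0])  # True
```

Fitting the bounds inside each fold keeps test-fold statistics out of the preprocessing, which matters for the subject-disjoint protocol.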

### 3.4 Evaluation protocol

##### Primary protocol: subject\-disjoint 5\-fold CV\.

All cells report balanced accuracy (BA), defined as the unweighted mean of per-class recall, which we use because the cells are class-imbalanced. Cross-validation is subject-stratified group $K$-fold with $K = 5$. It is grouped by subject, so that no subject contributes recordings to both the train and test folds of a given split (Brookshire and others, [2024](https://arxiv.org/html/2606.06647#bib.bib15)). We aggregate at the recording level by averaging per-window posteriors within a recording, thresholding at 0.5, and comparing to the recording label. This single evaluation rule is used both for the Axis B classification of the sampling layout and for the baseline performance table (Tab. [2](https://arxiv.org/html/2606.06647#S4.T2)).
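The recording-level rule and the BA metric can be sketched in plain Python. Two assumptions apply:

- a mean posterior of exactly 0.5 is assigned to class 1, since the text does not say which side a tie falls on;
- the helper names are illustrative.

scikit-learn offers `StratifiedGroupKFold` for grouped splits and `balanced_accuracy_score` for the metric. The hand-written version only makes the definitions explicit.

```python
from collections import defaultdict

def recording_decisions(window_probs, window_recs, threshold=0.5):
    """Mean-pool per-window P(y = 1) within each recording, then threshold."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p, rec in zip(window_probs, window_recs):
        sums[rec] += p
        counts[rec] += 1
    return {rec: int(sums[rec] / counts[rec] >= threshold) for rec in sums}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def assert_subject_disjoint(train_subjects, test_subjects):
    """Group K-fold invariant: no subject on both sides of a split."""
    assert not set(train_subjects) & set(test_subjects)

probs = [0.9, 0.7, 0.2, 0.4, 0.6, 0.8]          # per-window posteriors
recs = ["A", "A", "B", "B", "C", "C"]            # recording id of each window
labels = {"A": 1, "B": 0, "C": 0}                # recording labels
pred = recording_decisions(probs, recs)
order = sorted(labels)
print(pred)  # {'A': 1, 'B': 0, 'C': 1}
print(balanced_accuracy([labels[r] for r in order], [pred[r] for r in order]))  # 0.75
```

Class 0 is recalled on one of its two recordings and class 1 on its only recording, so BA is (0.5 + 1.0) / 2 = 0.75. Plain accuracy would give 2/3 and weight the majority class more heavily.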

##### Multi\-seed requirement\.

Small-$N$ cells exhibit non-trivial seed-to-seed FT variability even under deterministic cuDNN. We average every balanced-accuracy claim over three FT training seeds ($\{42, 123, 2024\}$) and report it as mean $\pm$ sample standard deviation; LP additionally averages over a fixed eight-seed set. The resulting wide spreads reflect the inherent instability of fine-tuning on small-$N$ cohorts; the relative performance gaps between tiers remain consistent across seeds.

### 3.5 Fine-tuning configuration

Each foundation model is fine-tuned under the configuration published in its original repository, applied uniformly across the four cells; we do not run a per-dataset hyperparameter sweep. All three FMs share:

- the AdamW optimizer;
- cross-entropy loss with label smoothing;
- early stopping on a moving average of held-out balanced accuracy;
- a single linear classification head with two output units.

LaBraM additionally uses the small-scale head initialization prescribed in its release (Jiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib1)).
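Early stopping on a moving average of held-out BA can be sketched as follows. The window length and patience are hypothetical placeholders, because the per-repository values are not listed in this section. `early_stop_epoch` is an illustrative name.

```python
from collections import deque

def early_stop_epoch(val_ba, window=3, patience=2):
    """Return (best_epoch, stop_epoch) using a trailing moving average of val BA.

    window and patience are hypothetical; smoothing damps single-epoch noise,
    which is large on small held-out folds.
    """
    recent = deque(maxlen=window)
    best, best_epoch, since_best = float("-inf"), None, 0
    for epoch, ba in enumerate(val_ba):
        recent.append(ba)
        smoothed = sum(recent) / len(recent)
        if smoothed > best:
            best, best_epoch, since_best = smoothed, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_ba) - 1

curve = [0.50, 0.60, 0.80, 0.70, 0.50, 0.40, 0.90]
print(early_stop_epoch(curve))            # (3, 5)
print(early_stop_epoch(curve, window=1))  # (2, 4)
```

Smoothing shifts the selected epoch from the raw single-epoch peak (epoch 2) to a later epoch (epoch 3), where the trailing average of held-out BA peaks.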

### 3.6 Reference baselines (classical and non-FM-deep)

Two reference baselines situate the FM tier in Tab\.[2](https://arxiv.org/html/2606.06647#S4.T2)\.

##### Classical handcrafted\-feature baseline\.

A per-recording feature vector is built from the time-averaged power spectrum (Welch, $n_{\text{perseg}} = \min(256, T)$, with $T$ the recording length in samples). It contains:

- for each channel, mean band power in $\theta$ (4–8 Hz), $\alpha$ (8–13 Hz), and $\beta$ (13–30 Hz);
- per-channel $\theta/\alpha$ and $\theta/\beta$ ratios;
- hemispheric alpha asymmetry $\log P_{\alpha}^{\text{right}} - \log P_{\alpha}^{\text{left}}$ over six homologous electrode pairs (Fp1/Fp2, F3/F4, F7/F8, C3/C4, P3/P4, O1/O2).

At 19 channels this yields $5 \times 19 + 6 = 101$ features. We fit $L_2$-regularized logistic regression with feature standardization, inverse regularization strength $C = 1$, and balanced class weights, under the same subject-disjoint 5-fold CV and three seeds.
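A sketch of the 101-dimensional feature vector follows. It rests on these assumptions:

- SciPy's `scipy.signal.welch` computes the time-averaged spectrum;
- band edges are half-open, [lo, hi);
- the asymmetry term uses natural logarithms;
- channels follow a standard 19-channel 10–20 ordering, since the text does not give the order.

In scikit-learn, the classifier would then be a `StandardScaler` followed by `LogisticRegression(C=1.0, class_weight="balanced")`.

```python
import numpy as np
from scipy.signal import welch

# Assumed 19-channel 10-20 ordering; the source does not list the channel order.
CHANNELS = ["Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
            "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2"]
PAIRS = [("Fp1", "Fp2"), ("F3", "F4"), ("F7", "F8"),
         ("C3", "C4"), ("P3", "P4"), ("O1", "O2")]    # (left, right)
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def classical_features(x, fs, channels=CHANNELS):
    """x: (n_channels, T) recording -> 5 * n_channels + 6 features."""
    T = x.shape[-1]
    freqs, psd = welch(x, fs=fs, nperseg=min(256, T), axis=-1)  # time-averaged PSD
    bp = {name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
          for name, (lo, hi) in BANDS.items()}                  # mean band power
    idx = {ch: i for i, ch in enumerate(channels)}
    asym = np.array([np.log(bp["alpha"][idx[right]]) - np.log(bp["alpha"][idx[left]])
                     for left, right in PAIRS])                 # right minus left
    return np.concatenate([bp["theta"], bp["alpha"], bp["beta"],
                           bp["theta"] / bp["alpha"], bp["theta"] / bp["beta"], asym])

rng = np.random.default_rng(0)
feats = classical_features(rng.normal(size=(19, 2000)), fs=200.0)
print(feats.shape)  # (101,)
```

Because the asymmetry is a difference of logs, it is invariant to a common gain on both hemispheres. Together with standardization, this makes the baseline insensitive to overall amplitude scale.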

##### Non\-FM\-deep baselines\.

Four convolutional/hybrid architectures are trained from scratch on raw EEG: EEGNet (Lawhern et al., [2018](https://arxiv.org/html/2606.06647#bib.bib21)), ShallowConvNet and DeepConvNet (Schirrmeister et al., [2017](https://arxiv.org/html/2606.06647#bib.bib22)), and EEG-Conformer (Song et al., [2023](https://arxiv.org/html/2606.06647#bib.bib23)). Each is trained per dataset under the same CV splits and seeds, with $z$-score input normalization (matching the early-BatchNorm convention used in standard implementations of these architectures). No foundation-model pretrained weights are used.

### 3.7 Diagnostic method specifications

![Figure 2](https://arxiv.org/html/2606.06647v1/x2.png)

Figure 2: FMScope diagnostic pipeline. The framework evaluates frozen representations from EEG foundation models across three sequential phases:

- Phase I establishes the existence of the Identity Trap. It quantifies subject-variance dominance and tests its linear removability via least-squares concept erasure (LEACE).
- Phase II characterizes the underlying mechanisms using three tools:
  - localizing the depth of subject-identity emergence (layer-wise probing);
  - isolating the physiological carrier via spectral input ablation (FOOOF);
  - evaluating task-axis alignment across subjects (direction consistency).
- Phase III synthesizes these diagnostic outputs into an integrated report, converging on a final cell-level verdict for the specific cohort–model pair.

FMScope comprises five frozen-representation diagnostics, organized by the question each answers. Two of them establish the Identity Trap itself:

- whether subject identity dominates the representation (variance decomposition, Sec. [3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1));
- whether that dominance is confined to a linearly removable axis (subject-axis erasure, Sec. [3.7.2](https://arxiv.org/html/2606.06647#S3.SS7.SSS2)).

The other three characterize its origin and structure:

- its spectral carrier (aperiodic ablation, Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3));
- the depth at which it becomes linearly separable (layer-wise probe, Sec. [3.7.4](https://arxiv.org/html/2606.06647#S3.SS7.SSS4));
- whether subjects encode the task contrast along a shared direction (direction consistency, Sec. [3.7.5](https://arxiv.org/html/2606.06647#S3.SS7.SSS5)).

Each diagnostic carries a scope condition stating the cells in which it returns a defined answer. A diagnostic is read only where its scope is met, so a cohort need not exercise all five.

#### 3.7.1 Variance decomposition of frozen representations

For each (dataset, FM) pair, we decompose per-window embedding variance via a crossed two-factor sum-of-squares partition with label and subject as factors. This is the marginalization step of demixed PCA (Kobak et al., [2016](https://arxiv.org/html/2606.06647#bib.bib40)).

$$
\mathrm{SS}_{\text{total}} \;\approx\; \mathrm{SS}_{\text{label}} + \mathrm{SS}_{\text{subject}} + \mathrm{SS}_{\text{residual}}, \tag{1}
$$

where, for a factor $g \in \{\text{label}, \text{subject}\}$ with groups indexed by $c$, we compute the between-group SS marginally over the other factor as

$$
\mathrm{SS}_{g} \;=\; \sum_{c} n_{c}\,\bigl\lVert \bar{\mathbf{f}}_{c} - \bar{\mathbf{f}} \bigr\rVert^{2}, \tag{2}
$$

Here $\bar{\mathbf{f}}_{c}$ is the per-group mean embedding, $\bar{\mathbf{f}}$ the grand mean, and $n_{c}$ the group window count. $\mathrm{SS}_{\text{residual}}$ is obtained by subtraction and clipped at zero. We operate at the window level so that the partition is defined for single-session trait cells (ADFTD: each recording yields $\geq 2$ windows). Cluster-bootstrap CIs over subjects ($B = 2{,}000$) preserve the within-subject correlation structure (Field and Welsh, [2007](https://arxiv.org/html/2606.06647#bib.bib25)).

Reading the partition is cell-conditional:

- In within-subject paired cells the two factors are orthogonal, so the fractions sum to less than 1.
- In trait cells the label is constant within subject, so the subject factor structurally contains the label factor. Label variance is then counted again inside the subject term, and the marginal fractions can sum to more than 1.

We therefore emphasize the within-cell label-fraction shift (FT − frozen) rather than absolute fractions. Let $f_{\mathrm{label}} = \mathrm{SS}_{\text{label}}/\mathrm{SS}_{\text{total}}$ and $f_{\mathrm{subj}} = \mathrm{SS}_{\text{subject}}/\mathrm{SS}_{\text{total}}$ denote the label- and subject-variance fractions. Throughout, $\Delta$ denotes the change after the relevant intervention: fine-tuning here, and aperiodic removal in Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3). Thus $\Delta f_{\mathrm{label}}$ is the FT − frozen change in label-variance fraction, and $\Delta f_{\mathrm{subj}}$ is the analogous change in subject-variance fraction.

Scope condition: all cells. For Stress, the 3/17 mixed-cutoff subjects are dropped, leaving a 14-subject strict-trait subset.
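Eqs. (1)–(2) and the subject-level cluster bootstrap can be sketched in NumPy. This is an illustration under assumptions: the percentile interval is one common bootstrap CI (the text does not name the interval type), and the synthetic data are made up. The two examples show the cell-conditional reading:

- in a balanced paired design, the fractions stay below 1;
- in a trait design, where the label is constant within subject, $f_{\mathrm{label}} + f_{\mathrm{subj}}$ exceeds 1 and the residual is clipped to zero.

```python
import numpy as np

def between_group_ss(F, groups):
    """Eq. (2): sum over groups of n_c * ||group mean - grand mean||^2."""
    grand = F.mean(axis=0)
    ss = 0.0
    for g in np.unique(groups):
        Fg = F[groups == g]
        ss += len(Fg) * np.sum((Fg.mean(axis=0) - grand) ** 2)
    return ss

def variance_fractions(F, label, subject):
    """Return (f_label, f_subj, f_resid) for per-window embeddings F of shape (n, d)."""
    F = np.asarray(F, dtype=float)
    ss_total = np.sum((F - F.mean(axis=0)) ** 2)
    ss_label = between_group_ss(F, np.asarray(label))     # marginal over subject
    ss_subj = between_group_ss(F, np.asarray(subject))    # marginal over label
    ss_resid = max(ss_total - ss_label - ss_subj, 0.0)    # by subtraction, clipped at 0
    return ss_label / ss_total, ss_subj / ss_total, ss_resid / ss_total

def cluster_bootstrap_ci(F, label, subject, B=2000, seed=0, which=1):
    """Percentile CI for one fraction (default f_subj), resampling whole subjects."""
    rng = np.random.default_rng(seed)
    label, subject = np.asarray(label), np.asarray(subject)
    uniq = np.unique(subject)
    stats = []
    for _ in range(B):
        draw = rng.choice(uniq, size=len(uniq), replace=True)
        rows = np.concatenate([np.flatnonzero(subject == s) for s in draw])
        stats.append(variance_fractions(F[rows], label[rows], subject[rows])[which])
    return np.percentile(stats, [2.5, 97.5])

rng = np.random.default_rng(0)
# Paired design: 5 subjects x 2 states x 10 windows, with a strong subject offset
subject = np.repeat(np.arange(5), 20)
label = np.tile(np.repeat([0, 1], 10), 5)
F = 3.0 * subject[:, None] + 0.5 * label[:, None] + rng.normal(size=(100, 16))
print(variance_fractions(F, label, subject))
print(cluster_bootstrap_ci(F, label, subject, B=200))

# Trait design: the label is constant within subject, so the subject factor contains it
subj_t = np.repeat(np.arange(4), 10)
lab_t = (subj_t >= 2).astype(int)
F_t = 10.0 * lab_t[:, None] + rng.normal(size=(40, 4))
print(variance_fractions(F_t, lab_t, subj_t))   # f_label + f_subj > 1, residual 0
```

Resampling whole subjects, rather than windows, keeps each subject's correlated windows together in every bootstrap replicate.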

##### Column-effect test on $\Delta f_{\mathrm{label}}$.

Across the 12 (cell, FM) pairs, we contrast the 6 consensus-marker pairs against the 6 no-consensus pairs on $\Delta f_{\mathrm{label}}$ using a one-sided Mann–Whitney U test. The test is non-parametric and distribution-free, which suits the small sample. The 12 pairs share data within cell (cell-level $n = 4$ with 3 FMs nested per cell). We therefore read U as an effect-size summary (we report the rank-biserial $r$ in the Results) rather than as axis-level inference.
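With only 6 + 6 values, the exact null distribution of U can be enumerated over all $\binom{12}{6} = 924$ relabelings. The sketch below assumes no ties and uses made-up illustrative values, not the per-pair results. When exactly one of the 36 cross-group pairs is reversed:

- U = 35;
- the exact one-sided p is 2/924 ≈ 0.0022;
- the rank-biserial r = 2U/36 − 1 ≈ 0.94.

These figures are consistent with the statistics reported in Sec. 4.2.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U = number of (x_i, y_j) pairs with x_i > y_j; ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_one_sided_p(x, y):
    """P(U >= U_obs) under H0, enumerating every split of the pooled values."""
    pooled = list(x) + list(y)
    u_obs = mann_whitney_u(x, y)
    hits = total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        hits += mann_whitney_u(a, b) >= u_obs
        total += 1
    return hits / total

def rank_biserial(x, y):
    """r = 2U / (n1 * n2) - 1, in [-1, 1]."""
    return 2.0 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1.0

# Illustrative values only (pp), with one consensus value below one no-consensus value
consensus = [0.6, 1.5, 2.0, 3.1, 5.0, 8.4]
no_consensus = [-1.9, -1.0, -0.3, 0.1, 0.4, 0.8]
u = mann_whitney_u(consensus, no_consensus)
p = exact_one_sided_p(consensus, no_consensus)
r = rank_biserial(consensus, no_consensus)
print(u, round(p, 4), round(r, 3))  # 35.0 0.0022 0.944
```

A single reversal already moves r below 1 and doubles the exact p. This is why r is the more informative summary when the two ranges touch.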

#### 3.7.2 Subject-axis erasure

Variance decomposition measures how much of the representation subject identity occupies. Erasure tests whether that identity is confined to a removable linear axis, and at what cost to the clinical label. We apply LEACE (Belrose et al., [2023](https://arxiv.org/html/2606.06647#bib.bib48)), least-squares concept erasure: the minimum-displacement affine map that renders a target concept linearly unpredictable.

For per-window embeddings $\mathbf{X}$ and one-hot subject labels $\mathbf{Z}$, LEACE removes the subspace spanned by $\mathbf{W}\Sigma_{\mathbf{XZ}}$ in whitened coordinates. Here $\mathbf{W} = \Sigma_{\mathbf{XX}}^{-1/2}$ is the whitening transform, which we estimate under Ledoit–Wolf shrinkage for numerical conditioning. Centred one-hot labels for $k$ subjects span $k-1$ dimensions, so the erased subspace has rank $k-1$: it is the between-subject-mean subspace. The map is undefined when $k-1 \geq d$, because the subject subspace then fills the feature space; we test this condition and skip the cell when it holds.

After erasure, no linear classifier recovers subject identity above chance. Because this guarantee is linear only, we also report the identity still recoverable by a nonlinear probe (a one-hidden-layer MLP). To quantify the cost of erasure to the task, we re-run the recording-level label probe (Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3)) on the erased features and report $\Delta_{\mathrm{erase}}$, the change in label balanced accuracy.
Scope condition: all cells for the erasure and the nonlinear-residual read-out. $\Delta_{\mathrm{erase}}$ is interpretable only in within-subject paired cells. There the label varies within subject and is therefore separable from the erased between-subject subspace. In trait cells the label is a per-subject constant whose class-mean direction lies inside the subject subspace, so erasure removes it by construction and $\Delta_{\mathrm{erase}}$ is undefined. $\Delta_{\mathrm{erase}}$ is read only where the un-erased baseline also clears chance.
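Closed-form LEACE can be sketched in NumPy. The sketch follows the LEACE formula $r(\mathbf{x}) = \mathbf{x} - \mathbf{W}^{+}\mathbf{P}\mathbf{W}(\mathbf{x} - \boldsymbol{\mu})$, with $\mathbf{P}$ the orthogonal projector onto the column space of $\mathbf{W}\Sigma_{\mathbf{XZ}}$. It is not the authors' code: it uses the plain empirical covariance instead of a Ledoit–Wolf estimate, and `fit_leace` is a hypothetical name.

Swapping in an invertible shrunk covariance keeps the property checked at the end: the erased features have zero cross-covariance with the subject indicators. That property is what makes subject identity linearly unpredictable.

```python
import numpy as np

def fit_leace(X, Z, tol=1e-10):
    """Closed-form LEACE eraser for features X (n, d) and one-hot labels Z (n, k).

    Returns (erase, rank). Uses the plain empirical covariance; a Ledoit-Wolf
    estimate could be substituted for sigma_xx.
    """
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)
    n = len(X)
    sigma_xx = Xc.T @ Xc / n
    sigma_xz = Xc.T @ Zc / n
    # whitening W = Sigma_xx^{-1/2} and its pseudo-inverse W^+ = Sigma_xx^{1/2}
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V, lam = evecs[:, keep], evals[keep]
    W = (V / np.sqrt(lam)) @ V.T
    W_pinv = (V * np.sqrt(lam)) @ V.T
    # orthonormal basis of span(W Sigma_xz); centred one-hot Z gives rank <= k - 1
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol * s.max()]
    M = W_pinv @ (U @ U.T) @ W          # eraser: x -> x - M (x - mu)

    def erase(X_new):
        return X_new - (X_new - mu) @ M.T

    return erase, U.shape[1]

rng = np.random.default_rng(0)
k, per_subj, d = 5, 40, 16
subject = np.repeat(np.arange(k), per_subj)
Z = np.eye(k)[subject]                                   # one-hot subject labels
X = 3.0 * rng.normal(size=(k, d))[subject] + rng.normal(size=(k * per_subj, d))
erase, rank = fit_leace(X, Z)
X_erased = erase(X)
cross_cov = (X_erased - X_erased.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / len(X)
print(rank)                                # 4  (= k - 1)
print(np.abs(cross_cov).max() < 1e-8)      # True
```

Only a rank-$(k-1)$ subspace is touched, so any label direction that varies within subject survives erasure. This is the within-subject paired case in which $\Delta_{\mathrm{erase}}$ is defined.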

#### 3.7.3 Spectral anchor ablation (FOOOF)

We intervene on the EEG via FOOOF ablation, which modifies the spectral shape \(aperiodic background vs\. periodic peaks\) while preserving frequency support, and observe how the FM’s frozen probe responds\.

For each recording, we fit a per-channel FOOOF decomposition (Donoghue et al., [2020](https://arxiv.org/html/2606.06647#bib.bib26)) on the 1–45 Hz band, parameterizing the log-power spectrum as

$$
\log_{10}\mathrm{PSD}(f) = \underbrace{b - \chi \log_{10} f}_{\text{aperiodic } L(f)} + \sum_{n=1}^{N} G_{n}(f), \tag{3}
$$

Here $b$ is an offset and $\chi$ the aperiodic exponent, i.e. the slope of $1/f$ on log-log axes; higher $\chi$ means relatively more low-frequency power. Each $G_{n}(f)$ is a Gaussian peak in log-power. Let $X(f)$ be the per-channel FFT, $\hat{A}(f) = 10^{L(f)/2}$ the fitted aperiodic amplitude envelope, and $P_{n}(f) = \hat{A}(f)^{2}\,\bigl(10^{G_{n}(f)} - 1\bigr)$ the linear-power contribution of peak $n$. From these we construct three phase-preserving reconstructions

$$
|\widetilde{X}(f)| =
\begin{cases}
|X(f)| / \hat{A}(f) & \text{aperiodic-removed,}\\
\sqrt{\max\bigl(|X(f)|^{2} - \sum_{n} P_{n}(f),\; 0\bigr)} & \text{periodic-removed,}\\
|\widetilde{X}_{\text{per-rem}}(f)| / \hat{A}(f) & \text{both-removed,}
\end{cases} \tag{4}
$$

Here $\widetilde{X}_{\text{per-rem}}$ is the periodic-removed reconstruction. Each reconstruction is combined with the original phase spectrum and inverted via the inverse FFT. Each reconstruction is then re-extracted through the FM and probed with two diagnostics.

The label probe is a binary $L_2$-penalized logistic regression on per-window features under subject-stratified group $K$-fold ($K = 5$). The pipeline matches the LP recipe (percentile clip + standard scaler) and aggregates to the recording level by mean-pooling per-window posteriors. Balanced accuracy is averaged over 8 seeds (the LP seed set).

The subject-identity probe is a multi-class classifier of subject identity (one class per subject) under a 5-fold *temporal-block* protocol. Per subject, the windows are concatenated across recordings in canonical order and split into 5 contiguous blocks; fold $j$ tests block $j$ from every subject and trains on the remaining four blocks. The classifier is linear discriminant analysis with Ledoit–Wolf shrinkage of the within-class covariance toward a scaled identity matrix. The shrinkage stabilizes the covariance estimate when the number of windows per subject is small. The solver is closed-form with no stochastic optimization, so we run it at a single seed. Per-fold balanced accuracy is averaged across the 5 folds. On multi-recording cells (EEGMAT, SleepDep, Stress) the blocks cross recording boundaries, so the probe also tests whether the subject signal is stable across sessions.
ADFTD has only one recording per subject by design, so its blocks are always within a single recording\.
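The three reconstructions in Eq. (4) can be sketched for a single channel. The sketch takes already-fitted parameters: the offset $b$, the exponent $\chi$, and Gaussian peaks given as (center, height in log10 power, standard deviation). These could come, for example, from the `fooof` package's `aperiodic_params_` and `gaussian_params_` after `FOOOF.fit`. The sketch also makes these assumptions, and `fooof_ablate` is a hypothetical name:

- the fit was made on a spectrum in the same units as $|X(f)|^2$;
- bins outside 1–45 Hz are left untouched;
- the parameter values are illustrative.

```python
import numpy as np

def peak_log_powers(freqs, peaks):
    """Gaussian peaks G_n(f) in log10 power; peaks = [(center_hz, height, sd_hz), ...]."""
    return [h * np.exp(-((freqs - c) ** 2) / (2.0 * s ** 2)) for c, h, s in peaks]

def fooof_ablate(x, fs, offset, exponent, peaks, mode, band=(1.0, 45.0)):
    """Phase-preserving ablation of one channel x (1-D array), following Eq. (4)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mag, phase = np.abs(X), np.angle(X)
    sel = (freqs >= band[0]) & (freqs <= band[1])        # only the fitted band
    f, m = freqs[sel], mag[sel]
    A_hat = 10.0 ** ((offset - exponent * np.log10(f)) / 2.0)   # aperiodic amplitude
    P = np.zeros_like(f)
    for G in peak_log_powers(f, peaks):
        P += A_hat ** 2 * (10.0 ** G - 1.0)              # linear power of each peak
    per_rem = np.sqrt(np.maximum(m ** 2 - P, 0.0))
    if mode == "aperiodic":
        new = m / A_hat
    elif mode == "periodic":
        new = per_rem
    elif mode == "both":
        new = per_rem / A_hat
    else:
        raise ValueError(f"unknown mode: {mode}")
    out = mag.copy()
    out[sel] = new
    return np.fft.irfft(out * np.exp(1j * phase), n=len(x))  # original phase kept

rng = np.random.default_rng(0)
fs, x = 200.0, rng.normal(size=1000)                    # one 5 s channel at 200 Hz
params = dict(offset=0.0, exponent=1.2, peaks=[(10.0, 0.5, 1.5)])  # hypothetical fit
for mode in ("aperiodic", "periodic", "both"):
    y = fooof_ablate(x, fs, mode=mode, **params)
    print(mode, y.shape)
```

All three modes share the same FFT round trip and differ only in the in-band magnitudes. This is the property the periodic-removed control relies on.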

##### FT extension\.

To extend the intervention from the frozen representation to the fine-tuned representation, we re-extract and fine-tune each backbone end-to-end on the ablated input under the same recipe as Sec. [3.5](https://arxiv.org/html/2606.06647#S3.SS5). We then run both probes (state and subject) on the resulting test-fold per-window features, concatenated across the 5 CV folds. For each (cell, FM, condition), we average $\Delta$ probe BA (ablated minus original input) over 3 fine-tuning seeds. The FT extension covers both conditions (aperiodic-removed and periodic-removed).

Output. The diagnostic returns a per-cell 1/f-role reading derived from the joint change in label-probe BA and subject-probe BA under aperiodic removal, where each $\Delta$ is computed as aperiodic-removed minus original.

Scope condition: all four cells for the label probe and subject probe at frozen and FT representations\.

#### 3.7.4 Layer-wise probe

To localize the depth at which subject identity emerges within the encoder, we replay each FM’s forward pass with eight intermediate\-depth captures and apply the same two read\-outs used elsewhere in the paper \(temporal\-block subject\-ID probe; canonical recording\-level linear probe with three CV seeds\) at every captured depth\. Pooling at each captured depth uses the backbone’s own canonical scheme, so the final\-depth row reproduces the main\-paper frozen feature to numerical precision\.

Scope condition: all four cells for both probes\.
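The temporal-block split used by the subject-identity probe (in the frozen, FOOOF, and layer-wise analyses) can be sketched in plain Python. The block-boundary rule for lengths not divisible by 5 is an assumption, and `temporal_block_folds` is an illustrative name. On each fold, the shrinkage LDA could be fit with scikit-learn's `LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")`, where `"auto"` selects the Ledoit–Wolf shrinkage intensity.

```python
def temporal_block_folds(windows_by_subject, n_folds=5):
    """Temporal-block CV for the subject-ID probe.

    windows_by_subject maps subject -> window ids in canonical order, with that
    subject's recordings already concatenated. Each sequence is cut into
    n_folds contiguous, near-equal blocks; fold j tests block j of every
    subject and trains on the other blocks. Returns [(train_ids, test_ids), ...].
    """
    folds = [([], []) for _ in range(n_folds)]
    for wins in windows_by_subject.values():
        n = len(wins)
        bounds = [n * b // n_folds for b in range(n_folds + 1)]
        blocks = [wins[bounds[b]:bounds[b + 1]] for b in range(n_folds)]
        for j in range(n_folds):
            train, test = folds[j]
            test.extend(blocks[j])
            for b in range(n_folds):
                if b != j:
                    train.extend(blocks[b])
    return folds

subjects = {"s1": list(range(10)), "s2": list(range(100, 107))}
folds = temporal_block_folds(subjects)
print(folds[0][1])  # [0, 1, 100]
```

Every subject appears in every training fold, which a multi-class identity classifier needs. Holding out contiguous blocks keeps temporally adjacent windows from leaking between train and test.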

#### 3.7.5 Within-subject direction and signal-to-noise characterization

For within-subject paired cells we measure two things:

- (i) whether subjects encode the state contrast along a consistent direction in FM feature space;
- (ii) whether per-subject contrast magnitudes separate from cross-subject heterogeneity.

For each subject $s$ we form the per-subject contrast vector

$$
\boldsymbol{\Delta}_{s} \;=\; \bar{\mathbf{f}}_{s,1} - \bar{\mathbf{f}}_{s,0}, \tag{5}
$$

where $\bar{\mathbf{f}}_{s,y}$ is the window-mean embedding for subject $s$ under label class $y$. The within-subject direction consistency index $\bar{c}$ (WSCI) is the mean pairwise cosine similarity across the $n$ per-subject contrast vectors,

$$
\bar{c} \;=\; \binom{n}{2}^{-1} \sum_{i<j} \frac{\boldsymbol{\Delta}_{i}^{\top}\boldsymbol{\Delta}_{j}}{\lVert\boldsymbol{\Delta}_{i}\rVert\,\lVert\boldsymbol{\Delta}_{j}\rVert}; \tag{6}
$$

A high $\bar{c}$ implies a shared cross-subject axis, and $\bar{c} \approx 0$ implies idiosyncratic directions. The per-subject SNR is the ratio of the mean contrast magnitude $\sigma_{s} = \lVert\boldsymbol{\Delta}_{s}\rVert$ to its cross-subject standard deviation,

$$
\mathrm{SNR}_{\text{per-subj}} \;=\; \frac{\overline{\sigma_{s}}}{\mathrm{SD}_{s}\!\left(\sigma_{s}\right)}. \tag{7}
$$

Together, $\bar{c}$ and SNR characterize direction agreement and magnitude versus noise; we do not compose them into a single assessment.

Scope condition: within-subject paired cells only (EEGMAT, SleepDep); marked inapplicable in trait cells.
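Eqs. (5)–(7) translate directly into NumPy. The sketch assumes a sample standard deviation (ddof = 1) in Eq. (7), which the text leaves unspecified. It uses toy contrast vectors to show the two extremes of $\bar{c}$; the function names are illustrative.

```python
import numpy as np

def contrast_vectors(F, subject, label):
    """Eq. (5): per-subject mean embedding under label 1 minus under label 0."""
    return np.stack([
        F[(subject == s) & (label == 1)].mean(axis=0)
        - F[(subject == s) & (label == 0)].mean(axis=0)
        for s in np.unique(subject)
    ])

def wsci(deltas):
    """Eq. (6): mean pairwise cosine similarity over the n contrast vectors."""
    U = deltas / np.linalg.norm(deltas, axis=1, keepdims=True)
    iu = np.triu_indices(len(U), k=1)                 # each pair i < j once
    return (U @ U.T)[iu].mean()

def per_subject_snr(deltas):
    """Eq. (7): mean contrast magnitude over its cross-subject SD (ddof=1 assumed)."""
    mags = np.linalg.norm(deltas, axis=1)
    return mags.mean() / mags.std(ddof=1)

shared = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])   # same direction, varying size
idiosyncratic = np.array([[1.0, 0.0], [0.0, 1.0]])         # orthogonal contrasts
print(wsci(shared), wsci(idiosyncratic), per_subject_snr(shared))  # 1.0 0.0 2.0
```

Because $\bar{c}$ normalizes each contrast, it ignores magnitude entirely. The SNR term is needed to tell whether consistent directions also come with contrasts that stand out from cross-subject spread.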

##### Within\-cell label\-structure detector \(cosine PERMANOVA\)\.

$\bar{c}$ is undefined in trait cells (no within-subject contrast) and uninterpretable when subjects do not share a direction ($\bar{c} \approx 0$). To detect whether a label-associated signature exists in feature geometry, independent of whether subjects share a contrast direction, we run PERMANOVA (Anderson, [2001](https://arxiv.org/html/2606.06647#bib.bib34)) on the cosine dissimilarity matrix of per-window embeddings. The design factor is the labeling unit:

- subject for ADFTD;
- recording within pair block for EEGMAT and SleepDep;
- recording for Stress.

We use 999 permutations per (cell, FM, state), giving a floor of $p \approx 0.001$. We read PERMANOVA as a within-cell test for whether label-related geometric structure exists at all. It is not the axis-level consensus-vs.-no-consensus contrast, which is reported on $\Delta f_{\mathrm{label}}$ in Sec. [3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1).

Scope condition: all four cells. Per-cell assessments are tabulated in App. [C](https://arxiv.org/html/2606.06647#A3).
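A one-factor PERMANOVA on cosine dissimilarities can be sketched with NumPy, following Anderson's pseudo-F computed from squared distances. Two simplifications are assumptions of this sketch:

- labels are permuted freely at the window level, whereas the design above permutes labeling units and, for paired cells, would restrict permutations within pair blocks;
- the p-value uses the usual (hits + 1)/(permutations + 1) convention, which is what gives the 0.001 floor at 999 permutations.

```python
import numpy as np

def cosine_distance_matrix(F):
    """Pairwise 1 - cosine similarity between rows of F."""
    U = F / np.linalg.norm(F, axis=1, keepdims=True)
    return 1.0 - U @ U.T

def pseudo_f(D2, labels):
    """Anderson's pseudo-F from a matrix of squared distances D2."""
    N = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    ss_total = D2[np.triu_indices(N, k=1)].sum() / N
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (N - a))

def permanova(F, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p) with unrestricted label permutations."""
    labels = np.asarray(labels)
    D2 = cosine_distance_matrix(np.asarray(F, dtype=float)) ** 2
    f_obs = pseudo_f(D2, labels)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(D2, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
F = np.vstack([rng.normal(size=(20, 8)) + 5.0 * np.eye(8)[0],
               rng.normal(size=(20, 8)) + 5.0 * np.eye(8)[1]])
groups = np.repeat([0, 1], 20)
f_obs, p = permanova(F, groups)
print(p)  # 0.001
```

With clearly separated groups, no random relabeling reaches the observed pseudo-F, so p sits at the floor. This is why the reported values are stated as $p \leq 0.001$ rather than as exact tail probabilities.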

## 4 Results

The Results proceed in four steps:

1. We start with the baseline performance picture: full fine-tuning rarely improves over a frozen linear probe (Sec. [4.1](https://arxiv.org/html/2606.06647#S4.SS1)).
2. To see what fine-tuning starts from, we then characterize the frozen representation itself. Subject identity dominates, and the encoder amplifies it into a linearly separable direction within its first few transformer blocks (Sec. [4.2](https://arxiv.org/html/2606.06647#S4.SS2)).
3. A natural next question is which input feature carries that subject identity. The aperiodic 1/f background is one such carrier on LaBraM and CBraMod (Sec. [4.3](https://arxiv.org/html/2606.06647#S4.SS3)).
4. Sec. [4.4](https://arxiv.org/html/2606.06647#S4.SS4) summarizes the per-cell evidence for the final assessment.

### 4.1 Baseline performance and the fine-tuning paradox

Tab\.[2](https://arxiv.org/html/2606.06647#S4.T2)reports subject\-disjoint balanced accuracy across the four cells under three FMs \(linear probe and full fine\-tuning\), four convolutional / hybrid baselines trained from scratch, and a classical handcrafted\-feature LogReg\.

Table 2: Subject-disjoint 5-fold CV balanced accuracy across the four cells. Rows: classical handcrafted-feature LogReg, four non-FM deep baselines, and three foundation models under linear probe (LP) and full fine-tuning (FT). Columns are ordered by the four-cell layout. Values are 3-seed mean with sample std (seeds $\{42, 123, 2024\}$); bold = global best within column. Method details: Sec. [3.5](https://arxiv.org/html/2606.06647#S3.SS5).

Fine-tuning rarely beats the frozen linear probe: 10 of 12 pairs fall within $\pm 1$ pp of the corresponding LP. The cells differ as follows:

- ADFTD is the one cell where FM pretraining yields a measurable advantage over the classical and non-FM-deep baselines.
- On EEGMAT, the classical LogReg baseline beats every FM tier.
- On Stress and SleepDep, all four tiers fall within a 0.43–0.57 band.

The 0.9047 Stress accuracy reported by Wang et al. ([2025](https://arxiv.org/html/2606.06647#bib.bib4)) does not reproduce under subject-disjoint cross-validation; the gap is $\sim 45$ pp.

### 4.2 Subject-dominant baseline geometry

![Figure 3](https://arxiv.org/html/2606.06647v1/x3.png)

Figure 3: Variance decomposition across the four cells. Window-level subject and label fractions for frozen and fine-tuned features. Stacked bars: subject (lower) + label (upper); the gap to 100% is residual. The dashed red line marks the matched random-Gaussian null $f_{\mathrm{subj}}$ (mean over 20 seeds; per-cell numeric callout in each panel). The frozen subject fraction exceeds the null by 13–89$\times$ across the 12 (cell, FM) pairs. Under fine-tuning, the subject fraction rises in 12 of 12 pairs (largest on Stress: +32/+63/+43 pp on LaBraM/CBraMod/REVE). The label fraction rises only in cells with a consensus cross-subject marker (per-pair values in Tab. [A1](https://arxiv.org/html/2606.06647#A1.T1); null-control values in Tab. [A2](https://arxiv.org/html/2606.06647#A2.T2)).

We decompose per-window frozen and fine-tuned features into label, subject, and residual variance components (Sec. [3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1); Fig. [3](https://arxiv.org/html/2606.06647#S4.F3)). The frozen subject-variance fraction is 13–89$\times$ a matched random-Gaussian null in 12 of 12 (cell $\times$ FM) pairs (per-pair values in Tab. [A2](https://arxiv.org/html/2606.06647#A2.T2)). Fine-tuning increases the subject fraction in all 12 pairs (+10 to +63 pp, largest on Stress).

$\Delta f_{\mathrm{label}}$ is positive in all 6 consensus-marker pairs (+0.6 to +8.4 pp), whereas across the 6 no-consensus pairs it straddles zero (−1.9 to +0.8 pp). The two ranges overlap only marginally: the largest no-consensus value (+0.8 pp) exceeds the smallest consensus value (+0.6 pp). A one-sided Mann–Whitney U test on the 12 values gives $p = 0.0022$ with rank-biserial $r = +0.94$ (Sec. [3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1)).

This subject dominance is a removable linear axis, not diffuse structure (Tab. [3](https://arxiv.org/html/2606.06647#S4.T3)). Closed-form erasure (Sec. [3.7.2](https://arxiv.org/html/2606.06647#S3.SS7.SSS2)) drives the linear subject probe to chance in all 12 pairs. A nonlinear probe still recovers part of the identity, substantially so for REVE; we report it alongside.

Whether removing the axis helps the task can be asked only in within-subject paired cells, where the label varies within subject:

- On EEGMAT, the one such cell with an above-chance baseline, the frozen probes start below the classical baseline (0.847). Erasure lifts all three FMs and closes this gap, with LaBraM reaching 0.875.
- On SleepDep (no-consensus, near-chance baseline), the change is small and inconsistent across FMs, as expected when little label signal is present.

This is a single clean demonstration. App. [E](https://arxiv.org/html/2606.06647#A5) extends it to five further cohorts with a within-subject contrast. There, $\Delta_{\mathrm{erase}}$ stays positive on all three consensus-marker cohorts and turns mixed-to-negative on the two without an established marker. Counting the primary cell, it is positive in 12 of 12 consensus cell $\times$ FM values (one-sided sign test $p = 2.4 \times 10^{-4}$).
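The sign-test p-value follows from the binomial null directly. If all 12 consensus cell × FM values are positive and there are no zeros, the one-sided p is $2^{-12} = 1/4096 \approx 2.4 \times 10^{-4}$. A small stdlib check (the function name is illustrative):

```python
from math import comb

def sign_test_one_sided(n_positive, n_total):
    """Exact P(X >= n_positive) for X ~ Binomial(n_total, 1/2), assuming no zeros."""
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total

print(sign_test_one_sided(12, 12))  # 0.000244140625
```

A single negative value among the 12 would raise the p-value to 13/4096 ≈ 0.0032. The 12-of-12 result sits at the smallest p this test can produce for that many values.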

Table 3: Subject-axis erasure on frozen features. BA $\times 100$; chance $= 100/k$; erased subspace rank $= k-1$. Subject BA is a linear subject classifier before and after erasure; the nonlinear residual is an MLP after erasure (LEACE guarantees linear erasure only). $\Delta_{\mathrm{erase}}$ is the change in recording-level label BA (3 seeds, mean $\pm$ SD), defined only in within-subject paired cells. Trait cells (ADFTD, Stress) are marked “—” because the label is a fixed subject attribute lying inside the subject subspace.

![Figure 4](https://arxiv.org/html/2606.06647v1/x4.png)

Figure 4: Layer-wise subject and label probes. Rows show the subject relation of the label (within-subject paired on top, trait on the bottom). Columns show the consensus axis (consensus on the left, no-consensus on the right). Each panel shows two probes: the temporal-block subject-ID probe (red) and the canonical recording-level label probe (blue). Lines are the mean across the three FMs (LaBraM, CBraMod, REVE). Shaded bands span their min–max range ($n = 3$ FMs; an envelope rather than a confidence interval). The horizontal axis is relative transformer depth (0 is post-embedding, pre-block; 1 is the final pooled feature). Across all four cells, the subject probe rises toward 0.55–0.99 within two to three transformer blocks; the wide early-depth band reflects FMs that reach this ceiling at different depths. The label probe is cell-conditional. Dotted lines mark label chance (0.5) and the per-cell subject-ID chance.

A layer-wise re-probe (Sec. [3.7.4](https://arxiv.org/html/2606.06647#S3.SS7.SSS4); Fig. [4](https://arxiv.org/html/2606.06647#S4.F4)) shows that the subject axis becomes linearly decodable inside the encoder rather than at the embedding output. The encoder actively amplifies subject identity into a separable direction within its first two to three blocks.
The label probe is cell\-conditional; the per\-cell trajectories feed the per\-cell assessment in Sec\.[4\.4](https://arxiv.org/html/2606.06647#S4.SS4)\.

### 4.3 Aperiodic 1/f as a subject-identity carrier

![Figure 5(a)](https://arxiv.org/html/2606.06647v1/x5.png) (a)
![Figure 5(b)](https://arxiv.org/html/2606.06647v1/x6.png) (b)

Figure 5: Aperiodic and periodic ablation of the input, frozen and intervention-FT.

([5(a)](https://arxiv.org/html/2606.06647#S4.F5.sf1)) Representative log-log power spectrum per cell with FOOOF decomposition: black solid, measured PSD; orange dashed, 1/f aperiodic fit; blue shading, periodic peaks (PSD minus aperiodic). The aperiodic fit defines the input ablation (Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3)).

([5(b)](https://arxiv.org/html/2606.06647#S4.F5.sf2)) $\Delta$ probe BA under FOOOF ablation. Circles mark −aperiodic and squares mark −periodic. Within each FM, a line connects the two conditions per cell, and colour codes the cell. The top row is frozen; the bottom row is FT (intervention, 3-seed mean).

- Frozen, LaBraM and CBraMod: FOOOF −aperiodic uniformly drops the subject probe by 9 to 19 pp across all four cells.
- Frozen, −periodic: shifts probe BA by $\leq 0.8$ pp on every (FM, cell) combination.
- Frozen, REVE: the already-saturated subject probe ($\geq 0.93$ baseline) shows no measurable aperiodic dependence.
- FT (intervention), −aperiodic: re-extracting and fine-tuning under aperiodic-removed input attenuates the FT subject probe on 7 of 8 (cell, FM) pairs for LaBraM/CBraMod (−4 to −20 pp). LaBraM $\times$ ADFTD is borderline (−2.0 ± 4.7 pp).
- FT (intervention), −periodic: shifts the subject probe by $|\Delta| \leq 2.5$ pp on every (cell, LaBraM/CBraMod) pair. This mirrors the frozen null and locates the FT effect in the 1/f component rather than in narrowband peaks.
- FT (intervention), REVE: the subject probe shifts by $|\Delta| \leq 2.5$ pp on every cell under both ablations, preserving the negative control downstream.
Cell-conditional label-probe response is interpreted in the main text.

We intervene on the input EEG via FOOOF spectral-shape ablation (aperiodic background versus periodic peaks; frequency support preserved) and observe the FM probe response (Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3); Fig. [5](https://arxiv.org/html/2606.06647#S4.F5)).

##### Removing the aperiodic 1/f component drops the subject probe uniformly on LaBraM and CBraMod.

On LaBraM and CBraMod, FOOOF −aperiodic drops the linear subject-probe BA by 9 to 19 pp across all four cells (per-(cell, FM) values in Tab. [A4](https://arxiv.org/html/2606.06647#A4.T4)):

| FM | EEGMAT | ADFTD | SleepDep | Stress |
|---|---|---|---|---|
| LaBraM | −18.4 pp | −10.9 pp | −17.9 pp | −9.1 pp |
| CBraMod | −18.9 pp | −10.2 pp | −17.1 pp | −9.3 pp |

The drop is large (at least 9 pp), present in every cell despite very different paradigms, and goes in the same direction on both FMs.

REVE's baseline subject probe is already at $\geq 0.93$ BA and shifts by $\leq 1.2$ pp under −aperiodic. We therefore limit the aperiodic-anchor claim to LaBraM and CBraMod (Sec. [5.5](https://arxiv.org/html/2606.06647#S5.SS5)). FOOOF −periodic shows no effect across all 12 (cell, FM) pairs ($\leq 0.8$ pp on both probes). Both ablations pass through the same FOOOF decompose-and-invert reconstruction (Eq. [4](https://arxiv.org/html/2606.06647#S3.E4)) and differ only in which component is removed. This −periodic null therefore rules out the reconstruction round trip itself as the cause and attributes the −aperiodic drop to the removed aperiodic content. The label-probe response under −aperiodic is cell-conditional and feeds the per-cell 1/f-role column of Tab. [4](https://arxiv.org/html/2606.06647#S5.T4).

##### The same intervention extends to the fine-tuned representation.

To test whether FT inherits the frozen aperiodic dependence, we re-extracted and fine-tuned each FM end-to-end on both ablated inputs. We used the same recipe as the baseline run (3 seeds; Sec. [3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3)) and recomputed both probes on the resulting features (Fig. [5](https://arxiv.org/html/2606.06647#S4.F5)b, bottom row).

On LaBraM and CBraMod, the FT subject probe drops under −aperiodic on 7 of 8 (cell, FM) pairs. The mean $\Delta$ ranges from −4 to −20 pp, with the per-seed range staying below zero. LaBraM $\times$ ADFTD is the borderline case (−2.0 ± 4.7 pp). This is consistent with two properties of that cell: its frozen subject probe is already near ceiling, and its high recording-level redundancy gives the encoder multiple non-aperiodic carriers to amplify.

−Periodic at the FT level shifts the subject probe by $|\Delta| \leq 2.5$ pp on all 8 (cell, LaBraM/CBraMod) pairs. This matches the frozen null and indicates that the FT-level dependence lies in the 1/f component rather than in narrowband peaks. REVE's FT subject probe shifts by $|\Delta| \leq 2.5$ pp on every cell under both ablations, preserving the negative control at the FT representation.

### 4.4 Cell-conditional label-side outcomes

Two cell\-level diagnostics remain\. The layer\-wise label\-probe trajectory shows how much label\-aligned variance survives to the final pooled feature, and the within\-subject direction\-consistency test asks whether subjects share a contrast axis at all\. The combined assessment matrix appears with the Discussion \(Tab\.[4](https://arxiv.org/html/2606.06647#S5.T4)\)\.

**Layer-wise label-probe trajectory** (Sec. [3.7.4](https://arxiv.org/html/2606.06647#S3.SS7.SSS4); Fig. [4](https://arxiv.org/html/2606.06647#S4.F4)). On EEGMAT, the label probe is stable across depth (0.69–0.78) for all three FMs. On ADFTD, the label probe peaks early (relative depth $\sim 0.17$, BA 0.78–0.83) and descends as later blocks compress along the subject axis without further label gain. On SleepDep, the label probe is flat near chance (0.50–0.59) at every depth. On Stress, the label probe descends monotonically with depth on all three FMs (LaBraM $0.49 \to 0.42$; CBraMod $0.49 \to 0.36$; REVE $0.47 \to 0.41$): the modest label-aligned signal in the pre-block embedding does not survive to the final pooled feature.
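The layer-wise trajectory can be sketched as below, as an illustration under stated assumptions rather than the paper's code. The paper's label probe is logistic regression (App. D); this dependency-free sketch swaps in a nearest-class-mean classifier. `layer_feats` is a hypothetical list of per-layer window embeddings, and folds are formed by assigning whole subjects so that no subject contributes to both training and test. The maximum and last entries of the returned curve correspond to the "layer probe BA max / last" columns of Tab. 4.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall."""
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == c] == c) for c in classes]))


def subject_disjoint_folds(subjects, n_folds=5, seed=0):
    """Assign whole subjects to folds so no subject is in both train and test."""
    rng = np.random.default_rng(seed)
    uniq = rng.permutation(np.unique(subjects))
    fold_of = {s: i % n_folds for i, s in enumerate(uniq)}
    return np.array([fold_of[s] for s in subjects])


def layer_probe_curve(layer_feats, labels, subjects, n_folds=5):
    """Balanced accuracy of a nearest-class-mean probe at every layer.

    layer_feats: list of (n_windows, dim) arrays, one per depth (hypothetical).
    """
    folds = subject_disjoint_folds(subjects, n_folds)
    curve = []
    for X in layer_feats:
        pred = np.empty_like(labels)
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            classes = np.unique(labels[tr])
            # One centroid per class, estimated on training subjects only.
            centroids = np.stack([X[tr][labels[tr] == c].mean(0) for c in classes])
            dists = ((X[te][:, None, :] - centroids[None]) ** 2).sum(-1)
            pred[te] = classes[dists.argmin(1)]
        curve.append(balanced_accuracy(labels, pred))
    return np.array(curve)
```

Reading `curve.max()` and `curve[-1]` side by side is what separates an ADFTD-like early peak from an EEGMAT-like flat profile.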

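![Figure 6(a): polar rose of within-subject contrast angles, frozen features](https://arxiv.org/html/2606.06647v1/x7.png)(a)
![Figure 6(b): group-level direction consistency, frozen vs. FT](https://arxiv.org/html/2606.06647v1/x8.png)(b)
![Figure 6(c): per-subject SNR, frozen vs. FT](https://arxiv.org/html/2606.06647v1/x9.png)(c)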

Figure 6: **Within-subject direction and SNR (frozen vs. FT).** For each subject the contrast vector $v_i = \mu_{i,1} - \mu_{i,0}$ is formed in the FM's full feature space; the group consensus is $v_c = \overline{v_i} / \lVert \overline{v_i} \rVert$. Filled gray = EEGMAT, outlined black = SleepDep. ([6(a)](https://arxiv.org/html/2606.06647#S4.F6.sf1)) Polar half-circle rose (frozen), one panel per FM: $\theta_i = \arccos\langle v_i, v_c \rangle$; $0^\circ$ aligns, $90^\circ$ is the high-dimensional isotropic null. ([6(b)](https://arxiv.org/html/2606.06647#S4.F6.sf2)) Group-level $\bar{c}$ (Eq. [6](https://arxiv.org/html/2606.06647#S3.E6)) across 3 FMs × 2 within-subject paired cells, frozen vs. FT. Trait cells (Stress, ADFTD) lack within-subject contrast by design. ([6(c)](https://arxiv.org/html/2606.06647#S4.F6.sf3)) Per-subject SNR (Eq. [7](https://arxiv.org/html/2606.06647#S3.E7)); dashed line at $\mathrm{SNR} = 1$ (signal $\sim$ noise).

**Within-subject direction consistency** (Sec. [3.7.5](https://arxiv.org/html/2606.06647#S3.SS7.SSS5); Fig. [6](https://arxiv.org/html/2606.06647#S4.F6)). EEGMAT yields a group-level mean pairwise cosine $\bar{c} \in [+0.07, +0.15]$ across the three FMs (subjects share a contrast axis above the isotropic null). SleepDep yields $\bar{c} \approx 0$ (subjects do not share a contrast axis), yet cosine PERMANOVA (Sec. [3.7.5](https://arxiv.org/html/2606.06647#S3.SS7.SSS5); App. [C](https://arxiv.org/html/2606.06647#A3), Tab. [A3](https://arxiv.org/html/2606.06647#A3.T3)) still detects label-associated structure on every (FM, state) pair ($p \leq 0.001$). A local per-subject contrast therefore exists but does not generalize across subjects (per-cell assessment in Tab. [4](https://arxiv.org/html/2606.06647#S5.T4)).
We read these four outcomes as different sides of a single representation bottleneck \(Sec\.[5](https://arxiv.org/html/2606.06647#S5)\)\.
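The direction-consistency quantities of Figure 6 can be sketched as follows, assuming binary labels coded 0/1, that every subject contributes windows under both conditions, and that $\bar{c}$ is the mean pairwise cosine between subjects' contrast vectors as described in the text (Eq. 6 itself is not reproduced here). Contrast vectors are unit-normalized before taking angles to the consensus direction, which is our reading of $\arccos\langle v_i, v_c \rangle$; the function name is hypothetical.

```python
import numpy as np


def direction_consistency(features, labels, subjects):
    """Per-subject angles to the consensus direction and mean pairwise cosine."""
    subs = np.unique(subjects)
    # v_i = mu_{i,1} - mu_{i,0}: within-subject contrast in full feature space.
    v = np.stack([
        features[(subjects == s) & (labels == 1)].mean(0)
        - features[(subjects == s) & (labels == 0)].mean(0)
        for s in subs
    ])
    v_unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    # v_c = mean(v_i) / ||mean(v_i)||, as in the Figure 6 caption.
    v_bar = v.mean(0)
    v_c = v_bar / np.linalg.norm(v_bar)
    theta = np.degrees(np.arccos(np.clip(v_unit @ v_c, -1.0, 1.0)))
    # Mean cosine over all unordered subject pairs (i < j).
    cos = v_unit @ v_unit.T
    iu = np.triu_indices(len(subs), k=1)
    c_bar = float(cos[iu].mean())
    return theta, c_bar
```

A shared contrast axis gives small angles and $\bar{c}$ well above zero; idiosyncratic per-subject contrasts give $\bar{c} \approx 0$ and angles drifting toward $90^\circ$ (slightly below it, because each $v_i$ contributes to its own consensus).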

## 5 Discussion

The empirical core of this paper is the Identity Trap: across four small-$N$ clinical resting-state cohorts, subject identity dominates the frozen features of all three EEG-FMs we test, and standard fine-tuning amplifies this subject geometry in 12 of 12 (cell, FM) pairs while amplifying label geometry only in cells where prior literature has already established a cross-subject EEG marker. On two of the three FMs, the aperiodic $1/f$ background is one identifiable carrier of this subject axis. The four cells map onto four distinct outcomes (Tab. [4](https://arxiv.org/html/2606.06647#S5.T4)), but they share a single representation bottleneck rather than acting as four unrelated cases.

Table 4: **Cross-diagnostic results and per-cell assessment.** Numeric summary of four of FMScope's five diagnostics per cell. All values are the arithmetic mean across the three foundation models (LaBraM, CBraMod, REVE); per-FM values appear in Tab. [A1](https://arxiv.org/html/2606.06647#A1.T1) ($\Delta f_{\mathrm{label}}$), Fig. [4](https://arxiv.org/html/2606.06647#S4.F4) (layer probe), Fig. [6](https://arxiv.org/html/2606.06647#S4.F6) ($\bar{c}$), and Tab. [A4](https://arxiv.org/html/2606.06647#A4.T4) ($1/f$ ablation drops). Columns: $\Delta f_{\mathrm{label}}$ = FT minus frozen label-explained variance fraction; *layer probe BA max / last* = maximum and last-layer balanced accuracy of the layer-wise label probe; $\bar{c}$ = median within-subject direction consistency ("—" in trait cells, where no within-subject paired contrast exists); *$1/f$ drops state / subj* = drops in label-state and subject-identity probe BAs after aperiodic-component ablation (positive values mean the ablated component carried that signal). *Outcome* is the qualitative cell-level reading (Sec. [4.4](https://arxiv.org/html/2606.06647#S4.SS4)); the four outcomes are exhaustive over the 2×2 cell layout: W = within-subject paired, T = subject-label trait, C = consensus cross-subject marker, N = no consensus marker.

### 5.1 The representation bottleneck

FM features carry stable subject geometry more strongly than task-aligned variance, and fine-tuning reinforces this ordering rather than correcting it. The frozen-state evidence has three parts. First, the subject-variance fraction is 13–89× a matched random-Gaussian null in 12 of 12 frozen pairs (Tab. [A2](https://arxiv.org/html/2606.06647#A2.T2)). Second, the layer-wise probe shows that the subject axis becomes linearly decodable within the first two to three transformer blocks rather than at the embedding output, so the encoder actively amplifies subject identity into a separable direction. Third, closed-form subject-axis erasure confirms that this dominance sits on a removable linear axis, collapsing the linear subject probe to chance in every pair (Tab. [3](https://arxiv.org/html/2606.06647#S4.T3)). On LaBraM and CBraMod, the aperiodic $1/f$ background is one identifiable carrier: removing it drops the linear subject probe by 9 to 19 pp uniformly across all four cells. REVE shows no measurable aperiodic dependence; the design-axis differences between REVE and the LaBraM-plus-CBraMod group are listed in Sec. [5.5](https://arxiv.org/html/2606.06647#S5.SS5).
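Closed-form subject-axis erasure can be sketched with the LEACE formula of Belrose et al. (2023), $r(x) = x - W^{+} P_{W\Sigma_{XZ}} W (x - \mathbb{E}[x])$, with $W = (\Sigma_{XX}^{1/2})^{+}$ and $Z$ a one-hot subject indicator. This is a minimal NumPy sketch, not the FMScope implementation; the function names are hypothetical and the relative eigenvalue cutoff `1e-10` for the pseudo-inverses is an arbitrary numerical choice. After erasure the cross-covariance between features and subject indicators is zero, the condition Belrose et al. tie to linear guardedness (no linear classifier doing better than a constant one).

```python
import numpy as np


def leace_fit(X, subjects):
    """Closed-form LEACE eraser for the one-hot subject concept."""
    mu = X.mean(0)
    Xc = X - mu
    Z = (subjects[:, None] == np.unique(subjects)[None, :]).astype(float)
    Zc = Z - Z.mean(0)
    n = X.shape[0]
    sigma_xx = Xc.T @ Xc / n
    sigma_xz = Xc.T @ Zc / n
    # Whitening W = (Sigma_xx^{1/2})^+ and its pseudo-inverse W^+ = Sigma_xx^{1/2}.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > evals.max() * 1e-10
    sqrt_ev = np.sqrt(evals[keep])
    V = evecs[:, keep]
    W = (V / sqrt_ev) @ V.T
    W_pinv = (V * sqrt_ev) @ V.T
    # Orthogonal projector onto the column space of W @ Sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > s.max() * 1e-10]
    P = U @ U.T
    return mu, W_pinv @ P @ W


def leace_apply(X, mu, M):
    """r(x) = x - W^+ P W (x - mu), applied row-wise."""
    return X - (X - mu) @ M.T
```

With $S$ subjects the projector has rank at most $S - 1$, so only that many feature directions are edited; all other variance, including any label-aligned direction orthogonal to them in the whitened space, passes through.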

Fine-tuning then amplifies whatever subject axis it inherits. $\Delta f_{\mathrm{subj}}$ is positive in every pair (+10 to +63 pp), whereas $\Delta f_{\mathrm{label}}$ is positive in 6 of 6 consensus-marker pairs (+0.6 to +8.4 pp) and spans zero in the 6 no-consensus pairs. When no literature-supported cross-subject marker is available for FT to amplify, FT still adapts, but it adapts toward the subject axis. This connects to a recurring caution in the EEG-FM literature. Subject-disjoint critiques (Brookshire and others, [2024](https://arxiv.org/html/2606.06647#bib.bib15)) establish that trial-level splits inflate accuracy via subject identity. Our diagnostic addresses the next concern: even under subject-disjoint splitting, subject-correlated features can still survive the split. The bottleneck here is not opaque feature noise; it includes an identifiable component of the input signal (on LaBraM and CBraMod, the aperiodic $1/f$ background). Prior work identifies subject-disjoint splitting as a necessary protocol; we show that this protocol alone is not sufficient for clinical discovery if the model instead aligns its features with stable subject traits. FMScope provides a representation-level account of this persistent failure.

### 5.2 Spectral-bias amplification under random-head fine-tuning

We interpret the cell-conditional amplification asymmetry through a gradient-geometry reading, now supported on LaBraM and CBraMod by the FT row of Fig. [5](https://arxiv.org/html/2606.06647#S4.F5)b. Under this reading, fine-tuning concentrates supervised signal along whichever direction already carries the most variance in the pretrained representation, regardless of whether that direction aligns with the supervised objective. Across 12 (cell, FM) pairs, $\Delta f_{\mathrm{subj}}$ is positive in every case (+10 to +63 pp), and the largest amplifications fall in the no-consensus trait cell, where Stress reaches +32 to +63 pp on the three FMs. $\Delta f_{\mathrm{label}}$ is positive in 6 of 6 consensus-marker pairs (+0.6 to +8.4 pp) and spans zero in the 6 no-consensus pairs (−1.9 to +0.8 pp). This asymmetry is what we expect if FT acts on existing axes: subject-axis amplification is universal because that axis is large in every frozen representation, whereas label-axis amplification depends on whether a label-aligned axis is already large enough to be amplified at all. Under a randomly initialized classification head, this gradient-geometry reading has formal anchors. Analyses of linear probing followed by fine-tuning show that LP induces a linear-head-norm regime that preserves the backbone's pretrained feature directions during the subsequent FT stage (Tomihari and Sato, [2024](https://arxiv.org/html/2606.06647#bib.bib42)), and parameter-efficient adaptation directions that capture the most activation variance provably maximize the expected gradient signal (Paischer et al., [2024](https://arxiv.org/html/2606.06647#bib.bib41)). These results combine with classical linear-network results showing that gradient descent preferentially learns the largest-singular-value directions of the input–output correlation (Saxe et al., [2019](https://arxiv.org/html/2606.06647#bib.bib36)).
The prescription that follows from this combination is layer-localized fine-tuning keyed to shift type (Lee et al., [2023](https://arxiv.org/html/2606.06647#bib.bib43)). Our variance fractions correspond to the marginalization step of demixed PCA (Kobak et al., [2016](https://arxiv.org/html/2606.06647#bib.bib40)), the established systems-neuroscience predecessor of factor-wise variance accounting.
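A toy illustration of the gradient-geometry reading, in the simplest possible setting rather than an FM: full-batch gradient descent on population least squares with decorrelated inputs whose per-axis variances are $\lambda_k$. Each weight then evolves independently as $w_k(t) = w^*_k\,(1 - (1 - \eta\lambda_k)^t)$, so a high-variance axis (standing in for the subject axis) is fit long before a low-variance one (standing in for a weak label axis), even when both carry equal target weight. This is a single-layer analogue of the singular-value ordering in Saxe et al., with hypothetical function name and settings, not a model of FM fine-tuning.

```python
import numpy as np


def gd_progress(variances, w_star, lr=0.01, steps=50):
    """Fraction of each target weight recovered by full-batch GD on least squares.

    Inputs are decorrelated with per-axis variance lambda_k, so each coordinate
    evolves independently: w_k(t) = w*_k * (1 - (1 - lr * lambda_k) ** t).
    """
    lam = np.asarray(variances, dtype=float)
    w_star = np.asarray(w_star, dtype=float)
    w = np.zeros_like(lam)
    for _ in range(steps):
        # Gradient of 0.5 * E[(x.w - x.w*)^2] under a diagonal input covariance.
        grad = lam * (w - w_star)
        w = w - lr * grad
    return w / w_star
```

For example, `gd_progress([10.0, 0.1], [1.0, 1.0])` recovers about 99.5% of the high-variance weight after 50 steps but under 5% of the low-variance one.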

### 5.3 A coding pattern shared across modalities

Our finding sits between two adjacent observations, neuroscience-side fingerprinting and ML-side shortcut learning, which converge on the same operational competition between stable biometric structure and unstable task signal. Our diagnostic documents the Identity Trap as a specific dissociation: subject variance dominates the frozen features, label variance is subordinate, and on LaBraM and CBraMod the aperiodic $1/f$ background is one identifiable subject carrier. The same dissociation between stable subject structure and task signal is documented in resting-state fMRI fingerprinting and EEG biometrics (Sec. [2](https://arxiv.org/html/2606.06647#S2)); we show that it also holds inside EEG-FM representations, using variance decomposition as the test. The broader observation that deep networks exploit dataset-stable shortcut features over invariant discriminative cues (Geirhos et al., [2020](https://arxiv.org/html/2606.06647#bib.bib35)) supplies the cross-architecture frame. Our diagnostic in turn provides a physically grounded instance of that phenomenon in biosignal foundation models: the shortcut here is not a pure statistical artifact of the training corpus but has a measurable physiological component in the input, the aperiodic $1/f$ background, which is high-energy, stable across sessions, and easier to reconstruct than the clinical signal the model is nominally trained to support. Concurrent prescriptive work on cross-subject EEG/MEG-FM adaptation includes SuLoRA, which decomposes each weight matrix into a shared component and a per-subject low-rank correction (Klein et al., [2025](https://arxiv.org/html/2606.06647#bib.bib46)), and SCOPE, a prototype-guided adaptation framework for label-limited EFM fine-tuning (Ma et al., [2026](https://arxiv.org/html/2606.06647#bib.bib47)).
Both prescribe a fix \(adding subject\-specific parameters or prototypes\) without measuring the subject axis they adapt to, and neither relates it to a physiological component of the signal\. Our diagnostic is complementary and upstream: it quantifies the axis in the frozen feature space, shows it is removable in closed form, and identifies one physiological carrier\. Both adaptation methods are evaluated on within\-subject task contrasts or consensus\-marker datasets; whether they extend to the no\-consensus column, which we predict to be intractable, remains open\.

### 5.4 Implications and FMScope use

We draw three deployment-side implications from the assessment matrix. First, the pre-pilot frozen-feature pass is a cheap filter: when the assessment falls below linear-probe resolution (Stress on all three FMs), the bottleneck lies in the recording substrate itself (an information deficit) rather than in model capacity. Scaling the pretraining corpus or model size does not address this class of failure; the remedy lies in how the data are collected. Trait-cell labels collected within a single session conflate disease state with stable subject features (the aperiodic $1/f$ background being one identifiable carrier on LaBraM and CBraMod). To separate the two, the label must change within a subject; only then does its discriminative axis fall outside the between-subject subspace that LEACE erases. Some labels allow this: when the underlying state changes over time, recording each subject under both conditions captures the variation. Other labels do not: a dementia diagnosis is a fixed subject attribute, so no recording protocol can separate it from identity. For these labels the entanglement is irreducible at the representation level, and the discriminative features must instead be validated as physiological rather than biometric. Second, brain–computer-interface calibration claims should pass a within-subject direction-consistency check before any subject-independent generalization is asserted; a cell like SleepDep, where contrasts are idiosyncratic to each subject, can support high in-subject calibration accuracy with near-zero cross-subject transfer. Third, these assessments depend on reading the diagnostics jointly: the SleepDep assessment is recoverable only from the disagreement pattern across tools, not from any single one.
Building on the established subject-disjoint splitting culture in EEG (Brookshire and others, [2024](https://arxiv.org/html/2606.06647#bib.bib15); Lotte et al., [2018](https://arxiv.org/html/2606.06647#bib.bib45)), the assessment matrix provides a frozen-feature pre-pilot test of whether the features show the geometric signatures (subject dominance, aperiodic anchoring, axis disagreement) that predict cross-subject failure, before any fine-tuning compute is committed.

### 5.5 Limitations and scope

We note five limitations to the present claims. *(i)* Our $2 \times 2$ layout relies on one dataset per cell; consequently, although the FMScope framework is generalizable, the specific cell-level clinical observations remain empirical features of these four cohorts and await broader validation. *(ii)* All four cells are small by machine-learning standards (Stress $N = 17$ subjects; EEGMAT and SleepDep $N = 36$; ADFTD $N = 65$); we mitigate seed sensitivity with 3-seed FT and 8-seed LP, but per-cell confidence intervals are wide and pointwise comparisons across cells are descriptive. The Mann–Whitney U is computed across 12 (cell × FM) points drawn from only four independent cells, so the points are not independent; the column claim rests on the effect-size ranges (+0.6 to +8.4 pp vs. −1.9 to +0.8 pp), which overlap only between +0.6 and +0.8 pp and which the test summarizes. *(iii)* Our ablations are input-level interventions; they establish statistical entanglement between aperiodic $1/f$ and subject identity, not a causal mechanism.
The aperiodic-anchor effect holds on LaBraM and CBraMod but not on REVE, and these two groups differ along at least five concurrent design axes that we cannot factorially separate in an $N = 3$ model panel: (a) the presence of spectral processing in the pretraining pipeline (LaBraM's encoder targets discrete codes whose decoder reconstructs Fourier amplitude and phase; CBraMod's patch embedding adds an FFT-derived branch to a temporal CNN; REVE operates on the raw signal throughout); (b) reconstruction loss (CBraMod's MSE versus REVE's explicitly motivated L1, chosen by El Ouahidi et al. ([2025](https://arxiv.org/html/2606.06647#bib.bib3)) to avoid L2 amplification of high-amplitude EEG content); (c) pretraining corpus diversity (REVE's 25,000 subjects across 92 heterogeneous datasets versus CBraMod's $\sim 15{,}000$ subjects from a single clinical source); (d) positional encoding (REVE's 4D Fourier sinusoidal encoding versus learned or convolutional alternatives); (e) an attention-pooling secondary task that REVE adds and the other two omit. We therefore report the two-versus-one pattern descriptively, not as a mechanism claim; an architecture-level controlled ablation would be required to isolate any single axis. *(iv)* All analyses assume subject-disjoint cross-validation, recording-level labels, and a $\geq 19$-channel 10–20 montage; cells with finer label resolution or trial-level splitting may produce different readings.
The DASS-21 self-report on Stress (Lovibond and Lovibond, [1995](https://arxiv.org/html/2606.06647#bib.bib14)) is a past-week screener, and we adopt the per-recording binarization of Komarov et al. ([2020](https://arxiv.org/html/2606.06647#bib.bib5)) for cross-protocol comparability with Wang et al. ([2025](https://arxiv.org/html/2606.06647#bib.bib4)); our subject-disjoint numbers reproduce that work's protocol-side configuration but apply each FM's release-default fine-tuning recipe rather than a per-dataset hyperparameter sweep. *(v)* Single-session ADFTD cannot separate disease state from stable subject traits by data alone. The diagnosis is a fixed subject attribute and does not change within a subject, so collecting more sessions would not separate them either. For such labels the limit is intrinsic, and the discriminative features must be validated against external physiological evidence rather than addressed with more recordings. We did not pre-align signals using Riemannian or Euclidean alignment methods before fine-tuning. This engineering step may attenuate the subject-axis amplification and is a natural follow-up, but it is not a prerequisite for the diagnostic claim itself.

### 5.6 Future directions

We see three algorithmic directions following from the bottleneck reading: FOOOF-detrended pretraining (the causal test of whether aperiodic $1/f$ is the masked objective's primary subject carrier), explicit gradient-reversal objectives at fine-tuning (Ganin and Lempitsky, [2015](https://arxiv.org/html/2606.06647#bib.bib11)) that could augment existing general-purpose SSL pretraining recipes for EEG (Banville et al., [2021](https://arxiv.org/html/2606.06647#bib.bib38); Kostas et al., [2021](https://arxiv.org/html/2606.06647#bib.bib39)), and explicit cross-subject alignment regularization at pretraining. On the empirical side, replication with a second dataset per cell, a layer-wise FOOOF-aperiodic probe, a broader FM panel that varies the five design axes independently, and a within-subject paired protocol on a no-consensus label would sharpen the present descriptive observations into axis-level claims. Integrating FMScope-style diagnostics into large-scale community benchmarks (Aristimunha et al., [2025](https://arxiv.org/html/2606.06647#bib.bib18); Kastrati et al., [2025](https://arxiv.org/html/2606.06647#bib.bib16); Shen et al., [2026](https://arxiv.org/html/2606.06647#bib.bib7)) would complement leaderboard accuracy with a representation-level check that distinguishes models rewarded for learning clinical biomarkers from those rewarded for optimizing subject-specific aperiodic anchors. The diagnostic framework we report here characterizes the substrate; pretraining-objective work is the path that moves the substrate.

## 6 Conclusion

Subject-disjoint cross-validation is necessary but not sufficient for representation discovery in small-$N$ clinical resting-state EEG. Across the 12 (cell, FM) pairs we examined, the frozen embedding already placed subject identity on its largest-variance direction, and fine-tuning amplified that direction whether or not the label aligned with it. High accuracy is therefore consistent either with a transferable clinical biomarker or with subject identity that merely co-varies with the label in this cohort, and accuracy alone cannot separate the two. FMScope is a frozen-feature pre-flight check, complementary to subject-disjoint splitting, that tells the user before any fine-tuning whether a cell's representation carries label-aligned biological structure or subject identity dressed up as a label. Where it signals an information deficit rather than a capacity limit, the remedy is methodological. A label that varies within a subject should be recorded under both conditions so that its discriminative axis separates from between-subject identity, whereas a label fixed at the subject level cannot be disentangled by any protocol; its discriminative features must instead be shown to correspond to an independently established physiological marker rather than to incidental subject traits. The framework generalizes; the per-cell outcomes are empirical features of these four cohorts and three FMs, and broader validation awaits future cohorts.

## Data availability

This study uses four previously collected EEG datasets. EEGMAT (Zyma et al., [2019](https://arxiv.org/html/2606.06647#bib.bib24)) is available from PhysioNet ([https://physionet.org/content/eegmat/1.0.0/](https://physionet.org/content/eegmat/1.0.0/)); the ADFTD dataset (Miltiadous and others, [2023](https://arxiv.org/html/2606.06647#bib.bib6)) is available from OpenNeuro (ds004504); the sleep-deprivation rsEEG dataset (Xiang et al., [2024](https://arxiv.org/html/2606.06647#bib.bib28)) is available as published by the original authors. The resting-state stress dataset (Komarov et al., [2020](https://arxiv.org/html/2606.06647#bib.bib5)) is not currently deposited in a public repository; it was shared by the originating laboratory (co-author T.-P. Jung's group) for the present secondary analysis and is available from the authors upon reasonable request and with permission from the original investigators. No new data were collected.

## Code availability

The FMScope toolkit, comprising the five frozen-representation diagnostics used in this paper (variance decomposition, subject-axis erasure, aperiodic FOOOF ablation, layer-wise probe, and within-subject direction consistency), together with the exact scripts that reproduce every table and figure, is available at [https://github.com/Jimmy110101013/fmscope](https://github.com/Jimmy110101013/fmscope). Reproduction runs from bundled aggregate results and frozen-feature caches, without raw EEG or model weights. Pretrained foundation-model weights (LaBraM, CBraMod, REVE) are obtained from the respective original repositories and are not redistributed by this work.

## Ethics statement

This work uses previously collected, de\-identified EEG datasets\. EEGMAT, ADFTD, and the sleep\-deprivation dataset are publicly released; the Komarov et al\. stress dataset was obtained from the original investigators \(co\-author T\.\-P\. Jung’s lab\) for secondary analysis\. The original data\-collection studies were conducted under their respective IRB approvals and consent procedures, which we cite\. No new human\-subject data were collected, and no additional IRB approval was required\.

## Conflict of interest

The authors declare no competing interests\.

## Funding

This work received no external funding\.

## Author contributions

J\.\-Y\.L\.: conceptualization, methodology, software, formal analysis, investigation, data curation, writing — original draft, writing — review and editing, visualization\. T\.\-P\.J\.: supervision, conceptual input, writing — review and editing\.

## Appendix

## Appendix A Variance decomposition: per-pair values

Window-level variance decomposition: full per-(cell, FM) values for the 12 frozen and 12 fine-tuned pairs underlying Sec. [4.2](https://arxiv.org/html/2606.06647#S4.SS2) and Fig. [3](https://arxiv.org/html/2606.06647#S4.F3). $\Delta$ columns are FT minus frozen.

Table A1: Per-pair label-fraction and subject-fraction values (window-level variance decomposition). Frozen and FT in percentage points; $\Delta$ = FT − frozen. The $\Delta f_{\mathrm{label}}$ column-effect Mann–Whitney test (method: Sec. [3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1)) operates on the 12 $\Delta f_{\mathrm{label}}$ values in this table. Cosine PERMANOVA per-cell assessments are reported in App. [C](https://arxiv.org/html/2606.06647#A3).
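For groups this small, the column-effect Mann–Whitney test can be computed exactly by enumeration. The sketch below is a generic one-sided implementation with a hypothetical function name, not the paper's code. With 6 versus 6 values there are $\binom{12}{6} = 924$ possible splits, so the smallest attainable one-sided $p$ is $1/924 \approx 1.1 \times 10^{-3}$, reached only under complete separation; a single out-of-order pair already raises it to $2/924$.

```python
import numpy as np
from itertools import combinations


def mann_whitney_exact_greater(a, b):
    """One-sided exact Mann-Whitney U test of H1: values in a tend to exceed b.

    U counts pairs (i, j) with a_i > b_j (ties count 1/2). The exact p-value
    enumerates every split of the pooled sample into groups of sizes len(a)
    and len(b); feasible for small samples such as 6 vs 6 (924 splits).
    """
    a, b = np.asarray(a, float), np.asarray(b, float)

    def u_stat(x, y):
        diff = x[:, None] - y[None, :]
        return (diff > 0).sum() + 0.5 * (diff == 0).sum()

    pooled = np.concatenate([a, b])
    u_obs = u_stat(a, b)
    count = total = 0
    for combo in combinations(range(len(pooled)), len(a)):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(combo)] = True
        total += 1
        count += u_stat(pooled[mask], pooled[~mask]) >= u_obs
    return u_obs, count / total
```

Because the 12 points come from only four cells, this exact $p$ is best read as a summary of the range separation rather than as an independent-sample test.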

## Appendix B Random-Gaussian null control for the Identity-Trap inequality

The raw inequality $f_{\mathrm{subj}} > f_{\mathrm{label}}$ on a crossed sum-of-squares decomposition holds in expectation even for random embeddings whenever the number of subjects $S$ exceeds the number of labels $L$: for an iid Gaussian embedding with $N$ rows, the marginal sum-of-squares fractions concentrate around their degrees-of-freedom ratios, $\mathrm{E}[f_{\mathrm{subj}}] \approx (S-1)/(N-1)$ and $\mathrm{E}[f_{\mathrm{label}}] \approx (L-1)/(N-1)$, with $S > L$ for every cell in our panel. We therefore quantify the Identity-Trap evidence as the *excess of the frozen $f_{\mathrm{subj}}$ over a matched random-Gaussian null* of identical shape $(N, D)$, averaged over $K = 20$ seeds. Tab. [A2](https://arxiv.org/html/2606.06647#A2.T2) reports per-pair excess ratios; the empirical null mean agrees with the closed-form $(S-1)/(N-1)$ prediction to three decimal places.
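The null comparison can be sketched as follows, assuming window embeddings `X` of shape $(N, D)$ and one subject id per window; the helper names are hypothetical. `subject_fraction` computes the marginal between-subject sum of squares over the total, pooled across dimensions, and `null_excess` compares it with the same statistic on iid Gaussian matrices of identical shape, alongside the closed-form $(S-1)/(N-1)$.

```python
import numpy as np


def subject_fraction(X, subjects):
    """Marginal between-subject SS over total SS, summed over all dimensions."""
    Xc = X - X.mean(0)
    ss_total = (Xc ** 2).sum()
    ss_between = 0.0
    for s in np.unique(subjects):
        rows = Xc[subjects == s]
        ss_between += len(rows) * (rows.mean(0) ** 2).sum()
    return ss_between / ss_total


def null_excess(X, subjects, k=20, seed=0):
    """Real f_subj over its mean under iid Gaussian embeddings of the same shape.

    Returns (excess ratio, empirical null mean, closed-form (S-1)/(N-1)).
    """
    rng = np.random.default_rng(seed)
    null = [subject_fraction(rng.standard_normal(X.shape), subjects) for _ in range(k)]
    n, s = X.shape[0], len(np.unique(subjects))
    return subject_fraction(X, subjects) / np.mean(null), np.mean(null), (s - 1) / (n - 1)
```

Because the null depends only on $N$, $D$, and the subject assignment, the same function gives a per-pair excess ratio in the format of Tab. A2 without touching labels.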

Table A2: Frozen subject-variance fraction (real FM vs. matched random-Gaussian null of identical shape) for the 12 (cell, FM) pairs. $N$ is the number of windows, $S$ the number of subjects, and $L = 2$ for every cell. "Null $f_{\mathrm{subj}}$" is the mean $\pm$ SD over $K = 20$ iid Gaussian draws; "df pred" is the closed-form $(S-1)/(N-1)$; "Excess×" is real / null mean. $\Delta f_{\mathrm{label}}$ (Tab. [A1](https://arxiv.org/html/2606.06647#A1.T1)) is intrinsically null-corrected because the combinatorial offset cancels in the frozen-to-FT difference.

## Appendix C Cosine PERMANOVA: per-cell label-structure detection

Per-cell PERMANOVA $p$-values on the variance-triangulation cache (method: Sec. [3.7.5](https://arxiv.org/html/2606.06647#S3.SS7.SSS5)).

Table A3: Cosine PERMANOVA $p_{\text{label}}$ per cell × FM × state. Floor $\approx 0.001$ (999 permutations). EEGMAT, ADFTD, and SleepDep clear the floor on every (FM, state) cell; Stress is null on every cell.

PERMANOVA on SleepDep is significant on every (FM, state) pair even though $\Delta f_{\mathrm{label}}$ is negative across all three FMs. This is consistent with the within-subject paired design: each pair block is anchored to one subject, and the per-pair contrast leaves a label-detectable signature in the cosine geometry whether or not fine-tuning amplifies it. We therefore use PERMANOVA as a within-cell detector of label-related structure rather than as the axis-level test; the consensus-vs-no-consensus contrast is given by the $\Delta f_{\mathrm{label}}$ Mann–Whitney U (Tab. [A1](https://arxiv.org/html/2606.06647#A1.T1)). On Stress, PERMANOVA shows no effect on any (FM, state) pair, which supports the below-linear-probe-resolution assessment (Tab. [4](https://arxiv.org/html/2606.06647#S5.T4)) without relying on the layer-wise probe alone.
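A minimal sketch of a one-way cosine PERMANOVA in the sense of Anderson (2001): a pseudo-$F$ from squared cosine distances, with permutation $p$-value $(1 + \#\{F_\pi \geq F_{\mathrm{obs}}\})/(n_\pi + 1)$. With 999 permutations the smallest attainable value is $1/1000$, which is the floor quoted in Tab. A3. Unrestricted label permutations are an assumption of this sketch; a within-subject paired design may call for permutations restricted within subject blocks, which are not implemented here, and the function name is hypothetical.

```python
import numpy as np


def cosine_permanova(X, labels, n_perm=999, seed=0):
    """One-way PERMANOVA on cosine distances; returns (pseudo-F, permutation p)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    d2 = (1.0 - Xn @ Xn.T) ** 2  # squared cosine distances
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    # SS_T = (1/N) * sum_{i<j} d_ij^2; d2 counts each pair twice.
    ss_total = d2.sum() / (2 * n)

    def pseudo_f(lab):
        ss_within = 0.0
        for g in groups:
            m = lab == g
            ss_within += d2[np.ix_(m, m)].sum() / (2 * m.sum())
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator counts the observed labeling as one of the permutations, which is why no test can report a value below the floor.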

## Appendix D FOOOF aperiodic / periodic ablation: full probe BA table

Label and subject probe baseline BAs and ablation deltas for the 4 cells × 3 FMs underlying Sec. [4.3](https://arxiv.org/html/2606.06647#S4.SS3) and Fig. [5](https://arxiv.org/html/2606.06647#S4.F5). The subject probe is a 5-fold temporal-block LDA with Ledoit–Wolf shrinkage. All values in BA × 100; $\Delta$ values in percentage points.

Table A4: Label probe and subject probe baseline BA, with deltas under FOOOF −aperiodic and −periodic. Label probe: subject-disjoint 5-fold logistic regression on per-window features. Subject probe: 5-fold temporal-block LDA. $\Delta$ values are ablation BA minus baseline BA.

The −aperiodic intervention drops the subject probe by 9–19 pp on LaBraM and CBraMod uniformly across all four cells, while leaving REVE's already-saturated subject probe (baseline $\geq 0.93$) essentially unchanged. Periodic peak removal shifts either probe by $\leq 0.8$ pp on every (FM, cell) pair. The cell-conditional label-probe response is interpreted in Sec. [4.3](https://arxiv.org/html/2606.06647#S4.SS3).
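The temporal-block split used by the subject probe can be sketched as below. This is our reading of "temporal-block", assuming rows are already in recording-time order within each subject; the function name is hypothetical. Because the subject probe must predict identities seen in training, every subject appears in every fold and only contiguous time blocks are held out, the opposite of the subject-disjoint folds used for the label probe. An LDA with Ledoit–Wolf shrinkage, for example scikit-learn's `LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")`, could then be fit per fold; that classifier is not shown.

```python
import numpy as np


def temporal_block_folds(subjects, n_folds=5):
    """Fold index per window: each subject's time-ordered windows are split into
    n_folds contiguous blocks, and fold k tests block k of every subject."""
    subjects = np.asarray(subjects)
    fold = np.empty(len(subjects), dtype=int)
    for s in np.unique(subjects):
        idx = np.flatnonzero(subjects == s)  # assumes rows are time-ordered
        for k, block in enumerate(np.array_split(idx, n_folds)):
            fold[block] = k
    return fold
```

Holding out contiguous blocks rather than random windows keeps temporally adjacent, highly correlated windows from straddling the train/test boundary.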

## Appendix E Subject-axis erasure across independent cohorts

The single EEGMAT erasure demonstration generalizes. We applied the identical procedure (freeze, LEACE-erase the subject subspace, compare label BA pre/post under subject-level CV) to four further marker families and the five external audit cohorts (Tab. [A5](https://arxiv.org/html/2606.06647#A5.T5)). $\Delta_{\mathrm{erase}}$ is positive on all three FMs wherever a literature-established consensus marker exists (arithmetic $\theta$, motor-imagery ERD, auditory P300) and mixed-to-negative for the two external cohorts whose markers are not established (SAM40, TDBRAIN-state); the three subject-trait cohorts have undefined $\Delta_{\mathrm{erase}}$ but show the same collapse of the subject probe to chance. Magnitudes are not comparable across rows (windows of 1–5 s, ceiling effects, recording- vs. window-level scoring), but the signs are. Across the twelve consensus-marker cells (four cohorts × three FMs, the primary EEGMAT cell included), $\Delta_{\mathrm{erase}}$ is positive in all twelve (one-sided sign test, $p = 2.4 \times 10^{-4}$; consensus vs. non-consensus cohorts, one-sided Mann–Whitney $U$, $p = 2.2 \times 10^{-4}$). Erasing identity thus aids the label exactly where a cross-subject marker exists, reinforcing Sec. [4.2](https://arxiv.org/html/2606.06647#S4.SS2) (Tab. [3](https://arxiv.org/html/2606.06647#S4.T3)).
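The reported sign-test value can be checked directly: if all 12 consensus cells are positive under a null in which each sign is a fair coin, the one-sided $p$ is $2^{-12} = 1/4096 \approx 2.44 \times 10^{-4}$. A generic exact sign test using only the standard library (function name hypothetical) is:

```python
from math import comb


def sign_test_greater(n_pos, n):
    """One-sided exact sign test: P(K >= n_pos) for K ~ Binomial(n, 1/2)."""
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n


# Twelve positive signs out of twelve consensus-marker cells.
p = sign_test_greater(12, 12)
```

Like the Mann–Whitney test in App. A, this treats the 12 cells as independent draws, although three of every four share a cohort.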

Table A5: Subject-axis erasure beyond the EEGMAT demonstration (Tab. [3](https://arxiv.org/html/2606.06647#S4.T3)). Raw BA: pre-erasure label BA × 100 (range over the three FMs). $\Delta_{\mathrm{erase}}$: post − pre change (pp, 3-seed mean). Top: established consensus markers; bottom: external cohorts with non-established markers.
## References

- M. J. Anderson (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecology 26(1), pp. 32–46. Cited by: [§3.7.5](https://arxiv.org/html/2606.06647#S3.SS7.SSS5.Px1.p1.5).
- B. Aristimunha, D. Truong, P. Guetschel, S. Y. Shirazi, I. Guyon, A. R. Franco, M. P. Milham, A. Dotan, S. Makeig, A. Gramfort, J. King, M. Corsi, P. A. Valdés-Sosa, A. Majumdar, A. Evans, T. J. Sejnowski, O. Shriki, S. Chevallier, and A. Delorme (2025) EEG foundation challenge: from cross-task to cross-subject EEG decoding. arXiv preprint arXiv:2506.19141. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1), [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- C. Babiloni, X. Arakaki, H. Azami, et al. (2021) Measures of resting state EEG rhythms for clinical trials in Alzheimer’s disease: recommendations of an expert panel. Alzheimer’s & Dementia 17(9), pp. 1528–1553. [Document](https://dx.doi.org/10.1002/alz.12311). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [§3.2](https://arxiv.org/html/2606.06647#S3.SS2.p2.6), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.9.6.2).
- H. Banville, O. Chehab, A. Hyvärinen, D. Engemann, and A. Gramfort (2021) Uncovering the structure of clinical EEG signals with self-supervised learning. Journal of Neural Engineering 18(4), pp. 046020. [Document](https://dx.doi.org/10.1088/1741-2552/abca18). Cited by: [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- N. Belrose, D. Schneider-Joseph, S. Ravfogel, R. Cotterell, E. Raff, and S. Biderman (2023) LEACE: perfect linear concept erasure in closed form. In Advances in Neural Information Processing Systems (NeurIPS). Cited by: [§3.7.2](https://arxiv.org/html/2606.06647#S3.SS7.SSS2.p1.11).
- G. Brookshire et al. (2024) Data leakage in deep learning studies of translational EEG. Frontiers in Neuroscience. [Document](https://dx.doi.org/10.3389/fnins.2024.1373515). Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1), [§3.4](https://arxiv.org/html/2606.06647#S3.SS4.SSS0.Px1.p1.3), [§5.1](https://arxiv.org/html/2606.06647#S5.SS1.p2.6), [§5.4](https://arxiv.org/html/2606.06647#S5.SS4.p1.1).
- P. Campisi and D. La Rocca (2014) Brain waves for automatic biometric-based user recognition. IEEE Trans. Information Forensics and Security 9(5), pp. 782–800. [Document](https://dx.doi.org/10.1109/TIFS.2014.2308640). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2).
- R. Ceponiene, M. Westerfield, M. Torki, and J. Townsend (2008) Modality-specificity of sensory aging in vision and audition: evidence from event-related potentials. Brain Research 1215, pp. 53–68. [Document](https://dx.doi.org/10.1016/j.brainres.2008.02.010). Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.19.13.4).
- M. Demuru and M. Fraschini (2020) EEG fingerprinting: subject-specific signature based on the aperiodic component of power spectrum. Computers in Biology and Medicine 120, pp. 103748. [Document](https://dx.doi.org/10.1016/j.compbiomed.2020.103748). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2).
- T. Donoghue, M. Haller, E. J. Peterson, P. Varma, P. Sebastian, R. Gao, T. Noto, A. H. Lara, J. D. Wallis, R. T. Knight, A. Shestyuk, and B. Voytek (2020) Parameterizing neural power spectra into periodic and aperiodic components. Nature Neuroscience 23, pp. 1655–1665. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p3.4), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2), [§3.7.3](https://arxiv.org/html/2606.06647#S3.SS7.SSS3.p2.2).
- Y. El Ouahidi, J. Lys, P. Thölke, N. Farrugia, B. Pasdeloup, V. Gripon, K. Jerbi, and G. Lioi (2025) REVE: a foundation model for EEG – adapting to any setup with large-scale pretraining on 25,000 subjects. arXiv preprint arXiv:2510.21585. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px1.p1.7), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px5.p1.2), [§3.3](https://arxiv.org/html/2606.06647#S3.SS3.p1.3), [§5.5](https://arxiv.org/html/2606.06647#S5.SS5.p1.12).
- C. A. Field and A. H. Welsh (2007) Bootstrapping clustered data. Journal of the Royal Statistical Society: Series B 69(3), pp. 369–390. Cited by: [§3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1.p1.17).
- E. S. Finn et al. (2015) Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nature Neuroscience. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2).
- Y. Ganin and V. Lempitsky (2015) Unsupervised domain adaptation by backpropagation. In ICML. Cited by: [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- R. Gao, E. J. Peterson, and B. Voytek (2017) Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage 158, pp. 70–78. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p3.4).
- R. Geirhos, J. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann (2020) Shortcut learning in deep neural networks. Nature Machine Intelligence 2(11), pp. 665–673. [Document](https://dx.doi.org/10.1038/s42256-020-00257-z). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px5.p1.2), [§5.3](https://arxiv.org/html/2606.06647#S5.SS3.p1.2).
- A. Gevins, M. E. Smith, L. McEvoy, and D. Yu (1997) High-resolution EEG mapping of cortical activation related to working memory: effects of task difficulty, type of processing, and practice. Cerebral Cortex 7(4), pp. 374–385. [Document](https://dx.doi.org/10.1093/cercor/7.4.374). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.6.3.2).
- R. Ghosh, N. Deb, K. Sengupta, A. Phukan, N. Choudhury, S. Kashyap, S. Phadikar, R. Saha, P. Das, N. Sinha, and P. Dutta (2022) SAM 40: dataset of 40 subject EEG recordings to monitor the induced-stress while performing stroop color-word test, arithmetic task, and mirror image recognition task. Data in Brief 40, pp. 107772. [Document](https://dx.doi.org/10.1016/j.dib.2021.107772). Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.22.16.4).
- W. Jiang, L. Zhao, and B. Lu (2024) LaBraM: large brain model for learning generic representations with tremendous EEG data in BCI. In International Conference on Learning Representations. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px1.p1.7), [§3.3](https://arxiv.org/html/2606.06647#S3.SS3.p1.3), [§3.5](https://arxiv.org/html/2606.06647#S3.SS5.p1.1).
- A. Kastrati, J. Bürki, J. Lauer, C. Xuan, R. Iaquinto, and R. Wattenhofer (2025) EEG-Bench: a benchmark for EEG foundation models in clinical applications. In NeurIPS. Note: arXiv:2512.08959. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1), [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- T. Klein, P. Minakowski, S. Sager, and S. Schotthöfer (2025) Mitigating subject dependency in EEG decoding with subject-specific low-rank adapters. arXiv preprint arXiv:2510.08059. Cited by: [§5.3](https://arxiv.org/html/2606.06647#S5.SS3.p1.2).
- W. Klimesch (1999) EEG alpha and theta oscillations reflect cognitive and memory performance: a review and analysis. Brain Research Reviews 29(2-3), pp. 169–195. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.6.3.2).
- W. Klimesch (2012) Alpha-band oscillations, attention, and controlled access to stored information. Trends in Cognitive Sciences 16(12), pp. 606–617. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5).
- D. Kobak, W. Brendel, C. Constantinidis, C. E. Feierstein, A. Kepecs, Z. F. Mainen, X. Qi, R. Romo, N. Uchida, and C. K. Machens (2016) Demixed principal component analysis of neural population data. eLife 5, pp. e10989. [Document](https://dx.doi.org/10.7554/eLife.10989). Cited by: [§3.7.1](https://arxiv.org/html/2606.06647#S3.SS7.SSS1.p1.18), [§5.2](https://arxiv.org/html/2606.06647#S5.SS2.p1.10).
- O. Komarov, L. Ko, and T. Jung (2020) Associations among emotional state, sleep quality, and resting-state EEG spectra: a longitudinal study in graduate students. IEEE Transactions on Neural Systems and Rehabilitation Engineering 28(4), pp. 795–804. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§3.2](https://arxiv.org/html/2606.06647#S3.SS2.p2.6), [§5.5](https://arxiv.org/html/2606.06647#S5.SS5.p1.12), [Data availability](https://arxiv.org/html/2606.06647#Sx1.p1.1).
- M. Kopčanová, L. Tait, T. Donoghue, G. Stothart, L. Smith, A. A. Flores-Sandoval, P. Davila-Perez, S. Buss, M. M. Shafi, A. Pascual-Leone, P. J. Fried, and C. S. Y. Benwell (2024) Resting-state EEG signatures of Alzheimer’s disease are driven by periodic but not aperiodic changes. Neurobiology of Disease 190, pp. 106380. [Document](https://dx.doi.org/10.1016/j.nbd.2023.106380). Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p3.4), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1).
- D. Kostas, S. Aroca-Ouellette, and F. Rudzicz (2021) BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. Frontiers in Human Neuroscience 15, pp. 653659. [Document](https://dx.doi.org/10.3389/fnhum.2021.653659). Cited by: [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- G. Kuruppu, N. Wagh, V. Kremen, S. Pati, G. Worrell, and Y. Varatharajah (2025) EEG foundation models: a critical review of current progress and future directions. arXiv preprint arXiv:2507.11783. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1).
- V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance (2018) EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering 15(5), pp. 056013. Cited by: [§3.6](https://arxiv.org/html/2606.06647#S3.SS6.SSS0.Px2.p1.1).
- Y. Lee, A. S. Chen, F. Tajwar, A. Kumar, H. Yao, P. Liang, and C. Finn (2023) Surgical fine-tuning improves adaptation to distribution shifts. In International Conference on Learning Representations. arXiv:2210.11466. Cited by: [§5.2](https://arxiv.org/html/2606.06647#S5.SS2.p1.10).
- F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger (2018) A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. Journal of Neural Engineering 15(3), pp. 031005. [Document](https://dx.doi.org/10.1088/1741-2552/aab2f2). Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§5.4](https://arxiv.org/html/2606.06647#S5.SS4.p1.1).
- S. H. Lovibond and P. F. Lovibond (1995) The structure of negative emotional states: comparison of the Depression Anxiety Stress Scales (DASS) with the Beck depression and anxiety inventories. Behaviour Research and Therapy 33(3), pp. 335–343. Cited by: [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [§5.5](https://arxiv.org/html/2606.06647#S5.SS5.p1.12).
- J. Ma, F. Wu, Y. Xing, Q. Lin, T. Liu, C. Liu, Z. Jia, and M. Feng (2026) SCOPE: structured prototype-guided adaptation for EEG foundation models with limited labels. arXiv preprint arXiv:2602.17251. Cited by: [§5.3](https://arxiv.org/html/2606.06647#S5.SS3.p1.2).
- M. Mantwill, M. Gell, S. Krohn, and C. Finke (2022) Brain connectivity fingerprinting and behavioural prediction rest on distinct functional systems of the human connectome. Communications Biology 5, pp. 261. [Document](https://dx.doi.org/10.1038/s42003-022-03185-3). Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2).
- S. Marcel and J. d. R. Millán (2007) Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation. IEEE Trans. Pattern Anal. Mach. Intell. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px4.p1.2).
- A. Miltiadous et al. (2023) A dataset of scalp EEG recordings of Alzheimer’s disease, frontotemporal dementia and healthy subjects from routine EEG. Data. Cited by: [§3.2](https://arxiv.org/html/2606.06647#S3.SS2.p2.6), [Data availability](https://arxiv.org/html/2606.06647#Sx1.p1.1).
- F. Paischer, L. Hauzenberger, T. Schmied, B. Alkin, M. P. Deisenroth, and S. Hochreiter (2024) Parameter efficient fine-tuning via explained variance adaptation. In Advances in Neural Information Processing Systems. arXiv:2410.07170. Cited by: [§5.2](https://arxiv.org/html/2606.06647#S5.SS2.p1.10).
- S. J. Reznik and J. J. B. Allen (2018) Frontal asymmetry as a mediator and moderator of emotion: an updated review. Psychophysiology 55(1), pp. e12965. [Document](https://dx.doi.org/10.1111/psyp.12965). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.9.6.3).
- A. M. Saxe, J. L. McClelland, and S. Ganguli (2019) A mathematical theory of semantic development in deep neural networks. Proceedings of the National Academy of Sciences 116(23), pp. 11537–11546. [Document](https://dx.doi.org/10.1073/pnas.1820226116). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px5.p1.2), [§5.2](https://arxiv.org/html/2606.06647#S5.SS2.p1.10).
- G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw (2004) BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering 51(6), pp. 1034–1043. [Document](https://dx.doi.org/10.1109/TBME.2004.827072). Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.16.10.4).
- R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball (2017) Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping 38(11), pp. 5391–5420. Cited by: [§3.6](https://arxiv.org/html/2606.06647#S3.SS6.SSS0.Px2.p1.1).
- F. Shen, E. Yang, J. Li, J. Hong, X. Pan, Z. Yuan, M. Li, and Y. Yang (2026) Brain4FMs: a benchmark of foundation models for electrical brain signal. arXiv preprint arXiv:2602.11558. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1), [§5.6](https://arxiv.org/html/2606.06647#S5.SS6.p1.1).
- Y. Song, Q. Zheng, B. Liu, and X. Gao (2023) EEG Conformer: convolutional transformer for EEG decoding and visualization. IEEE Trans. Neural Syst. Rehabil. Eng. Cited by: [§3.6](https://arxiv.org/html/2606.06647#S3.SS6.SSS0.Px2.p1.1).
- L. Tang, Q. Chen, J. Mei, H. Xu, Q. Zhang, J. Shao, N. Zou, X. Hu, and D. Liu (2026) What do EEG foundation models capture from human brain signals? arXiv preprint arXiv:2605.11410. Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1).
- A. Tomihari and I. Sato (2024) Understanding linear probing then fine-tuning language models from NTK perspective. In Advances in Neural Information Processing Systems. arXiv:2405.16747. Cited by: [§5.2](https://arxiv.org/html/2606.06647#S5.SS2.p1.10).
- N. van der Vinne, M. A. Vollebregt, M. J. A. M. van Putten, and M. Arns (2017) Frontal alpha asymmetry as a diagnostic marker in depression: fact or fiction? a meta-analysis. NeuroImage: Clinical 16, pp. 79–87. [Document](https://dx.doi.org/10.1016/j.nicl.2017.07.006). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.9.6.3).
- H. van Dijk, G. van Wingen, D. Denys, S. Olbrich, R. van Ruth, and M. Arns (2022) The two decades brainclinics research archive for insights in neurophysiology (TDBRAIN) database. Scientific Data 9(1), pp. 333. [Document](https://dx.doi.org/10.1038/s41597-022-01409-z). Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.25.19.4).
- J. Wang et al. (2025) CBraMod: a criss-cross brain foundation model for EEG decoding. ICLR. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px1.p1.7), [§3.3](https://arxiv.org/html/2606.06647#S3.SS3.p1.3).
- S. Wang, S. Zhang, W. Chen, D. Truong, and T. Jung (2025) From theory to application: fine-tuning large EEG model with real-world stress data. arXiv preprint arXiv:2505.23042. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§4.1](https://arxiv.org/html/2606.06647#S4.SS1.p2.5), [§5.5](https://arxiv.org/html/2606.06647#S5.SS5.p1.12).
- Y. Wang, W. Duan, D. Dong, L. Ding, and X. Lei (2022) A test-retest resting, and cognitive state EEG dataset during multiple subject-driven states. Scientific Data 9(1), pp. 566. [Document](https://dx.doi.org/10.1038/s41597-022-01607-9). Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.13.7.4).
- J. Wu, Z. Ren, J. Wang, P. Zhu, Y. Song, M. Liu, Q. Zheng, L. Bai, W. Ouyang, and C. Song (2025) AdaBrain-Bench: benchmarking brain foundation models for brain-computer interface applications. arXiv preprint arXiv:2507.09882. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px1.p1.7), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1).
- C. Xiang, X. Fan, D. Bai, K. Lv, and X. Lei (2024) A resting-state EEG dataset for sleep deprivation. Scientific Data 11, pp. 427. [Document](https://dx.doi.org/10.1038/s41597-024-03268-2). Cited by: [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px3.p1.1), [§3.1](https://arxiv.org/html/2606.06647#S3.SS1.p1.5), [§3.2](https://arxiv.org/html/2606.06647#S3.SS2.p2.6), [Table 1](https://arxiv.org/html/2606.06647#S3.T1.5.6.3.3), [Data availability](https://arxiv.org/html/2606.06647#Sx1.p1.1).
- W. Xiong, J. Li, J. Li, K. Zhu, and C. Jiang (2025) EEG-FM-Bench: a comprehensive benchmark for the systematic evaluation of EEG foundation models. arXiv preprint arXiv:2508.17742. Cited by: [§1](https://arxiv.org/html/2606.06647#S1.p1.5), [§1](https://arxiv.org/html/2606.06647#S1.p2.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px1.p1.7), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px2.p1.1), [§2](https://arxiv.org/html/2606.06647#S2.SS0.SSS0.Px6.p1.1).
- I. Zyma, S. Tukaev, I. Seleznov, K. Kiyono, A. Popov, M. Chernykh, and O. Shpenkov (2019) Electroencephalograms during mental arithmetic task performance. Data 4(1), pp. 14. Note: PhysioNet EEGMAT dataset. Cited by: [Table A5](https://arxiv.org/html/2606.06647#A5.T5.10.4.4), [§3.2](https://arxiv.org/html/2606.06647#S3.SS2.p2.6), [Data availability](https://arxiv.org/html/2606.06647#Sx1.p1.1).
Similar Articles

Aperiodic and Low-Frequency Spectral Bias in Reconstruction based EEG Foundation Models

Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders

Comparing Post-Hoc Explainable AI Methods for Interpreting Black-Box EEG Models in Depression Detection

Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation

Spectral Priors vs. Attention: Investigating the Utility of Attention Mechanisms in EEG-Based Diagnosis
