Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction

arXiv cs.CL 04/21/26, 04:00 AM Papers

Summary

Researchers propose Brain-CLIPLM, a two-stage EEG-to-text decoding framework that uses contrastive learning to extract semantic anchors and a retrieval-grounded LLM with Chain-of-Thought reasoning to reconstruct sentences. On the Zurich Cognitive Language Processing Corpus it achieves 67.55% top-5 sentence retrieval accuracy, and the authors suggest that EEG-to-text decoding should focus on recovering compressed semantic content rather than reconstructing full sentences.
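To make the headline metric concrete, here is a minimal sketch of how top-k sentence retrieval accuracy is commonly computed from a query-by-candidate similarity matrix. The function name, the convention that the correct sentence for query i sits at column i, and the toy scores are all assumptions for illustration; the paper's actual evaluation protocol may differ.

```python
from typing import Sequence


def top_k_retrieval_accuracy(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose true candidate ranks within the top k.

    Assumption: similarity[i][j] scores query i (e.g. an EEG embedding)
    against candidate sentence j, and the true match for query i is
    candidate i. Ties are resolved in favor of the true candidate.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Hypothetical 3x3 toy scores, not data from the paper.
toy = [
    [0.9, 0.1, 0.2],  # true sentence ranked first
    [0.8, 0.5, 0.1],  # true sentence ranked second
    [0.7, 0.6, 0.2],  # true sentence ranked third
]
top1 = top_k_retrieval_accuracy(toy, k=1)
top2 = top_k_retrieval_accuracy(toy, k=2)
top3 = top_k_retrieval_accuracy(toy, k=3)
```

In this toy case one of three queries is correct at k=1, two at k=2, and all three at k=3, which shows why top-25 accuracy is always at least as high as top-5 accuracy on the same candidate pool.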

arXiv:2604.16370v1 Announce Type: new

Abstract: Decoding natural language from non-invasive electroencephalography (EEG) remains fundamentally limited by low signal-to-noise ratio and restricted information bandwidth. This raises a fundamental question regarding whether sentence-level linguistic structure can be reliably recovered from such signals. In this work, we suggest that the assumption that it can be recovered may not hold under realistic information constraints, and instead propose a semantic compression hypothesis in which EEG signals encode a compressed set of semantic anchors rather than full linguistic structure. Under our new perspective, direct sentence reconstruction becomes an overparameterized objective relative to the intrinsic information capacity of EEG. To address this mismatch, we introduce Brain-CLIPLM, a two-stage framework that decomposes EEG-to-text decoding into semantic anchor extraction via contrastive learning and sentence reconstruction using a retrieval-grounded large language model (LLM) with Chain-of-Thought (CoT) reasoning, following a granularity matching principle that aligns decoding complexity with neural information capacity. Evaluated on the Zurich Cognitive Language Processing Corpus, Brain-CLIPLM achieves 67.55% top-5 and 85.00% top-25 sentence retrieval accuracy, significantly outperforming a direct decoding baseline, while cross-subject evaluation confirms robust generalization. Control analyses, including permutation testing, further demonstrate that EEG-derived representations carry sentence-specific information beyond language model priors. These results suggest that EEG-to-text decoding is better framed as recovering compressed semantic content rather than reconstructing full sentences, providing a biologically grounded and data-efficient pathway for non-invasive brain-computer interfaces.
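The abstract says semantic anchors are extracted via contrastive learning but does not state the loss. As an illustration only, the sketch below implements a symmetric InfoNCE objective of the kind used in CLIP-style training, which pulls paired EEG and text embeddings together and pushes mismatched pairs apart. The function names, the temperature value of 0.07, and the two-dimensional toy embeddings are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits: Sequence[Sequence[float]]) -> float:
    """Mean cross-entropy where the target class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(
    eeg: Sequence[Sequence[float]],
    text: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not from the paper
) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: eeg[i] and text[i] form a positive pair, and every other
    combination in the batch is treated as a negative.
    """
    eeg_n = [_normalize(v) for v in eeg]
    text_n = [_normalize(v) for v in text]
    # Cosine similarity between every EEG/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(e, t)) / temperature for t in text_n]
        for e in eeg_n
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-EEG direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Hypothetical toy embeddings, not data from the paper.
eeg_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(eeg_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = symmetric_info_nce(eeg_batch, [[0.0, 1.0], [1.0, 0.0]])
```

With perfectly aligned pairs the loss is close to zero, and swapping the text embeddings makes it large. A real training loop would compute this loss on learned encoder outputs with an autodiff framework rather than plain Python lists.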

Similar Articles

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Beyond Parallel Tracking: Interactive Multi-Feature Fusion Drives Semantic Reconstruction from Non-invasive Brain Recordings

Encoding EEG Signals to Examine Human-Like Next-Word Prediction Behaviour in Language Models

Interpreting Brain Responses to Language with Sparse Features from Language Models

The Capacity of Thought: Benchmarking Llama 3.2 in Semantic fMRI Neural Language Decoding and Improving the Huth Encoding-Model Baseline
