Modeling semantic association in self-paced reading with language model embeddings

arXiv cs.CL Papers

Summary

This study uses language model embeddings to quantify semantic association in self-paced reading and EEG data, examining how different implementations affect measures of reading difficulty.

arXiv:2606.07066v1

Cached at: 06/08/26, 09:21 AM

# Modeling semantic association in self-paced reading with language model embeddings
Source: [https://arxiv.org/html/2606.07066](https://arxiv.org/html/2606.07066)
###### Abstract

Semantic association between a word and its context has been identified as an important component of reading comprehension, even when word predictability is accounted for. Recent research has highlighted the potential of language model (LM) embeddings to quantify semantic association. Yet, embedding-based semantic association has been operationalized in a myriad of ways. In this study, we use embeddings from LMs to estimate semantic association on a corpus of joint electroencephalography (EEG) and self-paced reading of natural Dutch texts. Semantic association is calculated in ten different implementations that vary the embedding model and the context length. The effects of semantic association across the different implementations on the N400 and self-paced reading times are examined using Bayesian hierarchical models and Bayes factors. The results show that the choice of embedding model can alter the estimated effect of semantic association on both the N400 and self-paced reading times. Furthermore, the results demonstrate a promising potential of sentence embeddings for capturing semantic association, as only implementations relying on sentence embeddings indicate reliable results of semantic association beyond word predictability on both neural and behavioral measures. Together, these findings highlight the importance of methodological choices in quantifying semantic association.

Keywords: semantic association, self-paced reading (SPR), electroencephalography (EEG), N400, sentence processing


Sara Møller Østergaard∗, Kenneth Enevoldsen†, Afra Alishahi∗, and Bruno Nicenboim∗

∗Department of Computational Cognitive Science, Tilburg University

†Center for Humanities Computing, Aarhus University

s.m.ostergaard@tilburguniversity.edu

## 1. Introduction

Humans process words in the context in which they are presented. How predictable a word is given its preceding context strongly affects how difficult the word is to process (Kutas and Federmeier, [2011](https://arxiv.org/html/2606.07066#bib.bib11); Ehrlich and Rayner, [1981](https://arxiv.org/html/2606.07066#bib.bib21); Wong et al., [2024](https://arxiv.org/html/2606.07066#bib.bib29)). For example, in the sentence pair “By the end of the day, the hiker’s feet were extremely cold and wet. It was the last time he would ever buy a cheap pair of boots/jeans.”, the final word “boots” is highly predictable based on the preceding context and is therefore processed more easily than the alternative ending “jeans”, which is comparatively unpredictable in this context (example from Federmeier and Kutas, [1999](https://arxiv.org/html/2606.07066#bib.bib1)).

The predictability of a word, or its probability given a context, has been estimated using a range of probabilistic models, including probabilistic grammars (Hale, [2001](https://arxiv.org/html/2606.07066#bib.bib12)) and, more recently, next-token probabilities from LMs (Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Frank et al., [2015](https://arxiv.org/html/2606.07066#bib.bib18); Frank and Aumeistere, [2024](https://arxiv.org/html/2606.07066#bib.bib19); Pimentel et al., [2023](https://arxiv.org/html/2606.07066#bib.bib20)). Additionally, word predictability has been estimated using the cloze task, a language comprehension task in which one or more words are removed from a text and must be filled in by the participant based on contextual cues (Luke and Christianson, [2018](https://arxiv.org/html/2606.07066#bib.bib49); Dambacher et al., [2006](https://arxiv.org/html/2606.07066#bib.bib50); Bulkes et al., [2020](https://arxiv.org/html/2606.07066#bib.bib48)).

Word predictability explains important aspects of processing difficulty; however, it does not provide a full account. In addition to predictability, semantic association is another factor that modulates reading comprehension (Kutas and Federmeier, [2011](https://arxiv.org/html/2606.07066#bib.bib11); Brouwer et al., [2012](https://arxiv.org/html/2606.07066#bib.bib47)). Semantic association refers to the degree of semantic relatedness between a target word and the context in which it is presented. While this measure is related to the predictability of the word, it has distinct properties. Using the example context from above, “By the end of the day, the hiker’s feet were extremely cold and wet. It was the last time he would ever buy a cheap pair of sandals.”, the word “sandals” is unpredictable in the context; however, it is semantically associated with the context (which mentions feet). Federmeier and Kutas ([1999](https://arxiv.org/html/2606.07066#bib.bib1)) show that this distinction results in different processing of these target words.

Semantic illusions have been used to study the effects of semantic association beyond word predictability. Semantic illusions refer to a phenomenon where unpredictable (or incorrect) words go temporarily unnoticed because the words are semantically associated with the context. The sentence “For breakfast the eggs would only eat toast and jam.” illustrates this effect: the word “eat” fails to elicit the expected neural response to an unpredictable word (Kuperberg et al., [2003](https://arxiv.org/html/2606.07066#bib.bib16)). Studies on semantic illusions report that words semantically associated with their context are processed differently (as shown with electroencephalography; EEG) compared to words that lack such associations (Kuperberg et al., [2003](https://arxiv.org/html/2606.07066#bib.bib16); Nieuwland and Van Berkum, [2005](https://arxiv.org/html/2606.07066#bib.bib45); Stone and Rabovsky, [2025](https://arxiv.org/html/2606.07066#bib.bib23); Aurnhammer et al., [2023](https://arxiv.org/html/2606.07066#bib.bib46)). Relatedly, Krieger et al. ([2024](https://arxiv.org/html/2606.07066#bib.bib15)) found that word predictability from LMs does not capture the complete role of contextual information in human sentence processing, particularly with respect to semantic association.

Processing difficulty is commonly indexed using behavioral measures such as reading times, as well as neural measures derived from EEG, including the N400 and the P600 event-related potential (ERP) components. Word predictability has been shown to have robust effects on reading times and the N400 (Kutas and Federmeier, [2011](https://arxiv.org/html/2606.07066#bib.bib11); Ehrlich and Rayner, [1981](https://arxiv.org/html/2606.07066#bib.bib21); Frank et al., [2015](https://arxiv.org/html/2606.07066#bib.bib18); Shain, [2024](https://arxiv.org/html/2606.07066#bib.bib22); Pimentel et al., [2023](https://arxiv.org/html/2606.07066#bib.bib20); Frank and Aumeistere, [2024](https://arxiv.org/html/2606.07066#bib.bib19); Federmeier and Kutas, [1999](https://arxiv.org/html/2606.07066#bib.bib1)). In contrast, semantic association between target words and their context has been investigated primarily in ERP studies, with fewer studies examining its relationship to reading times.

ERP studies of semantic association have mostly focused on the N400 component, where semantic association decreases the negative amplitude of the component (Fischler et al., [1983](https://arxiv.org/html/2606.07066#bib.bib14); Kuperberg et al., [2003](https://arxiv.org/html/2606.07066#bib.bib16); Federmeier and Kutas, [1999](https://arxiv.org/html/2606.07066#bib.bib1); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)). However, studies have found that the effect of semantic association on the N400 disappears when there is a delay between the semantically related context and the critical word (Chow et al., [2018](https://arxiv.org/html/2606.07066#bib.bib24); Stone and Rabovsky, [2025](https://arxiv.org/html/2606.07066#bib.bib23)). Furthermore, Salicchi and Hsu ([2025](https://arxiv.org/html/2606.07066#bib.bib43)) found that semantic association did not explain variance in the N400 component when surprisal was accounted for, while it did in the P600 component, suggesting effects on later processing stages. The effect on reading times is less explored. While some studies have found that stronger semantic association decreases reading times (Pynte et al., [2008](https://arxiv.org/html/2606.07066#bib.bib25); Mitchell et al., [2010](https://arxiv.org/html/2606.07066#bib.bib26)), other studies indicate that semantic association has no effect on reading times once the variance explained by word predictability is excluded (Traxler et al., [2000](https://arxiv.org/html/2606.07066#bib.bib27); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9)).

Studies of semantic association have mostly relied on stimuli consisting of handcrafted contexts and target words that are either semantically related or not (Federmeier and Kutas, [1999](https://arxiv.org/html/2606.07066#bib.bib1); Fischler et al., [1983](https://arxiv.org/html/2606.07066#bib.bib14); Kuperberg et al., [2003](https://arxiv.org/html/2606.07066#bib.bib16); Stone and Rabovsky, [2025](https://arxiv.org/html/2606.07066#bib.bib23)). However, recent studies have attempted to estimate semantic association using embeddings from LMs (Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17); Parviz et al., [2011](https://arxiv.org/html/2606.07066#bib.bib36)). This enables the quantification of semantic association as a continuous measure and facilitates analyses that extend to naturalistic stimuli.

Embedding-based estimates of semantic association have been conceptualized in a myriad of ways. Firstly, studies deploy different embedding models for extracting the embeddings of the context and the critical word. Most studies use word embeddings, e.g., GloVe, word2vec, or fastText (Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)); however, these models vary in architecture, embedding size, and training data. Secondly, the context embedding is defined in a variety of ways. Most commonly, an average of the word embeddings is used, but which words are included in the average varies: some studies use all the words in the context (Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8)), others only content words (Mechtenberg et al., [2025](https://arxiv.org/html/2606.07066#bib.bib3); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)) or a manually selected subset of the words (Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10)). Additionally, the length of the context varies. While most studies rely on sentence-level stimuli and use all the preceding words as the context, other studies relying on stimuli consisting of longer text have defined context windows. Frank ([2017](https://arxiv.org/html/2606.07066#bib.bib9)) defined the context in two separate ways: i) only the sentence preceding the critical word and ii) the four content words immediately preceding the critical word. Similarly, Mechtenberg et al. ([2025](https://arxiv.org/html/2606.07066#bib.bib3)) examined local and global effects of semantic association by defining context windows of one, two, five, and ten words preceding the critical word, excluding stop words. Finally, different functions for calculating the similarity between the embeddings of the critical word and the context have been employed: while the vast majority use the cosine similarity (Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4)), Pearson’s correlation has also been used (e.g., Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8)).

The present study investigated whether semantic association derived from LM embeddings captured aspects of language processing not accounted for by word predictability alone. To accommodate alternative formalizations of semantic association, we defined multiple implementations, varying the embedding model and the size of the contextual window used to compute semantic association. We evaluated these implementations using Bayesian model comparison (Bayes factor) and assessed their effects on self-paced reading times and the N400 ERP component. The results of the study showed how the choice of embedding model and the conceptualization of the context can alter the conclusions across neural and behavioral signals.

## 2. Methods

### 2.1. Data

The study used data from the Tilburg corpus of Natural Dutch Texts (TiNT; Østergaard et al., [2025](https://arxiv.org/html/2606.07066#bib.bib51)). The corpus consists of joint recordings of EEG and SPR from 71 participants (of whom 56 were included in the analysis of the current study). All participants read eight medium-length (approx. 600 words), natural Dutch texts of different genres. Seven texts were read using an SPR paradigm, while a single text was read in a rapid serial visual presentation (RSVP) paradigm (the exact text changing from participant to participant). In this study, we only used data recorded during SPR.

Preprocessing of the EEG signal and extraction of ERPs were identical to that of Østergaard et al. ([2025](https://arxiv.org/html/2606.07066#bib.bib51)). Preprocessing included rereferencing of the electrodes, band-pass filtering, and artifact detection and exclusion. The N400 was defined as the mean amplitude of centroparietal electrodes in the time window 300-500 ms after word onset.

### 2.2. Semantic association

Semantic association was defined as the similarity between the embedding of the context and the embedding of the critical word. Thus, three methodological decisions were required: (1) how to represent the embeddings of the context and the critical word, (2) what context length to use, and (3) which similarity function to apply. In this paper, we defined multiple implementations of semantic association by varying the first two factors, while we used the cosine similarity as the similarity metric across all implementations. Cosine similarity was used, as it is the standard similarity measure for distributional embedding models (Yamada et al., [2020](https://arxiv.org/html/2606.07066#bib.bib31); Reimers and Gurevych, [2019](https://arxiv.org/html/2606.07066#bib.bib7)).
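The definition above can be illustrated with a minimal sketch for the word-embedding case, assuming the context embedding is the element-wise mean of the context word vectors. The function names and the three-dimensional toy vectors are hypothetical and are not taken from the study's code; for sentence embeddings, the context vector would instead come directly from the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def semantic_association(context_vectors, critical_vector):
    """Similarity between the averaged context and the critical word."""
    return cosine_similarity(mean_vector(context_vectors), critical_vector)


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
toy = {
    "feet": [0.9, 0.1, 0.0],
    "wet": [0.7, 0.3, 0.1],
    "sandals": [0.8, 0.2, 0.1],
    "jeans": [0.1, 0.9, 0.2],
}
context = [toy["feet"], toy["wet"]]
related = semantic_association(context, toy["sandals"])
unrelated = semantic_association(context, toy["jeans"])
print(related, unrelated)
```

With these toy values, the word that shares direction with the context receives the higher score, which is the property the measure relies on.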

(1) Embeddings of the context and the word: Multiple approaches exist for deriving embeddings of text using LMs. Embeddings can be uncontextualized, such as GloVe, word2vec, or fastText (Pennington et al., [2014](https://arxiv.org/html/2606.07066#bib.bib59); Mikolov et al., [2013](https://arxiv.org/html/2606.07066#bib.bib60); Bojanowski et al., [2017](https://arxiv.org/html/2606.07066#bib.bib61)). Such models produce a single embedding for each word in isolation. Alternatively, embeddings can be contextualized. Contextualized embeddings can be derived from transformer models, including both encoders such as BERT (Devlin et al., [2019](https://arxiv.org/html/2606.07066#bib.bib54)) and generative models such as GPT and LLaMA (Radford et al., [2018](https://arxiv.org/html/2606.07066#bib.bib56); Touvron et al., [2023](https://arxiv.org/html/2606.07066#bib.bib57)), by retrieving embeddings from the last hidden state. However, embeddings derived directly from pre-trained models typically perform poorly, and thus it has become the norm to adapt contextualized transformer models for embedding tasks, such as semantic text similarity (Reimers and Gurevych, [2019](https://arxiv.org/html/2606.07066#bib.bib7); Gao et al., [2021](https://arxiv.org/html/2606.07066#bib.bib53); Li et al., [2025](https://arxiv.org/html/2606.07066#bib.bib55)).

An initial exploration of implementations of semantic association using different embedding models was conducted with simple sentences where the differences in semantic association were handcrafted. The results of the exploration indicated that both contextualized and uncontextualized embedding models were able to differentiate words semantically associated with the context from unrelated words. The results from the models were similar within embedding type (i.e., contextualized or uncontextualized); the results of this initial exploration can be found in the appendices. As such, we selected two candidate models: an uncontextualized word embedding model and a contextualized sentence embedding model. For historical reasons, we call the latter sentence embeddings, as these models were initially trained to embed sentences; they have since been extended to embed entire documents.

For the uncontextualized model, we used the word2vec model wikipedia2vec_nlwiki_20180420_300d (Yamada et al., [2020](https://arxiv.org/html/2606.07066#bib.bib31); model revisions can be found in the appendices). This model was chosen as the training procedure matched previous literature (Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)) and its training data overlaps with the TiNT corpus (Østergaard et al., [2025](https://arxiv.org/html/2606.07066#bib.bib51)). As the model only produces one embedding for each word, we used two methods for obtaining the embedding of the context: i) the average of the embeddings of all the words in the context (denoted WE), and ii) the average of the embeddings of all the content words in the context (denoted CWE). We used the nl_core_news_sm model from spaCy to extract the part of speech (POS) tags. Content words were identified as words with the POS tags noun, verb, adjective, or adverb.
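The difference between the WE and CWE context embeddings can be sketched as a filtering step before averaging. The sketch below assumes tokens arrive already tagged with Universal POS labels (the tag names `NOUN`, `VERB`, `ADJ`, `ADV` are an assumption about how the four categories are encoded), that lookup is lowercased, and that out-of-vocabulary words are skipped; the Dutch toy sentence and two-dimensional vectors are hypothetical.

```python
# Assumed Universal POS labels for the four content-word categories.
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def select_vectors(tagged_tokens, embeddings, content_only=False):
    """Look up embeddings for (token, pos) pairs, optionally content words only."""
    vectors = []
    for token, pos in tagged_tokens:
        if content_only and pos not in CONTENT_POS:
            continue
        vector = embeddings.get(token.lower())  # lowercasing is an assumption
        if vector is not None:  # skip out-of-vocabulary words
            vectors.append(vector)
    return vectors


def mean_vector(vectors):
    """Element-wise average; fails loudly if the context has no known words."""
    if not vectors:
        raise ValueError("no in-vocabulary words in the context")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical 2-dimensional embeddings; "had" is deliberately missing.
toy_embeddings = {
    "de": [0.1, 0.1],
    "wandelaar": [0.9, 0.2],
    "natte": [0.6, 0.4],
    "voeten": [0.8, 0.3],
}
tagged_context = [
    ("De", "DET"),
    ("wandelaar", "NOUN"),
    ("had", "AUX"),
    ("natte", "ADJ"),
    ("voeten", "NOUN"),
]

we_context = mean_vector(select_vectors(tagged_context, toy_embeddings))
cwe_context = mean_vector(
    select_vectors(tagged_context, toy_embeddings, content_only=True)
)
print(we_context, cwe_context)
```

Dropping function words shifts the context vector toward the content words, which is one reason the two variants need not yield the same similarity scores.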

For the sentence embedding, we used e5-large-trm-nl (Banar et al., [2025](https://arxiv.org/html/2606.07066#bib.bib32)), as it performed well on the Dutch embedding benchmark (MTEB(nld, v1); Banar et al., [2025](https://arxiv.org/html/2606.07066#bib.bib32); Enevoldsen et al., [2025](https://arxiv.org/html/2606.07066#bib.bib58)). Sentence embedding models are trained to produce an aggregated embedding over multiple words, so no post-hoc averaging was required to obtain the embedding of the context. Implementations with sentence embeddings are denoted SE.

(2) Context length: Contexts of varying length were defined to examine local and global effects of semantic association. Four distinct context lengths were used. First, a naive context consisting of all words preceding the critical word was used (All). Second, we defined a context consisting of all words in the preceding sentence, as well as in the sentence to which the critical word belonged (Sentence(N=1)). Finally, we defined a windowed context, where the context consisted of a fixed number of content words before the target. Here, we used windows of one and two content words (Windowed(N=1) and Windowed(N=2)). The windowed implementation was only defined with (content) word embeddings.
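The three context definitions can be sketched as selection functions over a tokenized text. This is an illustrative sketch rather than the study's implementation: the function names are hypothetical, the current sentence is assumed to contribute only the words before the critical word, and the windowed context is assumed to be allowed to cross sentence boundaries.

```python
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed Universal POS labels


def context_all(sentences, s, t):
    """All tokens preceding token t of sentence s, across the whole text."""
    preceding = [tok for sent in sentences[:s] for tok in sent]
    return preceding + sentences[s][:t]


def context_sentence(sentences, s, t, n=1):
    """The n preceding sentences plus the current sentence up to the critical word."""
    start = max(0, s - n)
    preceding = [tok for sent in sentences[start:s] for tok in sent]
    return preceding + sentences[s][:t]


def context_windowed(sentences, s, t, n=1):
    """The n content words immediately preceding the critical word."""
    if n < 1:
        raise ValueError("window size must be at least 1")
    content = [tok for tok in context_all(sentences, s, t) if tok[1] in CONTENT_POS]
    return content[-n:]


# Each sentence is a list of (token, pos) pairs; the text is a toy example.
sentences = [
    [("It", "PRON"), ("rained", "VERB")],
    [("The", "DET"), ("hiker", "NOUN"), ("walked", "VERB"), ("home", "ADV")],
    [("His", "PRON"), ("feet", "NOUN"), ("were", "AUX"), ("cold", "ADJ")],
]

# Critical word: "cold" (sentence index 2, token index 3).
print(context_all(sentences, 2, 3))
print(context_sentence(sentences, 2, 3, n=1))
print(context_windowed(sentences, 2, 3, n=2))
```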

In addition to the contexts of different lengths, we defined a weighted average of the word embeddings. The weights followed an exponential forgetting curve, thereby assuming that words appearing closer to the critical word were more important. This was implemented as in Equation [1](https://arxiv.org/html/2606.07066#S2.E1):

$$\text{WE, Weighted} = \sum_{i=1}^{N} 2^{\frac{-i}{4}} \cdot \text{similarity}(w_{c}, w_{i}) \tag{1}$$

Here, $w_{c}$ is the word embedding of the critical word and $w_{i}$ the word embedding of the word $i$ words away from the critical word. The equation sums over all words preceding the critical word. The denominator (4) determines the half-life of the decay and was chosen such that words at a distance of ten or more from the critical word receive minimal weights. As for the other implementations, the similarity was calculated with the cosine similarity. The weighted average was implemented with word embeddings of all words and word embeddings of content words only. All the implementations of semantic association are summarized in Table [1](https://arxiv.org/html/2606.07066#S2.T1).
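Equation 1 can be sketched directly, which also makes the role of the denominator concrete: with a value of 4, the weight halves every four words. The sketch follows the equation as written, i.e., an unnormalized weighted sum of per-word cosine similarities; the function names and the `half_life` parameter name are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decay_weight(distance, half_life=4.0):
    """Weight 2^(-distance / half_life) for a word `distance` words back."""
    return 2.0 ** (-distance / half_life)


def weighted_association(critical_vector, preceding_vectors, half_life=4.0):
    """Sum of decayed similarities; `preceding_vectors` are in reading order."""
    total = 0.0
    # The last element is the word immediately before the critical word (i = 1).
    for i, vector in enumerate(reversed(preceding_vectors), start=1):
        similarity = cosine_similarity(critical_vector, vector)
        total += decay_weight(i, half_life) * similarity
    return total


# Weights at a few distances from the critical word.
for distance in (1, 4, 8, 10):
    print(distance, decay_weight(distance))

# A related word directly before the critical word counts more than the same
# word two positions back (toy 2-dimensional vectors).
near = weighted_association([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
far = weighted_association([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(near, far)
```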

Table 1: All implementations of semantic association used in the current study. The implementations will be referred to by the name in the name column.

Correlations between semantic association for content words in the corpus extracted using the different implementations are shown in Figure [1](https://arxiv.org/html/2606.07066#S2.F1). Implementations based on the same type of embedding (i.e., word or sentence embeddings) are strongly correlated, suggesting that they index similar sources of variance. In contrast, correlations across embedding types are substantially weaker.

![Refer to caption](https://arxiv.org/html/2606.07066v1/figs/correlations_CW.png)

Figure 1: Pearson’s correlation coefficients between implementations of semantic association, log-probability of words, and Zipf word frequency for all content words in the corpus.

### 2.3. Regression models and model comparison

Bayesian hierarchical models were fitted to examine the effect of the different implementations of semantic association on the two dependent variables: self-paced reading times and the N400. The models were fitted in Stan (version 2.32.2; Stan Development Team, [2023](https://arxiv.org/html/2606.07066#bib.bib40)) using the brms package (version 2.22.0; Bürkner, [2017](https://arxiv.org/html/2606.07066#bib.bib34)) in R (R Core Team, [2024](https://arxiv.org/html/2606.07066#bib.bib35)). All predictors were z-score standardized. Words with reading times lower than 100 ms or greater than 3000 ms were excluded from analysis. Only content words (i.e., nouns, verbs, adjectives, and adverbs) were included in the analysis. As embeddings extracted from the word2vec model only exist for a finite number of words, the data loss differed slightly across implementations of semantic association (data loss across all implementations is reported in the appendices). The models were fitted on complete cases across all implementations.
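The exclusion and standardization steps above can be sketched as follows. The row layout, field names, and toy values are hypothetical, the POS labels are an assumed encoding of the four content-word categories, and the z-score uses the sample standard deviation, which is an assumption about the study's standardization.

```python
from statistics import mean, stdev

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed Universal POS labels


def is_analyzable(row, predictors):
    """Keep content words with 100 <= RT <= 3000 ms and no missing predictors."""
    in_range = 100 <= row["rt"] <= 3000
    is_content = row["pos"] in CONTENT_POS
    complete = all(row.get(name) is not None for name in predictors)
    return in_range and is_content and complete


def z_score(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


rows = [
    {"rt": 80, "pos": "NOUN", "lp": -3.0, "sem": 0.4},    # faster than 100 ms
    {"rt": 350, "pos": "NOUN", "lp": -5.0, "sem": 0.2},
    {"rt": 420, "pos": "DET", "lp": -1.0, "sem": 0.1},    # function word
    {"rt": 500, "pos": "VERB", "lp": -7.0, "sem": None},  # no embedding available
    {"rt": 610, "pos": "ADJ", "lp": -9.0, "sem": 0.6},
    {"rt": 3200, "pos": "ADV", "lp": -6.0, "sem": 0.3},   # slower than 3000 ms
    {"rt": 450, "pos": "ADV", "lp": -4.0, "sem": 0.5},
]

kept = [row for row in rows if is_analyzable(row, ("lp", "sem"))]
lp_z = z_score([row["lp"] for row in kept])
sem_z = z_score([row["sem"] for row in kept])
print(len(kept), lp_z, sem_z)
```

Filtering on complete cases before standardizing keeps the set of words identical across implementations, so that differences between models cannot be attributed to differences in the data.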

The models were run with two predictors: log-probability ($lp$), estimated by the average word probability from four GPT models (see Østergaard et al., [2025](https://arxiv.org/html/2606.07066#bib.bib51)), and semantic association ($sem$). One regression model for each of the two dependent variables and each of the 10 implementations of semantic association was fitted, resulting in 20 separate models. Uncorrelated group-level intercepts and slopes for both $lp$ and $sem$ were estimated for each participant, document, and word. For the reading time (RT) model, a log-normal likelihood was used, while for the N400 model, a Gaussian likelihood was used (see Equations [2](https://arxiv.org/html/2606.07066#S2.E2), [3](https://arxiv.org/html/2606.07066#S2.E3), and [4](https://arxiv.org/html/2606.07066#S2.E4)).

$$\text{RT} \sim \mathit{LogNormal}(\mu, \sigma) \tag{2}$$

$$\text{N400} \sim \mathit{Normal}(\mu, \sigma) \tag{3}$$

$$\begin{aligned}
\mu = {} & \alpha + u_{participant,0} + u_{document,0} + u_{word,0} \\
& + (\beta_{1} + u_{participant,1} + u_{document,1} + u_{word,1}) \cdot lp \\
& + (\beta_{2} + u_{participant,2} + u_{document,2} + u_{word,2}) \cdot sem
\end{aligned} \tag{4}$$

Different priors were used for the reading times and the N400 model, as the scales of the dependent variables were different, i.e., reading times in ms and ERP components in μV. For all models, regularizing priors were used to ensure stable and plausible estimates (Nicenboim et al., [2025](https://arxiv.org/html/2606.07066#bib.bib28)). The priors for the reading times model were as follows:

$$\begin{aligned}
\alpha &\sim \mathit{Normal}(5.5, 1) \\
\beta &\sim \mathit{Normal}(0, 0.1) \\
u &\sim \mathit{Normal}(0, sd) \\
sd &\sim \mathit{Normal}_{+}(0, 0.5) \\
\sigma &\sim \mathit{Normal}_{+}(0, 0.5)
\end{aligned}$$

The priors for the models of the ERP components were:

$$\begin{aligned}
\alpha &\sim \mathit{Normal}(0, 20) \\
\beta &\sim \mathit{Normal}(0, 10) \\
u &\sim \mathit{Normal}(0, sd) \\
sd &\sim \mathit{Normal}_{+}(0, 10) \\
\sigma &\sim \mathit{Normal}_{+}(0, 10)
\end{aligned}$$

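To get a feel for what the reading-time priors imply on the millisecond scale, one can draw from them and push the draws through the linear predictor of Equation 4 and the log-normal likelihood of Equation 2. The sketch below is a simplified prior predictive simulation, not the fitted brms model: the function names are hypothetical, and the group-level adjustments default to zero rather than being drawn from their priors.

```python
import math
import random


def linear_predictor(alpha, beta_lp, beta_sem, lp, sem, u0=0.0, u_lp=0.0, u_sem=0.0):
    """Equation 4, with u0, u_lp, u_sem standing for the summed group-level terms."""
    return alpha + u0 + (beta_lp + u_lp) * lp + (beta_sem + u_sem) * sem


def prior_predictive_rt(lp, sem, rng):
    """One simulated reading time (ms) from the reading-time priors."""
    alpha = rng.gauss(5.5, 1.0)         # intercept on the log-ms scale
    beta_lp = rng.gauss(0.0, 0.1)       # slope for log-probability
    beta_sem = rng.gauss(0.0, 0.1)      # slope for semantic association
    sigma = abs(rng.gauss(0.0, 0.5))    # half-normal residual scale
    mu = linear_predictor(alpha, beta_lp, beta_sem, lp, sem)
    return rng.lognormvariate(mu, sigma)


rng = random.Random(1)
draws = [prior_predictive_rt(lp=0.0, sem=1.0, rng=rng) for _ in range(1000)]

# The prior mean of the intercept corresponds to this many ms at lp = sem = 0.
print(math.exp(5.5))
print(min(draws), max(draws))
```

Because the predictors are z-scored, a slope prior with a standard deviation of 0.1 on the log scale corresponds to changes of roughly 10% in reading time per standard deviation of the predictor.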
To assess the influence of the various implementations of semantic association on reading comprehension (i.e., reading times and the N400), we used Bayes factors. The Bayes factor provides a framework for Bayesian hypothesis testing by quantifying the evidence in favor of one model ($M_{0}$) relative to another ($M_{1}$). It is calculated as the ratio between the marginal likelihoods of the two models, which in turn correspond to two hypotheses (see Equation [5](https://arxiv.org/html/2606.07066#S2.E5)).

$$BF_{01} = \frac{p(y \mid M_{0})}{p(y \mid M_{1})} \tag{5}$$

As such, a Bayes factor of one indicates no evidence for either model, a Bayes factor of 10 would be strong evidence for $M_{0}$, and a Bayes factor of $1/10$ indicates strong evidence for $M_{1}$. We used the Savage-Dickey density ratio method to calculate the Bayes factor, as it provides a convenient method for computing Bayes factors for nested models with a point null hypothesis (Dickey and Lientz, [1970](https://arxiv.org/html/2606.07066#bib.bib41)). We specifically tested the null hypothesis that there is no effect of semantic association in the models when log-probability is included. The Savage-Dickey ratio was calculated separately for each model as in Equation [6](https://arxiv.org/html/2606.07066#S2.E6).

$$BF_{01} = \frac{p(\beta_{2} = 0 \mid y)}{p(\beta_{2} = 0)} \tag{6}$$

Here, $y$ denotes the observed data and $\beta_{2}$ is the coefficient for semantic association.
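The Savage-Dickey ratio in Equation 6 can be sketched from posterior draws of the semantic association coefficient. This sketch approximates the posterior density at zero with a normal distribution fitted to the draws, which is an assumption made for brevity and not necessarily the density estimator used in the study; the function name and the simulated draws are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def savage_dickey_bf01(posterior_samples, prior_sd, null_value=0.0):
    """Posterior density at the null value divided by the prior density there."""
    # Normal approximation to the posterior (assumption; a kernel density
    # estimate could be used instead for non-normal posteriors).
    posterior = NormalDist(mean(posterior_samples), stdev(posterior_samples))
    prior = NormalDist(0.0, prior_sd)  # zero-centred normal prior on the slope
    return posterior.pdf(null_value) / prior.pdf(null_value)


# Simulated stand-ins for posterior draws of the slope (not real results).
null_like = NormalDist(0.0, 0.02).samples(4000, seed=1)     # centred on zero
effect_like = NormalDist(0.06, 0.02).samples(4000, seed=1)  # shifted away from zero

bf_null = savage_dickey_bf01(null_like, prior_sd=0.1)
bf_effect = savage_dickey_bf01(effect_like, prior_sd=0.1)
print(bf_null, bf_effect)
```

When the posterior concentrates on zero more than the prior does, the ratio exceeds one and favors the null; when posterior mass has moved away from zero, the ratio drops below one.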

As the Bayes factor is sensitive to the choice of prior, we conducted a sensitivity analysis by varying the width of the prior for $\beta_{2}$ while keeping the priors for all other parameters fixed. For the reading times models, we used additional priors of $\beta_{2} \sim \mathit{Normal}(0, 0.05)$ and $\beta_{2} \sim \mathit{Normal}(0, 0.5)$, and for the N400 models, $\beta_{2} \sim \mathit{Normal}(0, 1)$ and $\beta_{2} \sim \mathit{Normal}(0, 2)$. For a more elaborate explanation of the Bayes factor and the Savage-Dickey ratio, see Nicenboim and Vasishth ([2016](https://arxiv.org/html/2606.07066#bib.bib42)).
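Why the prior width matters can be seen in a simplified conjugate setting, assuming the data inform the slope through a normal likelihood summarized by an estimate and its standard error. This is a stand-in for the full hierarchical model, and the estimate and standard error below are hypothetical; only the three prior standard deviations correspond to those used for the reading times models.

```python
from statistics import NormalDist


def bf01_conjugate(estimate, se, prior_sd):
    """Savage-Dickey BF01 for a normal likelihood with a Normal(0, prior_sd) prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * data_precision * estimate
    posterior = NormalDist(posterior_mean, posterior_var ** 0.5)
    prior = NormalDist(0.0, prior_sd)
    return posterior.pdf(0.0) / prior.pdf(0.0)


# Hypothetical slope estimate and standard error on the log-RT scale.
estimate, se = 0.01, 0.01
for prior_sd in (0.05, 0.1, 0.5):
    print(prior_sd, bf01_conjugate(estimate, se, prior_sd))
```

For the same data, a wider prior spreads the alternative hypothesis over implausibly large effects, so the evidence shifts toward the null as the prior standard deviation grows.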

Most models were fitted using four chains with 2,000 iterations, where half the iterations were warm-up samples. However, six models required 3,000 iterations to ensure stable posterior sampling. The models reported in this paper had no divergent transitions and $\hat{R} \leq 1.03$, and the bulk and tail effective sample sizes were at least 119, with an average of 1,477.

## 3. Results

![Refer to caption](https://arxiv.org/html/2606.07066v1/x1.png)

Figure 2: Regression coefficients and 95% credible intervals for semantic association ($\beta_{2}$) as estimated by the different implementations. Every point represents $\beta_{2}$ from a separate regression model. The models with N400 as the dependent variable are measured in μV, while the reading times models are measured in ms (thus, the scales of the x-axis differ across the two).

![Refer to caption](https://arxiv.org/html/2606.07066v1/x2.png)

Figure 3: Bayes factor. $BF_{01} > 1$ indicates more evidence for the null hypothesis (i.e., no effect of semantic association) and $BF_{01} < 1$ indicates more evidence for the alternative hypothesis (i.e., an effect of semantic association). Each Bayes factor was calculated for separately fitted models with different standard deviations (SD) for the prior of $\beta_{2}$.

Figure [2](https://arxiv.org/html/2606.07066#S3.F2) displays the coefficients for semantic association ($\beta_{2}$ in Equation [4](https://arxiv.org/html/2606.07066#S2.E4)) estimated by the 20 regression models using the values from the different implementations of semantic association to predict the N400 and self-paced reading times. Bayes factors for $\beta_{2}$ across the regression models are reported in Figure [3](https://arxiv.org/html/2606.07066#S3.F3). The Bayes factors showed anecdotal evidence ($1/3 < BF_{01} < 1$) for an effect of semantic association in only two models (SE, Sentence(N=1) for the N400 and SE, All for reading times). Across the rest of the models for both dependent variables, the evidence favored the null hypothesis, i.e., no effect of semantic association.

Embedding models: The results of the regression models indicate that the choice of embedding model when calculating semantic association impacts the estimated effects on neural and behavioral measures. This pattern was particularly pronounced for the models of the N400. The estimated effect of semantic association on the N400 when using sentence embeddings (SE) was positive, meaning that less semantically associated words elicited a more negative N400 amplitude, consistent with previous literature (Fischler et al., [1983](https://arxiv.org/html/2606.07066#bib.bib14); Kuperberg et al., [2003](https://arxiv.org/html/2606.07066#bib.bib16); Federmeier and Kutas, [1999](https://arxiv.org/html/2606.07066#bib.bib1); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)). In contrast, when semantic association was calculated using word embeddings (WE), the direction of the effect reversed, yielding a negative estimate. When semantic association was calculated using the same word embeddings but restricted to the content words in the context (CWE), the estimated effect was close to zero. For reading times, only the model using the SE, All implementation indicated an effect. This model estimated a positive effect of semantic association on reading times; that is, reading times increased when words were more semantically associated with the context. The estimated coefficients for semantic association in the remaining models were smaller and generally close to zero.

Context length: The results show that the length of the context matters only for semantic association defined with sentence embeddings. The implementations of semantic association using word embeddings (both WE and CWE) showed similar effects on both the N400 and reading times across all contexts (All, Sentence, Weighted, and Windowed). For the implementations relying on sentence embeddings, the context appeared to play a more substantial role. On the N400, the effect of semantic association was largest for the regression model with SE, Sentence (N=1), while the largest effect of semantic association on reading times was estimated by the model with SE, All.

## 4\. Discussion

![Refer to caption](https://arxiv.org/html/2606.07066v1/figs/example_sentences_annotated.png) Figure 4: Semantic association (as estimated by different implementations), log-probability, and Zipf frequency of words in two sentence pairs from two different documents in the corpus. Highlighted are the words “draak” (English: “dragon”) in sentences A and “Nomadisme” (English: “Nomadism”) and “nomadische” (English: “nomadic”) in sentences B. All variables (i.e., log-probability, Zipf frequency, and semantic associations) are z-score standardized.

In this study, we employed both uncontextualized word embeddings and contextualized sentence embeddings to estimate semantic association. These two embedding types appear to capture distinct patterns in the text. This is apparent both from the correlations between the different implementations of semantic association (Figure [1](https://arxiv.org/html/2606.07066#S2.F1)) and, more importantly, from the estimated effects of semantic association on self-paced reading times and the N400 (Figure [2](https://arxiv.org/html/2606.07066#S3.F2) and Figure [3](https://arxiv.org/html/2606.07066#S3.F3)). The results of the regression models show that the type of embeddings used influences the estimated effect of semantic association. This finding is most pronounced for the N400, where the estimated effect reverses direction depending on whether semantic association is computed using sentence or word embeddings. A positive effect of semantic association on the N400 is found when using sentence embeddings, while the opposite (i.e., a negative effect) is estimated with word embedding-based semantic association. In contrast, previous literature using similar uncontextualized word embeddings to calculate semantic association reports a positive effect of semantic association on the N400 (Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6)). It is important to note that the Bayes factors indicated no evidence for the negative effects estimated for word embedding-based semantic association, but anecdotal evidence for one of the models with a positive effect of semantic association from sentence embeddings.

What could explain the observed difference in semantic association when it is computed with different types of embedding models? One important distinction between sentence embeddings and word embeddings, when they are used to create context representations, lies in how information is aggregated. Although a sentence embedding is also an aggregation of embeddings, the model has been trained to produce a semantically coherent representation in which more informative words receive greater weight. In this study, the implementations of semantic association using word embeddings from word2vec relied on a naive approach, in which an unweighted average was used, inspired by previous approaches (Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Mechtenberg et al., [2025](https://arxiv.org/html/2606.07066#bib.bib3); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10)). As such, important information could be lost in the context representations derived from the word embedding implementations, especially for the implementations with longer contexts (i.e., WE, All and WE, Sentence (N=1)). Studies finding positive effects of word embedding-based semantic association on the N400 have generally relied on shorter contexts (either because sentence-level stimuli were used or because they defined short context lengths), thus minimizing the information loss when averaging over embeddings. This speculation is supported by our initial exploration, where both sentence and word embeddings produced effects in the same direction using sentence pairs from Federmeier and Kutas ([1999](https://arxiv.org/html/2606.07066#bib.bib1)) (results of the initial exploration can be found in the appendices). While the weighted implementations of semantic association (WE, Weighted and CWE, Weighted) were cognitively motivated, discounting the influence of word embeddings on the overall average based on an exponential forgetting curve, this approach did not seem promising given the results of the current study. The weights were based solely on distance to the critical word; thus, words were not weighted by their semantic relevance.
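The difference between an unweighted average and a distance-based weighting can be sketched with toy vectors. The example below is only a minimal illustration: the three-dimensional "embeddings", the function names, and the `decay` parameter of the exponential weighting are hypothetical and do not reproduce the implementations or the forgetting curve used in this study.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association_unweighted(context_vecs: np.ndarray, target_vec: np.ndarray) -> float:
    """Similarity between the target and the plain mean of the context vectors."""
    return cosine(np.mean(context_vecs, axis=0), target_vec)


def association_weighted(context_vecs: np.ndarray, target_vec: np.ndarray,
                         decay: float = 0.5) -> float:
    """Same, but context words are discounted exponentially with their distance.

    Rows are ordered from the earliest to the latest context word, so the last
    row is the word immediately before the target (distance 1).
    """
    n_words = len(context_vecs)
    distances = np.arange(n_words, 0, -1)        # n_words, ..., 2, 1
    weights = np.exp(-decay * distances)         # hypothetical forgetting curve
    context = np.average(context_vecs, axis=0, weights=weights)
    return cosine(context, target_vec)


# Toy vectors, made up for illustration only.
context = np.array([
    [1.0, 0.0, 0.0],   # thematically related word, far from the target
    [0.0, 1.0, 0.0],   # unrelated word
    [0.0, 1.0, 0.0],   # unrelated word, adjacent to the target
])
target = np.array([1.0, 0.0, 0.0])

unweighted = association_unweighted(context, target)
weighted = association_weighted(context, target)
print(unweighted, weighted)
```

In this toy case the distance-based weights reduce the contribution of the only thematically related word because it is far from the target, which illustrates why weights that ignore semantic relevance need not improve the context representation.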

To our knowledge, no previous work has used sentence embeddings for studying semantic association in sentence processing. The present findings suggest that this approach provides a promising method for estimating semantic relations between contexts and target words. Implementations based on sentence embeddings showed the most reliable effects on both the N400 and self-paced reading times, as reflected by the size of the regression coefficients and the Bayes factors. Post-hoc qualitative analyses further indicated that sentence embeddings are more sensitive to the general themes of the texts than averaged word embeddings. Figure [4](https://arxiv.org/html/2606.07066#S4.F4) illustrates a difference between sentence embeddings and word embeddings for calculating semantic association in two examples of sentence pairs from the corpus. In the first pair, the word “dragon” appears in both sentences. Only the SE, All implementation captures the association between “draak” (English: “dragon”) and the story “Mijn Heer Zak met Rijst” (a fairy tale about a Dragon King) in the first sentence, while SE, Sentence (N=1) detects the association when the word reappears in the second sentence. In contrast, none of the word embedding implementations indicate a strong association for “dragon” in either instance. In sentence pair B from the text “Nomadisch pastoralisme”, a similar pattern is observed with the two words “Nomadisme” (English: “Nomadism”) and “nomadische” (English: “nomadic”). While these examples were selected for illustrative purposes, they suggest that sentence embeddings capture a thematically coherent representation of semantic association in natural texts.

Naturally, this interpretation depends on the operationalization of semantic association. In the present study, semantic association was defined as the similarity between embeddings of the context and the critical word, following prior work (Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Ettinger et al., [2016](https://arxiv.org/html/2606.07066#bib.bib10); Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Michaelov et al., [2024](https://arxiv.org/html/2606.07066#bib.bib5); Frank, [2017](https://arxiv.org/html/2606.07066#bib.bib9); Michaelov and Bergen, [2024](https://arxiv.org/html/2606.07066#bib.bib4); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)). This operationalization assumes that the embeddings encode multiple aspects of meaning, including both shared features (e.g., nurse and mechanic as occupations) and thematic relations (e.g., nurse and hospital as related). Consequently, embedding similarity captures similarities in the features of the words as well as their relatedness. Following this definition, repeated words will inflate semantic association, as the similarity between two identical embeddings is one (i.e., maximum semantic association). However, this property applies to all the implementations of semantic association considered in the current work; thus, the effect of repetition cannot account for the differences observed in example A in Figure [4](https://arxiv.org/html/2606.07066#S4.F4).

The results of the current study are exploratory, and further work is required to identify the conditions under which specific implementations of LM embedding-based estimates of semantic association differ. The analysis was based exclusively on texts from a single corpus (the Tilburg corpus of Natural Dutch Texts (TiNT); Østergaard et al., [2025](https://arxiv.org/html/2606.07066#bib.bib51)), which consists of medium-length Dutch texts. This corpus differs from previously used stimuli not only in the length of the texts (as touched upon above) but also in the language. As Dutch has been less extensively studied than English, the quality of the embedding models may differ, potentially affecting their performance. As such, semantic association as estimated by the different implementations in this study should be validated on other corpora to determine whether it is possible to replicate previously reported effects of semantic association.

The most prominent finding of this study lies in the importance of the embedding model for estimating semantic association\. Only one word embedding model and one sentence embedding model were included in the analysis, as initial explorations indicated minimal differences between models within each embedding type\. However, in light of the results of the current study, further exploration of different embedding models would be interesting\.

## 5\. Conclusion

This study examined the effects of LM embedding-based semantic association on the self-paced reading of medium-length Dutch texts. The findings demonstrate that the conclusions critically depend on how semantic association is implemented, particularly with respect to the embedding model. While uncontextualized word embeddings (e.g., word2vec) have previously been used to examine semantic association in sentence processing and showed effects on the N400 (Xu et al., [2024](https://arxiv.org/html/2606.07066#bib.bib6); Broderick et al., [2018](https://arxiv.org/html/2606.07066#bib.bib8); Frank and Willems, [2017](https://arxiv.org/html/2606.07066#bib.bib17)), we observed no effects on either the N400 or self-paced reading times. In contrast, semantic association estimated with sentence embeddings was found to be predictive of processing difficulty. These results suggest that sentence embeddings are a promising approach for examining the effects of semantic association in natural reading.

## 6\. Code availability

## 7\. Ethics Statement

The study utilized human data from the Tilburg corpus of Natural Dutch Texts (TiNT) collected by Østergaard et al. ([2025](https://arxiv.org/html/2606.07066#bib.bib51)). The data collection received ethics approval, and the data are licensed under a CC-BY-NC-SA license.

## 8\. Bibliographical References

- The P600 as a continuous index of integration effort. Psychophysiology 60(9), pp. e14302. External Links: ISSN 1469-8986, [Document](https://dx.doi.org/10.1111/psyp.14302).
- N. Banar, E. Lotfi, J. V. Nooten, C. Arhiliuc, M. Kliocaite, and W. Daelemans (2025) MTEB-nl and e5-nl: embedding benchmark and models for dutch. External Links: 2509.12340, [Link](https://arxiv.org/abs/2509.12340).
- P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov (2017) Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5, pp. 135–146. External Links: [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00051).
- M. P. Broderick, A. J. Anderson, G. M. D. Liberto, M. J. Crosse, and E. C. Lalor (2018) Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech. Current Biology 28(5), pp. 803–809.e3. External Links: ISSN 0960-9822, [Document](https://dx.doi.org/10.1016/j.cub.2018.01.080).
- H. Brouwer, H. Fitz, and J. Hoeks (2012) Getting real about Semantic Illusions: Rethinking the functional role of the P600 in language comprehension. Brain Research 1446, pp. 127–143. External Links: ISSN 00068993, [Document](https://dx.doi.org/10.1016/j.brainres.2012.01.055).
- N. Z. Bulkes, K. Christianson, and D. Tanner (2020) Semantic constraint, reading control, and the granularity of form-based expectations during semantic processing: Evidence from ERPs. Neuropsychologia 137, pp. 107294. External Links: ISSN 00283932, [Document](https://dx.doi.org/10.1016/j.neuropsychologia.2019.107294).
- P. Bürkner (2017) Brms: an r package for bayesian multilevel models using stan. Journal of Statistical Software 80(1), pp. 1–28. External Links: [Link](https://www.jstatsoft.org/index.php/jss/article/view/v080i01), [Document](https://dx.doi.org/10.18637/jss.v080.i01).
- W. Chow, E. Lau, S. Wang, and C. Phillips (2018) Wait a second! delayed impact of argument roles on on-line verb prediction. Language, Cognition and Neuroscience 33(7), pp. 803–828. External Links: ISSN 2327-3798, [Document](https://dx.doi.org/10.1080/23273798.2018.1427878).
- M. Dambacher, R. Kliegl, M. Hofmann, and A. M. Jacobs (2006) Frequency and predictability effects on event-related potentials during reading. Brain Research 1084(1), pp. 89–103. External Links: ISSN 0006-8993, [Document](https://dx.doi.org/10.1016/j.brainres.2006.02.010).
- J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio (Eds.), Minneapolis, Minnesota, pp. 4171–4186. External Links: [Document](https://dx.doi.org/10.18653/v1/N19-1423).
- J. M. Dickey and B. P. Lientz (1970) The Weighted Likelihood Ratio, Sharp Hypotheses about Chances, the Order of a Markov Chain. The Annals of Mathematical Statistics 41(1), pp. 214–226. External Links: ISSN 0003-4851, 2168-8990, [Document](https://dx.doi.org/10.1214/aoms/1177697203).
- S. F. Ehrlich and K. Rayner (1981) Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior 20(6), pp. 641–655. External Links: ISSN 0022-5371, [Document](https://dx.doi.org/10.1016/S0022-5371%2881%2990220-6).
- K. Enevoldsen, I. Chung, I. Kerboua, M. Kardos, A. Mathur, D. Stap, J. Gala, W. Siblini, D. Krzemiński, G. I. Winata, S. Sturua, S. Utpala, M. Ciancone, M. Schaeffer, G. Sequeira, D. Misra, S. Dhakal, J. Rystrøm, R. Solomatin, Ö. Çağatan, A. Kundu, M. Bernstorff, S. Xiao, A. Sukhlecha, B. Pahwa, R. Poświata, K. K. GV, S. Ashraf, D. Auras, B. Plüster, J. P. Harries, L. Magne, I. Mohr, M. Hendriksen, D. Zhu, H. Gisserot-Boukhlef, T. Aarsen, J. Kostkan, K. Wojtasik, T. Lee, M. Šuppa, C. Zhang, R. Rocca, M. Hamdy, A. Michail, J. Yang, M. Faysse, A. Vatolin, N. Thakur, M. Dey, D. Vasani, P. Chitale, S. Tedeschi, N. Tai, A. Snegirev, M. Günther, M. Xia, W. Shi, X. H. Lù, J. Clive, G. Krishnakumar, A. Maksimova, S. Wehrli, M. Tikhonova, H. Panchal, A. Abramov, M. Ostendorff, Z. Liu, S. Clematide, L. J. Miranda, A. Fenogenova, G. Song, R. B. Safi, W. Li, A. Borghini, F. Cassano, H. Su, J. Lin, H. Yen, L. Hansen, S. Hooker, C. Xiao, V. Adlakha, O. Weller, S. Reddy, and N. Muennighoff (2025) MMTEB: massive multilingual text embedding benchmark. arXiv preprint arXiv:2502.13595. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2502.13595), [Link](https://arxiv.org/abs/2502.13595).
- A. Ettinger, N. Feldman, P. Resnik, and C. Phillips (2016) Modeling N400 amplitude using vector space models of word representation. Proceedings of the Annual Meeting of the Cognitive Science Society 38(0).
- K. D. Federmeier and M. Kutas (1999) A Rose by Any Other Name: Long-Term Memory Structure and Sentence Processing. Journal of Memory and Language 41(4), pp. 469–495. External Links: ISSN 0749596X, [Document](https://dx.doi.org/10.1006/jmla.1999.2660).
- I. Fischler, P. A. Bloom, D. G. Childers, S. E. Roucos, and N. W. Perry (1983) Brain potentials related to stages of sentence verification. Psychophysiology 20(4), pp. 400–409. External Links: ISSN 0048-5772, [Document](https://dx.doi.org/10.1111/j.1469-8986.1983.tb00920.x).
- S. L. Frank and A. Aumeistere (2024) An eye-tracking-with-EEG coregistration corpus of narrative sentences. Language Resources and Evaluation 58(2), pp. 641–657. External Links: ISSN 1574-0218, [Document](https://dx.doi.org/10.1007/s10579-023-09684-x).
- S. L. Frank, L. J. Otten, G. Galli, and G. Vigliocco (2015) The ERP response to the amount of information conveyed by words in sentences. Brain and Language 140, pp. 1–11. External Links: ISSN 0093934X, [Document](https://dx.doi.org/10.1016/j.bandl.2014.10.006).
- S. L. Frank and R. M. Willems (2017) Word predictability and semantic similarity show distinct patterns of brain activity during language comprehension. Language, Cognition and Neuroscience 32(9), pp. 1192–1203. External Links: ISSN 2327-3798, [Document](https://dx.doi.org/10.1080/23273798.2017.1323109).
- S. L. Frank (2017) Word Embedding Distance Does not Predict Word Reading Time. Proceedings of the Annual Meeting of the Cognitive Science Society 39(0).
- T. Gao, X. Yao, and D. Chen (2021) SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, M. Moens, X. Huang, L. Specia, and S. W. Yih (Eds.), Online and Punta Cana, Dominican Republic, pp. 6894–6910. External Links: [Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.552).
- J. Hale (2001) A Probabilistic Earley Parser as a Psycholinguistic Model. In Second Meeting of the North American Chapter of the Association for Computational Linguistics.
- M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd (2020) spaCy: Industrial-strength Natural Language Processing in Python. External Links: [Document](https://dx.doi.org/10.5281/zenodo.1212303).
- B. Krieger, H. Brouwer, C. Aurnhammer, and M. W. Crocker (2024) On the limits of LLM surprisal as functional Explanation of ERPs. Proceedings of the Annual Meeting of the Cognitive Science Society 46(0).
- G. R. Kuperberg, T. Sitnikova, D. Caplan, and P. J. Holcomb (2003) Electrophysiological distinctions in processing conceptual relationships within simple sentences. Brain Research. Cognitive Brain Research 17(1), pp. 117–129. External Links: ISSN 0926-6410, [Document](https://dx.doi.org/10.1016/s0926-6410%2803%2900086-7).
- M. Kutas and K. D. Federmeier (2011) Thirty Years and Counting: Finding Meaning in the N400 Component of the Event-Related Brain Potential (ERP). Annual Review of Psychology 62(1), pp. 621–647. External Links: ISSN 0066-4308, 1545-2085, [Document](https://dx.doi.org/10.1146/annurev.psych.093008.131123).
- L. Li, L. Song, S. Ding, B. C. M. Fung, and P. Charland (2025) Transforming Generic Coder LLMs to Effective Binary Code Embedding Models for Similarity Detection. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- S. G. Luke and K. Christianson (2018) The Provo Corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods 50(2), pp. 826–833. External Links: ISSN 1554-3528, [Document](https://dx.doi.org/10.3758/s13428-017-0908-4).
- H. Mechtenberg, J. Reilly, E. B. Myers, and J. E. Peelle (2025) Measuring brain sensitivity to semantic distance in spoken narrative comprehension. PsyArXiv. External Links: [Document](https://dx.doi.org/10.31234/osf.io/dtnjm%5Fv1).
- J. A. Michaelov, M. D. Bardolph, C. K. Van Petten, B. K. Bergen, and S. Coulson (2024) Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects. Neurobiology of Language 5(1), pp. 107–135. External Links: ISSN 2641-4368, [Document](https://dx.doi.org/10.1162/nol%5Fa%5F00105).
- J. A. Michaelov and B. K. Bergen (2024) On the Mathematical Relationship Between Contextual Probability and N400 Amplitude. Open Mind 8, pp. 859–897. External Links: ISSN 2470-2986, [Document](https://dx.doi.org/10.1162/opmi%5Fa%5F00150).
- T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient Estimation of Word Representations in Vector Space. arXiv. External Links: 1301.3781, [Document](https://dx.doi.org/10.48550/arXiv.1301.3781).
- J. Mitchell, M. Lapata, V. Demberg, and F. Keller (2010) Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, J. Hajič, S. Carberry, S. Clark, and J. Nivre (Eds.), Uppsala, Sweden, pp. 196–206.
- B. Nicenboim, D. J. Schad, and S. Vasishth (2025) The influence of priors: sensitivity analysis. In Introduction to Bayesian Data Analysis for Cognitive Science, pp. 634.
- B. Nicenboim and S. Vasishth (2016) Statistical methods for linguistic research: Foundational Ideas—Part II. Language and Linguistics Compass 10(11), pp. 591–613. External Links: ISSN 1749-818X, [Document](https://dx.doi.org/10.1111/lnc3.12207).
- M. S. Nieuwland and J. J. A. Van Berkum (2005) Testing the limits of the semantic illusion phenomenon: ERPs reveal temporary semantic change deafness in discourse comprehension. Cognitive Brain Research 24(3), pp. 691–701. External Links: ISSN 0926-6410, [Document](https://dx.doi.org/10.1016/j.cogbrainres.2005.04.003).
- OpenAI (2025) GPT-5.2 via OpenAI API. External Links: [Link](https://developers.openai.com/api/docs/models/gpt-5.2).
- S. M. Østergaard, L. Lichtenberg, L. Boon, and B. Nicenboim (2025) A Corpus of Joint EEG and Self-Paced Reading of Natural Dutch Texts. PsyArXiv. External Links: [Document](https://dx.doi.org/10.31234/osf.io/g32rp%5Fv2).
- M. Parviz, M. Johnson, B. Johnson, and J. Brock (2011) Using Language Models and Latent Semantic Analysis to Characterise the N400m Neural Response. In Proceedings of the Australasian Language Technology Association Workshop 2011, D. Molla and D. Martinez (Eds.), Canberra, Australia, pp. 38–46.
- J. Pennington, R. Socher, and C. Manning (2014) GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans (Eds.), Doha, Qatar, pp. 1532–1543. External Links: [Document](https://dx.doi.org/10.3115/v1/D14-1162).
- T. Pimentel, C. Meister, E. G. Wilcox, R. P. Levy, and R. Cotterell (2023) On the Effect of Anticipation on Reading Times. Transactions of the Association for Computational Linguistics 11, pp. 1624–1642. External Links: ISSN 2307-387X, [Link](https://doi.org/10.1162/tacl_a_00603), [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00603).
- J. Pynte, B. New, and A. Kennedy (2008) On-line contextual influences during reading normal text: A multiple-regression analysis. Vision Research 48(21), pp. 2172–2183. External Links: ISSN 0042-6989, [Document](https://dx.doi.org/10.1016/j.visres.2008.02.004).
- R Core Team (2024) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. External Links: [Link](https://www.r-project.org/).
- A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever (2018) Improving Language Understanding by Generative Pre-Training. OpenAI Technical Report.
- N. Reimers and I. Gurevych (2019) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv. External Links: 1908.10084, [Document](https://dx.doi.org/10.48550/arXiv.1908.10084).
- L. Salicchi and Y. Hsu (2025) Different Reading Processing Stages or Different Brain Areas? A Computational Cognitive Investigation on N400, P600, and PNP. In Computational Psycholinguistics Meeting 2025. Note: Conference presentation abstract. External Links: [Link](https://openreview.net/forum?id=nu7Ld3AXWL).
- C. Shain (2024) Word Frequency and Predictability Dissociate in Naturalistic Reading. Open Mind 8, pp. 177–201. External Links: ISSN 2470-2986, [Document](https://dx.doi.org/10.1162/opmi%5Fa%5F00119).
- Stan Development Team (2023) Stan Reference Manual. External Links: [Link](https://mc-stan.org/).
- K. Stone and M. Rabovsky (2025) The Role of Syntactic and Semantic Cues in Preventing Temporary Illusions of Plausibility. Journal of Cognitive Neuroscience 37(9), pp. 1535–1561. External Links: ISSN 0898-929X, [Document](https://dx.doi.org/10.1162/jocn%5Fa%5F02320).
- H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample (2023) LLaMA: Open and Efficient Foundation Language Models. arXiv. External Links: 2302.13971, [Document](https://dx.doi.org/10.48550/arXiv.2302.13971).
- M. J. Traxler, D. J. Foss, R. E. Seely, B. Kaup, and R. K. Morris (2000) Priming in Sentence Processing: Intralexical Spreading Activation, Schemas, and Situation Models. Journal of Psycholinguistic Research 29(6), pp. 581–595. External Links: ISSN 1573-6555, [Document](https://dx.doi.org/10.1023/A%3A1026416225168).
- L\. Wang, N\. Yang, X\. Huang, L\. Yang, R\. Majumder, and F\. Wei \(2024\)Multilingual e5 text embeddings: a technical report\.arXiv preprint arXiv:2402\.05672\.Cited by:[Table 2](https://arxiv.org/html/2606.07066#A1.T2.2.7.6.1.1.1)\.
- R\. Wong, E\. D\. Reichle, and A\. Veldre \(2024\)Prediction in reading: A review of predictability effects, their theoretical implications, and beyond\.Psychonomic Bulletin & Review\.External Links:ISSN 1531\-5320,[Document](https://dx.doi.org/10.3758/s13423-024-02588-z)Cited by:[§1](https://arxiv.org/html/2606.07066#S1.p1.1)\.
- H\. Xu, M\. Nakanishi, and S\. Coulson \(2024\)Revisiting Joke Comprehension with Surprisal and Contextual Similarity: Implication from N400 and P600 Components\.Proceedings of the Annual Meeting of the Cognitive Science Society46\(0\)\.Cited by:[§1](https://arxiv.org/html/2606.07066#S1.p2.1),[§1](https://arxiv.org/html/2606.07066#S1.p6.2),[§1](https://arxiv.org/html/2606.07066#S1.p7.1),[§1](https://arxiv.org/html/2606.07066#S1.p8.1),[§3](https://arxiv.org/html/2606.07066#S3.p2.1),[§4](https://arxiv.org/html/2606.07066#S4.p1.1),[§4](https://arxiv.org/html/2606.07066#S4.p2.1),[§4](https://arxiv.org/html/2606.07066#S4.p4.1),[§5](https://arxiv.org/html/2606.07066#S5.p1.1)\.
- I\. Yamada, A\. Asai, J\. Sakuma, H\. Shindo, H\. Takeda, Y\. Takefuji, and Y\. Matsumoto \(2020\)Wikipedia2Vec: an efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations,pp\. 23–30\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-demos.4)Cited by:[Table 2](https://arxiv.org/html/2606.07066#A1.T2.2.2.1.1.1.1),[Table 2](https://arxiv.org/html/2606.07066#A1.T2.2.4.3.1.1.1),[§2\.2](https://arxiv.org/html/2606.07066#S2.SS2.p1.1),[§2\.2](https://arxiv.org/html/2606.07066#S2.SS2.p4.1)\.

## Appendix A Appendices

### A\.1\. Hugging Face References and Revision

Table 2: Overview of Hugging Face models used for the analysis\.
### A\.2\. Data loss

Data loss is the number of words in the corpus for which semantic association could not be calculated under a given implementation\. The use of sentence embeddings resulted in lower data loss than the word embedding implementations, as word embeddings extracted from a word2vec model only exist for a finite number of words\. The data loss is largest for the CWE, Windowed \(N=1\) implementation\.

Table 3: Data loss caused by the different implementations of semantic association\. The table shows the overall data loss across all words in the corpus, and the data loss for the current analysis that only considers content words\.
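As an illustration of how such counts could be obtained, the following is a minimal sketch and not the authors' code. It assumes that semantic association cannot be calculated for a word whenever that word is missing from the embedding vocabulary; the `Token` class, the `is_content` flag, and the toy vocabulary are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    is_content: bool  # hypothetical flag: True for content words


def count_data_loss(tokens, vocabulary):
    """Count tokens without an embedding, overall and among content words.

    Assumption: a word is "lost" when it is absent from the embedding
    vocabulary. Real implementations may lose words for other reasons.
    """
    def is_missing(token):
        return token.word.lower() not in vocabulary

    overall_missing = sum(1 for t in tokens if is_missing(t))
    content_missing = sum(1 for t in tokens if t.is_content and is_missing(t))
    return overall_missing, content_missing


# Toy Dutch example; "draafde" is deliberately left out of the vocabulary.
toy_vocabulary = {"de", "hond", "door", "het", "park"}
toy_tokens = [
    Token("De", False),
    Token("hond", True),
    Token("draafde", True),
    Token("door", False),
    Token("het", False),
    Token("park", True),
]

overall_missing, content_missing = count_data_loss(toy_tokens, toy_vocabulary)
```

Sentence embedding models encode arbitrary text, so under this assumption they would lose few or no words to a fixed vocabulary, which is consistent with the lower data loss reported for those implementations.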
### A\.3\. Validation of semantic association

To validate the implementations of semantic association, we used data from Federmeier and Kutas \([1999](https://arxiv.org/html/2606.07066#bib.bib1)\)\. As seen in Ettinger et al\. \([2016](https://arxiv.org/html/2606.07066#bib.bib10)\), we wanted to validate that the model identified expected targets as more semantically similar to the context compared to the within\-category and between\-category target words\. In addition to the original stimuli, we added an unrelated target word \(identical to an expected target in another context\)\. Moreover, using OpenAI’s GPT\-5\.2 \(OpenAI, [2025](https://arxiv.org/html/2606.07066#bib.bib62)\), we generated longer contexts of approx\. 100 words to validate the models on contexts longer than the two\-sentence context provided in the original data\. The results of the validation are shown in Figure [5](https://arxiv.org/html/2606.07066#A1.F5)\. All the implementations identify the correct ordering of semantic association between the target words and the contexts\. The sentence embedding models, all\-MiniLM\-L6\-v2 and intfloat/multilingual\-e5\-large, generally exhibit less variance overlap between target words \(and most notably between the expected and unexpected target\) as compared to the word embedding models, enwiki\_20180420\_100d and word2vec\-google\-news\-300\.

![Refer to caption](https://arxiv.org/html/2606.07066v1/figs/sem_association_all.png)

Figure 5: Average semantic association between context and target words \(expected, within, between, and unexpected\) from Federmeier and Kutas \([1999](https://arxiv.org/html/2606.07066#bib.bib1)\) given different implementations of semantic association\. The plot is divided by the embedding model and the implementation of semantic association\. The x\-axis indicates the context the target was associated with, where “long” means the original and the generated longer context, “original” means the original context, “5 sentences” means the two sentences in the original context and three more from the longer context\. The error bars indicate the standard deviation\. Note that the y\-axes are different across plots\.
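The ordering check behind this validation can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it assumes that semantic association is measured as cosine similarity between a context embedding and a target embedding, and that the unrelated target should rank lowest. The function names and the three-dimensional toy vectors are hypothetical; real embeddings would come from the models listed above.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def check_ordering(context_vec, target_vecs,
                   order=("expected", "within", "between", "unrelated")):
    """Return (is_ordered, scores) for the assumed ranking of target types."""
    scores = {name: cosine_similarity(context_vec, target_vecs[name])
              for name in order}
    # Each target type must score strictly higher than the next one.
    is_ordered = all(scores[a] > scores[b] for a, b in zip(order, order[1:]))
    return is_ordered, scores


# Toy vectors standing in for a context embedding and four target embeddings.
toy_context = [1.0, 0.0, 0.0]
toy_targets = {
    "expected": [0.9, 0.1, 0.0],
    "within": [0.6, 0.4, 0.0],
    "between": [0.3, 0.7, 0.0],
    "unrelated": [0.0, 0.2, 0.9],
}

is_ordered, scores = check_ordering(toy_context, toy_targets)
```

In a full validation, such a check would be run per stimulus item and per context length, with the scores then averaged as in Figure 5.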
