Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models
Summary
This paper investigates how humans communicate under strict vocabulary limitations, comparing their incremental production strategies to greedy and globally optimal sampling algorithms using Sequential Monte Carlo inference with large language models.
# Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models
Source: [https://arxiv.org/html/2605.15365](https://arxiv.org/html/2605.15365)
Thomas Hikaru Clark¹, Sihan Chen¹, Laura Nicolae²

¹Department of Brain and Cognitive Sciences, MIT; ²Department of Economics, Harvard University

{thclark, sihanc}@mit.edu, lauranicolae@g.harvard.edu
###### Abstract
Communicating using only a limited vocabulary is a common but challenging cognitive phenomenon, requiring an ideal communicator to plan carefully to optimize for intelligibility while circumventing a constrained lexicon. In this work, we investigate how humans respond to a broad array of questions under variable vocabulary limitations, down to only 250 highly frequent words in the most restrictive condition. We provide theoretically motivated comparisons to greedy and globally optimal sampling algorithms using Sequential Monte Carlo inference with large language models. Humans generally resemble greedy sampling more than globally optimal sampling, though more skilled humans are more likely to backtrack and revise (a non-greedy behavior). An observed human pattern of leaning on semantically light words in high-constraint settings falls out of both greedy and globally optimal sampling. We discuss the results and their broader implications for resource-rational cognition, psycholinguistics, L2 communication, and language impairments.
Keywords: Psycholinguistics; Large Language Models; Explanations; Planning; Resource-Rationality
## Introduction
Humans use language to communicate (Fedorenko et al., [2024](https://arxiv.org/html/2605.15365#bib.bib23)). Using insights from information theory (Shannon, [1948](https://arxiv.org/html/2605.15365#bib.bib69)), researchers model language as a communication code between a speaker and a listener: the speaker converts their meaning to utterances, and the listener receives the utterances and recovers the meaning (see Gibson et al., [2019](https://arxiv.org/html/2605.15365#bib.bib28), for a review). The production process is incremental: instead of starting with a full utterance in mind, a person typically plans what to say as they speak or write (e.g. Bock and Levelt, [1994](https://arxiv.org/html/2605.15365#bib.bib6); Ferreira and Swets, [2002](https://arxiv.org/html/2605.15365#bib.bib81); Pickering and Garrod, [2013](https://arxiv.org/html/2605.15365#bib.bib64)). At the computational level, Futrell ([2023](https://arxiv.org/html/2605.15365#bib.bib26)) modeled language production as an action planning problem in which a speaker’s incremental production is influenced both by a pressure to say contextually predictable things and a pressure to say things that further the communicative goal. In this work, we specifically investigate how constraints on the available set of vocabulary words affect incremental language production.
Communication under a vocabulary constraint is a common cognitive phenomenon. Many societies contain linguistic minorities with limited knowledge of the majority language, which affects their ability to access basic services or participate in the economy (Bleakley and Chin, [2004](https://arxiv.org/html/2605.15365#bib.bib5); Grogger et al., [2020](https://arxiv.org/html/2605.15365#bib.bib31)). Compared to native speakers, learners of a language (L2 learners) tend to use fewer and more frequent words (Laufer, [1991](https://arxiv.org/html/2605.15365#bib.bib49)). When L2 learners do not know a specific word, they may employ various strategies to get a meaning across, such as describing key properties of the concept (Poulisse, [2011](https://arxiv.org/html/2605.15365#bib.bib66)) or using “all-purpose” or semantically light words as substitutes (Dörnyei and Scott, [1997](https://arxiv.org/html/2605.15365#bib.bib21)). This phenomenon suggests that there may be diminishing marginal returns to learning new vocabulary words above a certain point, since a subset of the vocabulary is typically sufficient for communication.
Another common case of vocabulary-constrained communication is when an expert explains a complex phenomenon to a layperson. Past work has considered what makes for a good explanation (e.g. Brewer et al., [2000](https://arxiv.org/html/2605.15365#bib.bib82); Cruz and Lombrozo, [2025](https://arxiv.org/html/2605.15365#bib.bib19); Chandra et al., [2024](https://arxiv.org/html/2605.15365#bib.bib11); McCarthy and Keil, [2023](https://arxiv.org/html/2605.15365#bib.bib61)). For example, Sulik et al. ([2023](https://arxiv.org/html/2605.15365#bib.bib71)) suggest that a good explanation contains functional (i.e., what is something for?) or mechanistic (i.e., how does something work?) information. Besides the type of information, the words used also matter. For example, laypersons find explanations with jargon (terms only understood by a specific group of experts) harder to understand (Bullock et al., [2019](https://arxiv.org/html/2605.15365#bib.bib86); Cruz and Lombrozo, [2025](https://arxiv.org/html/2605.15365#bib.bib19); Keuleers et al., [2015](https://arxiv.org/html/2605.15365#bib.bib42)). It remains unclear how exactly the usage of specific words changes in response to vocabulary constraints, and what kind of algorithmic-level (Marr, [1982](https://arxiv.org/html/2605.15365#bib.bib60)) model of language generation captures the properties of language generated under vocabulary constraints.
In this work, we define a large space of communicative goals and evaluate the properties of English responses generated by humans under varying vocabulary constraints, comprising only the 250 most common English words in the most restrictive condition. Recently developed techniques allow us to sample from a language model subject to custom constraints (Lipkin et al., [2025](https://arxiv.org/html/2605.15365#bib.bib54); Loula et al., [2025](https://arxiv.org/html/2605.15365#bib.bib57)), and provide a new opportunity to test algorithmic-level hypotheses about human cognition. In particular, we can manipulate the *greediness* of constrained generation, where a purely greedy algorithm has a bias for locally high-probability continuations, while approximately globally optimal inference algorithms like particle-based Sequential Monte Carlo (SMC) avoid these biases and are thus more consistent with planning ahead during language production. We investigate whether humans perform constrained language generation in a way that aligns more closely with greedy generation or globally optimal inference.
To foreshadow our results: for the humans in our study, performance as a function of vocabulary size more closely resembled greedy sampling than SMC-based constrained generation, which handled very small vocabularies better than humans or greedy sampling. Despite this, we observe that top-scoring humans revise valid string prefixes (a non-greedy behavior) significantly more than low-scoring humans. This suggests that human approaches to constrained generation are heterogeneous: some plan ahead or revise their answers while others prioritize simple, greedy, and imperfect responses. We also observe interpretable patterns in the frequency of word use under different vocabulary constraints: semantically “light” words, like *do*, *thing*, and *people*, are used disproportionately frequently under strict vocabulary constraints, even relative to other allowed words. This pattern is reproduced by constrained generation from language models. We conclude by discussing the implications of these findings for both psycholinguistics and for broader areas of impact.
## Methods
### Constrained vocabulary definition
We define a constrained vocabulary of size $N$, consisting of the $N$ most frequent words in English, according to the word list provided by the wordfreq Python package (Speer, [2022](https://arxiv.org/html/2605.15365#bib.bib70)). For each word in the list, the set of words sharing the same base form (lemma) was also included in the list, using the lemminflect Python package, available within spaCy (Honnibal et al., [2020](https://arxiv.org/html/2605.15365#bib.bib37)). For example, if any of the word forms *drink*, *drank*, *drunk*, *drinks*, or *drinking* were present within the top $N$ words, then all word forms in this set were included. We define seven vocabulary sets, starting at 250 words and doubling up to 16,000 words. For reference, it is estimated that the average native speaker of American English is familiar with 42,000 word lemmas from 11,000 word families (Brysbaert et al., [2016](https://arxiv.org/html/2605.15365#bib.bib8)).
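The lemma-expansion step can be sketched as follows. This is a minimal illustration, not the study's pipeline: `RANKED_WORDS` and `LEMMA_OF` are hypothetical toy stand-ins for the frequency ranking and the inflection lookup, and the function name `build_vocabulary` is ours.

```python
from collections import defaultdict

# Hypothetical toy data (assumption): a frequency-ranked word list and a
# word -> lemma map. The study takes these from wordfreq and lemminflect.
RANKED_WORDS = ["the", "be", "have", "do", "thing", "drank", "people", "way", "sailboat"]
LEMMA_OF = {
    "be": "be", "is": "be", "are": "be", "was": "be",
    "have": "have", "has": "have", "had": "have",
    "do": "do", "does": "do", "did": "do",
    "drink": "drink", "drank": "drink", "drunk": "drink",
    "drinks": "drink", "drinking": "drink",
    "thing": "thing", "things": "thing",
}

def build_vocabulary(ranked_words, lemma_of, n):
    """Take the top-n words and add every known form sharing a lemma with them."""
    forms_by_lemma = defaultdict(set)
    for form, lemma in lemma_of.items():
        forms_by_lemma[lemma].add(form)

    vocabulary = set()
    for word in ranked_words[:n]:
        vocabulary.add(word)
        lemma = lemma_of.get(word, word)  # unknown words are their own lemma
        vocabulary |= forms_by_lemma.get(lemma, set())
    return vocabulary

# Seven vocabulary sizes, starting at 250 and doubling up to 16,000.
VOCAB_SIZES = [250 * 2**k for k in range(7)]
```

With `n = 6` in this toy ranking, *drank* falls inside the cutoff, so *drink*, *drunk*, *drinks*, and *drinking* are all admitted, while *sailboat* stays out.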
### Questions dataset
To simulate a variety of real-world situations with diverse communicative goals, we assemble a dataset of 192 questions divided into the following four categories of 48 questions each: Why, How, ExplainSimple, and RedditELI5 (Explain Like I’m 5). The Why questions are sourced from a study by Sulik et al. ([2023](https://arxiv.org/html/2605.15365#bib.bib71)), who investigated which features make for good explanations. The How and ExplainSimple datasets were created from scratch using fixed sentence patterns (“How is/are/do/does/can/would [BLANK]?” and “Explain [BLANK] in simple terms”), with the goal of covering a wide range of communicative topics, including everyday life, sports, and science. The RedditELI5 questions were sourced from the all-time most popular questions on the Reddit forum “Explain Like I’m 5”, lightly edited for length and clarity.
### Human behavioral experiment
We created an online behavioral experiment in which participants respond to questions from the above dataset, using an interface which allows them to type only words within a specified vocabulary. While not a fully natural task, the interface simulates a speaker encountering vocabulary limitations, forcing on-line adaptation to the constraint via circumlocution or other strategies. Participants could only type or delete (selecting, replacing, and inserting text was blocked). We recruited 144 English speakers from Prolific, who were each paid $6. The study was conducted under an approved IRB protocol at the authors’ institution. Each participant answered 16 questions, beginning with 4 questions with no vocabulary constraints, then 4 questions each for vocabulary sizes of 4000, 1000, and 250. Each question in the dataset was answered by three unique individuals at each vocabulary size. Participants were prompted to move to the next question after 90 seconds without a submission. We record participants’ final responses as well as all intermediate keystrokes.
### Constrained LLM generation with AWRS
Adaptive Weighted Rejection Sampling (AWRS; Lipkin et al., [2025](https://arxiv.org/html/2605.15365#bib.bib54)) is a Sequential Monte Carlo method for sampling incrementally from a language model, subject to user-defined constraints (implemented as binary functions that can be evaluated on partially generated strings). It does this by maintaining multiple hypotheses in parallel, represented by weighted *particles*. We use AWRS to generate responses to prompts, using a custom potential in the GenLM-control library to enforce a hard constraint on the set of words in the vocabulary. Constrained generation was performed using the Llama-3.2-1B-Instruct model, with either 16 or 32 particles for SMC inference (denoted AWRS-16 and AWRS-32, respectively). With the number of particles set to 1, the algorithm is equivalent to local greedy sampling, where only one hypothesis is maintained at each step of generation. A prompt with instructions and two few-shot examples was provided. Example responses from both humans and models, under both a permissive and a restrictive constraint, are shown in [Table 1](https://arxiv.org/html/2605.15365#Sx3.T1).
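To make the contrast between one particle and many concrete, the following toy sketch runs a generic particle-based SMC sampler over a hand-written bigram model. It is not AWRS and does not use the GenLM-control API: the rejection-based step is replaced by exact enumeration over a tiny vocabulary, and `TOY_LM`, `ALLOWED`, `smc_constrained`, and `mass_on` are all hypothetical names introduced for illustration.

```python
import random

# Hypothetical toy bigram "language model" (assumption): last word -> next-word probabilities.
TOY_LM = {
    "<s>": {"blessing": 0.6, "good": 0.4},
    "blessing": {"in": 1.0},
    "in": {"disguise": 0.9, "time": 0.1},
    "good": {"thing": 1.0},
}
ALLOWED = {"blessing", "in", "time", "good", "thing"}  # "disguise" is out of vocabulary

def smc_constrained(num_particles, steps=3, seed=0):
    """Return a list of (text, normalized_weight) pairs from a toy constrained SMC run."""
    rng = random.Random(seed)
    particles = [["<s>"] for _ in range(num_particles)]
    weights = [1.0] * num_particles
    for _ in range(steps):
        for i, seq in enumerate(particles):
            dist = TOY_LM.get(seq[-1])
            if dist is None:  # this sequence has already ended
                continue
            allowed = {w: p for w, p in dist.items() if w in ALLOWED}
            mass = sum(allowed.values())
            weights[i] *= mass  # importance weight: LM mass that satisfies the constraint
            if mass == 0:
                continue
            words = list(allowed)
            word = rng.choices(words, weights=[allowed[w] for w in words])[0]
            particles[i] = seq + [word]
        total = sum(weights)
        normalized = [w / total for w in weights]
        effective_size = 1.0 / sum(w * w for w in normalized)
        if effective_size < num_particles / 2:  # resample when weights degenerate
            chosen = rng.choices(range(num_particles), weights=normalized, k=num_particles)
            particles = [list(particles[j]) for j in chosen]
            weights = [1.0] * num_particles
    total = sum(weights)
    return [(" ".join(seq[1:]), w / total) for seq, w in zip(particles, weights)]

def mass_on(prefix, samples):
    """Total normalized weight on responses starting with `prefix`."""
    return sum(w for text, w in samples if text.startswith(prefix))
```

In this toy model the globally constrained distribution puts about 0.06 / 0.46 ≈ 0.13 of its mass on continuations beginning with “blessing”, because that path later loses 90% of its probability to the disallowed word. A single particle still starts with “blessing” about 60% of the time, whereas a large particle population down-weights that path toward the global value.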
### Analyses
#### Automated evaluation of response quality
To create an estimate of the quality of responses to each question (from both humans and models), we use a prompted LLM (Llama-3.1-8B-Instruct) to assign scores on a 7-point Likert scale, along with accompanying justifications, under an LLM-as-judge paradigm (Gu et al., [2025](https://arxiv.org/html/2605.15365#bib.bib32); Zheng et al., [2023](https://arxiv.org/html/2605.15365#bib.bib80)). For each question, we aim to approximate the value $\mathbb{E}[f(X)] = \sum_{x} f(x)\, p_X(x)$, where $X$ denotes a random variable sampled from the distribution over constrained responses to a question, $f(\cdot)$ denotes the scoring function, and $p_X$ denotes the probability mass function of $X$, approximated by the normalized SMC weights. For computational efficiency, we discard samples from SMC with weights below a chosen threshold of 0.01, which contribute minimally to the sum. We use the same evaluation pipeline on human responses, treating each response as a single sample (all equally weighted). The automated evaluator is an imperfect proxy for response quality that was employed to address the large number of both model and human responses that required annotation, for which it was not practical to collect comprehensive human judgments. While the absolute score assigned to human- vs. model-generated utterances may be biased (e.g. if LLM judges prefer LLM-generated responses), we are primarily concerned with *changes* to the evaluation score as a function of vocabulary size.
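The weighted estimate can be sketched in a few lines. The function name `expected_score` and the toy scores are hypothetical, and renormalizing the surviving weights after thresholding is our assumption; the text above does not say whether this is done.

```python
def expected_score(samples, score_fn, min_weight=0.01):
    """Weighted mean score over (response, normalized_weight) pairs.

    Samples below `min_weight` are discarded. The remaining weights are
    renormalized to sum to one (an assumption, not stated in the paper).
    """
    kept = [(response, w) for response, w in samples if w >= min_weight]
    total = sum(w for _, w in kept)
    if total == 0:
        raise ValueError("no samples above the weight threshold")
    return sum(score_fn(response) * w for response, w in kept) / total

# Hypothetical judge scores on the 1-7 scale, for illustration only.
toy_scores = {"a": 6, "b": 4, "c": 1}
smc_samples = [("a", 0.6), ("b", 0.395), ("c", 0.005)]  # "c" falls below the threshold
human_samples = [("a", 1 / 3), ("b", 1 / 3), ("c", 1 / 3)]  # equally weighted responses
```

Calling `expected_score(human_samples, toy_scores.get)` reduces to a plain mean, which matches the treatment of human responses as equally weighted single samples.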
We conducted a norming study to validate the LLM-as-judge pipeline: one third of questions (64 out of 192) were randomly sampled, and for each of four different vocabulary sizes, one response from humans, the greedy model, and the AWRS-32 model was randomly chosen and graded by $N = 24$ human participants on Prolific, who each saw 32 question-answer pairs (users who participated in the constrained generation experiment were excluded). Human graders were given the same instructions provided to the LLM-as-judge, rating the response quality on a 7-point Likert scale. For the subset of questions in the norming study, automated LLM-generated ratings and human ratings had a Spearman $\rho$ of 0.60, suggesting that the automated pipeline captures a substantial amount of variance in human ratings.
We predict that the average response score will decrease monotonically as the allowed vocabulary is reduced, and we evaluate whether scores for human responses decline similarly to the greedy or SMC-based model responses as vocabulary size is restricted.
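For readers unfamiliar with the statistic, Spearman's correlation is the Pearson correlation of the two rank vectors. The sketch below is a standard-library illustration with tied values given their average rank, which matters for Likert ratings; the function names are ours, and this is not the study's analysis code.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)
```

Because only ranks enter the computation, the statistic is insensitive to a judge that is uniformly harsher or more lenient than human raters, as long as the ordering of responses is preserved.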
#### Shifts in word frequency
We compute the frequency of each word in the generated output as a function of vocabulary size. Restricting the vocabulary necessarily removes low-frequency words from the output distribution, but does not *directly* change the relative frequencies of the remaining words. For example, all remaining words might be used slightly more frequently to account for the words that were removed from the vocabulary, but it is not obvious that there would be any change in the usage rank of the remaining words. However, we hypothesize that tightening the vocabulary constraint will systematically change the frequency distribution of used words, even among those that are allowed by the constraint. In particular, words which have a high degree of *substitutability* with low-frequency words will increase in usage rank relative to words with a low degree of substitutability (Varian, [2010](https://arxiv.org/html/2605.15365#bib.bib85)). For example, in settings where low-frequency words are not available, we may expect the word “thing” to be substituted instead (Hasselgren, [1994](https://arxiv.org/html/2605.15365#bib.bib33); Klein and Perdue, [1997](https://arxiv.org/html/2605.15365#bib.bib43)). Meanwhile, function words, including prepositions, conjunctions, and determiners, may not see a large change in usage.
#### Revision as an index of greediness vs. planning
Under strictly greedy language generation, once a word is output, it can no longer be revised. In language modeling terms, this feature of greedy sampling can lead to so-called “dead ends,” where a sequence of high-probability words leads to a context in which the only high-probability continuation is outside of the vocabulary (see Lew et al., [2023](https://arxiv.org/html/2605.15365#bib.bib52), for a discussion). For example, given the sequence “It was a blessing in…”, Google N-grams estimates a 90% probability that the next word is “disguise”. If this word is outside the vocabulary, a greedy sampler would be forced to choose a different, low-probability continuation, which could rapidly lead to an incoherent sentence being generated. A rational constrained language generator could avoid this problem by either planning ahead (avoiding going down the dead end in the first place) or by revising (backtracking or deleting words once a dead end is detected). It is difficult to directly measure whether a person is planning ahead. However, given the fine-grained typing data from our online task ([Figure 1](https://arxiv.org/html/2605.15365#Sx2.F1)), it is possible to see whether participants ever delete previously written words. For a word to be counted as deleted, it must have first been completed (e.g., by a whitespace or punctuation character) and then erased by repeated presses of the backspace key. We compute the average number of word deletions per response to quantify revisions.
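One way to compute such rank shifts is sketched below, restricted to words allowed under the tighter constraint so that the comparison is among words available in both conditions. The helper names, the toy responses, and the alphabetical tie-breaking are assumptions made for illustration.

```python
from collections import Counter

def frequency_ranks(responses, vocabulary):
    """Rank allowed words by count (1 = most frequent); ties break alphabetically."""
    counts = Counter(
        token
        for response in responses
        for token in (w.strip(".,!?") for w in response.lower().split())
        if token in vocabulary
    )
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

def rank_changes(loose_responses, tight_responses, vocabulary):
    """Positive values mean the word rose in rank under the tighter constraint."""
    loose = frequency_ranks(loose_responses, vocabulary)
    tight = frequency_ranks(tight_responses, vocabulary)
    return {w: loose[w] - tight[w] for w in loose.keys() & tight.keys()}

# Hypothetical toy responses, not data from the study.
TOY_VOCAB = {"the", "a", "thing", "you", "use", "to", "get"}
LOOSE = ["You use the ladder to get the cat.", "The tall thing can get the cat."]
TIGHT = ["You get a thing to get the thing.", "Use a thing."]
```

In this toy example `rank_changes(LOOSE, TIGHT, TOY_VOCAB)` shows *thing* rising two places and *the* falling three, which is the qualitative pattern hypothesized above.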
Figure 1: Example keystroke sequence under a 250-word vocabulary, in response to “How do you rescue a cat from a tree?”. Red characters denote disallowed keystrokes; the backspace symbol is displayed in orange. The final response was “You have to get a thing to help you.”
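The deletion criterion described above can be sketched as a single pass over the keystroke log. This sketch assumes a log format that is ours, not the study's: a list of accepted characters with disallowed keystrokes already filtered out, and `"\b"` standing for a backspace press. A word counts as deleted at the moment its first character is erased, provided the word had previously been completed by a delimiter.

```python
BACKSPACE = "\b"  # hypothetical encoding of a backspace keystroke (assumption)

def is_word_char(ch):
    return ch.isalnum() or ch == "'"

def count_word_deletions(keystrokes):
    """Return (number of completed words later erased, final text)."""
    buffer = []  # entries are [character, belongs_to_completed_word]
    deletions = 0
    for key in keystrokes:
        if key == BACKSPACE:
            if not buffer:
                continue
            ch, completed = buffer.pop()
            # The word is fully erased once its first character is removed.
            at_word_start = not buffer or not is_word_char(buffer[-1][0])
            if is_word_char(ch) and completed and at_word_start:
                deletions += 1
        else:
            if not is_word_char(key):
                # Whitespace or punctuation completes the word just before it.
                i = len(buffer) - 1
                while i >= 0 and is_word_char(buffer[i][0]):
                    buffer[i][1] = True
                    i -= 1
            buffer.append([key, False])
    return deletions, "".join(ch for ch, _ in buffer)
```

Typing “a thing ”, pressing backspace six times, and then typing “cat” yields one deletion and the final text “a cat”. Erasing a word fragment that was never followed by a delimiter does not count.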
## Results
Table 1: Example responses at different vocabulary sizes.

### Effects of vocabulary size reveal similarity between humans and greedy sampling
Figure 2: Mean weighted score (automated evaluation on a 1-7 Likert scale) as a function of allowed vocabulary size. The horizontal axis is logarithmic. Shaded bands denote 95% confidence intervals over the scores for all 192 questions. The label ‘16000’ is mapped to the unconstrained condition for humans.

[Figure 2](https://arxiv.org/html/2605.15365#Sx3.F2) shows the average evaluation scores of responses generated by humans and the three algorithms (greedy, AWRS-16, and AWRS-32) at each vocabulary size. Exclusion criteria for participants were not pre-defined, but we perform a post-hoc split of participants into the top 50% and bottom 50% by average response score to separate out low-effort responses. In absolute terms, human responses scored lower, on average, than all models at each vocabulary size. This may be influenced by the fact that human responses were generally shorter and less detailed than model responses. All three algorithms and the human-generated responses exhibit diminishing marginal returns: evaluation scores increase by nearly constant increments each time the vocabulary size is doubled, suggesting that the average marginal utility of adding a new word to the vocabulary declines as the vocabulary size expands.
Tightening the vocabulary constraint produced similar decreases in the scores of the human responses and the greedy algorithm’s responses at all vocabulary sizes. Meanwhile, the AWRS algorithm displays greater robustness to vocabulary constraints compared to humans and the greedy algorithm: the evaluation scores of AWRS-generated responses declined more slowly than those of humans and the greedy algorithm, with generally larger gaps in performance for smaller vocabularies. On the 250-word vocabulary, the AWRS algorithm achieves half a point better (on a 7-point scale) than would be expected if its performance degraded at the same rate as human participants and greedy sampling.
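The link between constant increments per doubling and declining marginal utility can be made explicit with a short derivation. This is a sketch under the idealizing assumption that the mean score is exactly linear in the logarithm of vocabulary size; the symbols $s(N)$, $a$, and $b$ are introduced here for illustration and are not fitted values from the study.

```latex
s(N) \approx a + b \log_2 N
\quad\Longrightarrow\quad
s(2N) - s(N) = b,
\qquad
\frac{\mathrm{d}s}{\mathrm{d}N} = \frac{b}{N \ln 2}.
```

Each doubling adds the same increment $b$, while the gain per additional word shrinks in proportion to $1/N$.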
### A shift towards semantically light words in constrained generation
Consistent with our prediction, tighter vocabulary constraints induce responses with more frequent use of semantically light content words, such as *thing*, *do*, or *people*. Both human-generated and model-generated responses used semantically light words more frequently as the vocabulary constraint became stricter ([Figure 3](https://arxiv.org/html/2605.15365#Sx3.F3)). In particular, semantically light content words tended to rise in frequency rank, while function words tended to remain constant or fall in rank. We note that this qualitative pattern occurs for all model results, including AWRS-16 and AWRS-32.


Figure 3: Bump plots for both models and humans showing word frequency rankings at largest and smallest vocabulary sizes. The color scale denotes sign and magnitude of rank change. Semantically light nouns, verbs, and adjectives (e.g., *things*, *make*, etc.) tend to rise in rank as vocabulary size shrinks, while connectives (*or*, *when*) tend to decrease in rank.
### Skilled humans revise more under constraint
[Figure 4](https://arxiv.org/html/2605.15365#Sx3.F4) shows the rate of the human respondents’ word deletions per response by vocabulary size and participant skill group (a post-hoc split of the participants into the top and bottom 50% by average score). For participants in the bottom 50%, word deletions remained flat across different constrained vocabulary sizes. In contrast, for participants in the top 50%, word deletions were significantly higher in the constrained vocabularies of 1000 and 250 words than in larger vocabularies. This much greater propensity for revision among high-scoring humans suggests an attempt at a non-greedy approach, though these attempts may not be successful, as evidenced by the fact that the responses generated by this group of participants generally scored similarly to the greedy algorithm.
Figure 4: Mean valid word deletions per response, by vocabulary size and post-hoc performance group. Error bars denote 95% confidence intervals. Stars denote p-values of Mann-Whitney U tests with Bonferroni correction (\*: p < 0.05, \*\*: p < 0.01, \*\*\*: p < 0.001).
## Discussion
### Resource-rational inference in language production
An extensive body of work in cognitive science and behavioral economics suggests that humans are approximately rational subject to cognitive costs (Kahneman and Tversky, [1979](https://arxiv.org/html/2605.15365#bib.bib41); Kahneman, [2003](https://arxiv.org/html/2605.15365#bib.bib40); Griffiths et al., [2015](https://arxiv.org/html/2605.15365#bib.bib30); Lieder and Griffiths, [2020](https://arxiv.org/html/2605.15365#bib.bib53)). In real-time communication, a pressure for communicative utility interacts with constraints on the availability of cognitive resources like time and attention. Evidence also suggests that humans can adapt to *novel* constraints by deploying strategies that make efficient use of available resources: people asked to communicate about novel objects rapidly converge to a set of labels, using shorter phrases for more frequently observed items (Krauss and Weinheimer, [1964](https://arxiv.org/html/2605.15365#bib.bib46)). Studies have also shown emergent systematicity in highly constrained communication channels when there was a communicative goal, e.g. whistling to signal a color on a continuum (Chen et al., [2025](https://arxiv.org/html/2605.15365#bib.bib12)).
The intersection of language modeling and probabilistic inference provides an opportunity to compare human behavior to algorithmic-level hypotheses about processing. Sequential Monte Carlo has been argued to provide a cognitively plausible algorithm for approximate Bayesian inference for language comprehension (Clark et al., [2025b](https://arxiv.org/html/2605.15365#bib.bib17), [a](https://arxiv.org/html/2605.15365#bib.bib18)). The AWRS algorithm enables efficient *generation* under a vocabulary constraint. This algorithm is inherently resource-rational: it uses more samples when the constraint is difficult to satisfy than when it is easy, and varying the particle count results in a continuum of behavior from fast, greedy sampling to slower, more exact inference, which here resulted in higher-scoring responses.
Our results indicate that humans presented with the task of constrained generation show behavior that, somewhat surprisingly, more closely resembles greedy sampling than particle-based SMC. Above-average participants, however, show a signature of non-greedy sampling: they are much more likely to revise partially generated sentences. To help explain these results, we note that greedy sampling is one endpoint of a resource-rational trade-off; it minimizes computational effort but maximizes local bias. A plausible hypothesis is that humans are closer to the greedy extreme of this trade-off (possibly due to the high costs of revising or planning ahead), but that highly motivated individuals may opt to increase effort to counter the dead ends induced by greedy sampling. Future work can consider whether financial incentives, in-person experimentation, or pairing participants together can induce more non-greedy behavior by placing greater weight on communicative success.
Our results also indicate that certain words are used disproportionately when vocabulary is artificially restricted. In the most restricted vocabularies, semantically light content words jump substantially in usage rank. These words have especially high communicative utility in constrained settings, because they can serve as substitutes for low-frequency words (consider the frequent use of the word “way” in [Table 1](https://arxiv.org/html/2605.15365#Sx3.T1)). While intended words may not have perfect substitutes, they can often be replaced by a longer phrase, using relative clauses or prepositional phrases to add detail. This pattern is also seen in both greedy and non-greedy model results, suggesting that it is a basic corollary of the substitutability of these words, rather than a result of active Bayesian inference. These results are consistent with existing theoretical frameworks of built-in redundancy in language providing robustness against failures of communication (Tourtouri et al., [2021](https://arxiv.org/html/2605.15365#bib.bib72); Degen et al., [2020](https://arxiv.org/html/2605.15365#bib.bib20); Mahowald et al., [2023](https://arxiv.org/html/2605.15365#bib.bib59); Leufkens, [2020](https://arxiv.org/html/2605.15365#bib.bib51)).
Another property of constrained generation with SMC is the sensitivity of inference quality and efficiency to the proposal distribution used to generate incremental next-word candidates. The greater the divergence between the proposal distribution and the target distribution, the more samples are needed to generate a word that meets the constraint; the algorithm works best when it does not have to reject too many next-word candidates (Lipkin et al., [2025](https://arxiv.org/html/2605.15365#bib.bib54)). We speculate that this phenomenon has a natural analog in human cognition. There are two ways that a human can be good at constrained generation. The first is to take many possible “samples”; we expect to see this reflected in slow processing and frequent deletions. The second is to have a good “proposal distribution” that assigns high probability to continuations which already satisfy the constraints; we expect this to be reflected in fluent production with relatively few revisions. Crucially, such a proposal distribution differs from the baseline distribution of language, and thus requires learning via experience. As people gain more familiarity with this task, it may become more natural and require less explicit inference, in keeping with theories of amortized computation in cognition (Gershman and Goodman, [2014](https://arxiv.org/html/2605.15365#bib.bib27)). Future work may consider whether humans with extensive L2 experience are better than monolinguals at this task.
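The cost of a poorly matched proposal can be illustrated with plain rejection sampling, which is simpler than the adaptive scheme AWRS uses. If candidates are drawn independently from the proposal until one satisfies the constraint, the expected number of draws is the reciprocal of the proposal mass on allowed words. The function name and the toy distributions below are hypothetical.

```python
def expected_draws(proposal, allowed):
    """Expected draws until acceptance under plain rejection sampling.

    `proposal` maps candidate words to probabilities and `allowed` is the
    constrained vocabulary. This is simple rejection sampling with
    replacement, not the adaptive weighted scheme used by AWRS.
    """
    mass = sum(p for word, p in proposal.items() if word in allowed)
    if mass == 0:
        return float("inf")  # no allowed continuation: a true dead end
    return 1.0 / mass

# Hypothetical next-word distributions after "It was a blessing in ...".
baseline_proposal = {"disguise": 0.9, "time": 0.1}
adapted_proposal = {"disguise": 0.2, "time": 0.8}
allowed_words = {"time"}
```

Under the baseline proposal, ten draws are needed on average, while the adapted proposal needs 1.25. This corresponds to the two routes described above: taking many samples, or having a proposal that already favors allowed continuations.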
### Intrinsic differences in question difficulty
Are some questions simply harder to answer than others? We list the top and bottom questions by average evaluator score for AWRS-32 and for humans, when the vocabulary size was 250:

Top/Bottom AWRS-32 Questions

- ▲ Explain Christianity in simple terms.
- ▲ Explain machine learning in simple terms.
- ▲ Explain consequentialism in simple terms.
- ▲ Explain the placebo effect in simple terms.
- ▼ How do I make schnitzel?
- ▼ Why does it echo if we yell in a cave but not a regular room?
- ▼ How do you remove salt from water?
- ▼ How do you prevent a sunburn?

Top/Bottom Human Questions

- ▲ Explain the placebo effect in simple terms.
- ▲ How do languages get new words?
- ▲ How do maggots get into a place like a freezer that’s sealed air tight when it loses power?
- ▲ Why are human babies born so helpless?
- ▼ How are books printed?
- ▼ How can I haggle prices at a market effectively?
- ▼ Explain the International Space Station in simple terms.
- ▼ What is a hedge fund?
First, we observe that for both the model and humans, most of the lowest-rated questions were in the How category. This may be because How questions typically prompt mechanistic explanations, which are known to be hard to understand, even if the subject of the question is well-known or technically simple (e.g. McCarthy and Keil, [2023](https://arxiv.org/html/2605.15365#bib.bib61); Kelemen et al., [2013](https://arxiv.org/html/2605.15365#bib.bib83); Lombrozo and Carey, [2006](https://arxiv.org/html/2605.15365#bib.bib55); Lombrozo and Wilkenfeld, [2019](https://arxiv.org/html/2605.15365#bib.bib84)). Additionally, these questions might involve lexicalized concepts that are difficult to explain via circumlocution. For example, explaining how to make schnitzel is difficult without the word *pork*, which is not among the most common 250 English words. In contrast, the highest-rated questions likely present a communicator with alternative ways of expressing the idea, even if the question appears to be quite technical on the surface. For example, the concept of the placebo effect can be expressed by words such as *feel* and *work*, both of which are among the 250 most frequent words. The highest-scoring questions for AWRS-32 all belong to the ExplainSimple dataset; this is possibly because the questions explicitly prompt the model to explain “in simple terms”, which reduces the divergence between the proposal distribution and target constrained distribution. These results suggest that whether a communicative goal is easy or hard to achieve under vocabulary constraints likely depends greatly on the availability of substitute words and well-suited analogies, which varies widely even within a question category, and not necessarily on the “technical” difficulty of the subject matter.
### Broader implications
One key application of our work is for language learning\. Our finding that semantically light words appear more frequently when the allowed vocabulary is small could provide useful guiding principles for optimizing language learning\. For example, our results suggest that a foreign\-language learner who expects to learn only 250 – 1000 words in the target language should prioritize learning general\-purpose, semantically light words \(not necessarily the most frequent words\) in order to be able to communicate effectively across a wide variety of everyday situations\. In contrast, an intermediate\-level speaker who hopes to sound more “native” might consider practicing communication without over\-using semantically light words such as *thing*, *do*, or *people*, which occur less frequently in the unconstrained distribution of language\.
Our results on the diminishing marginal returns of expanding one’s vocabulary size also suggest that learning the first few hundred or thousand most frequent English words has disproportionately high returns for communicative utility \(especially in the presence of a charitable comprehender\), despite still being far below the vocabulary size of a native speaker\. This could potentially explain why L2 learners often converge to an efficient but limited “Basic Variety” of a language \(Klein and Perdue, [1997](https://arxiv.org/html/2605.15365#bib.bib43)\) or over\-use familiar words \(Hasselgren, [1994](https://arxiv.org/html/2605.15365#bib.bib33)\): if one can communicate well using a restricted vocabulary, then there are fewer pressures to expand one’s vocabulary\.
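As a rough intuition for diminishing returns, the sketch below computes the share of word tokens covered by the top-k words under an idealized Zipf law, where frequency is proportional to 1/rank. This is an illustrative assumption, not the paper's analysis: token coverage is only a crude proxy for communicative utility, and the function name `zipf_coverage` and the 50,000-word total vocabulary are hypothetical choices.

```python
def zipf_coverage(k, vocab_size=50_000):
    """Share of tokens covered by the k most frequent words, assuming
    an idealized Zipf law (frequency proportional to 1 / rank)."""
    total = sum(1.0 / rank for rank in range(1, vocab_size + 1))
    top_k = sum(1.0 / rank for rank in range(1, k + 1))
    return top_k / total


# Each step multiplies the vocabulary by four; under this assumption
# the added coverage per step stays roughly constant, so the gain per
# additional word shrinks quickly.
for k in (250, 1_000, 4_000, 16_000):
    print(k, round(zipf_coverage(k), 3))
```

Under this assumed distribution, the first 250 words cover more tokens than the next 750 add, which is the qualitative shape of the diminishing-returns pattern described above.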
Additionally, modeling vocabulary\-constrained communication has relevance for the study of language disorders\. For example, communication for individuals with aphasia may benefit from flexible inference both in language production \(choosing which words to produce when production is very difficult\) and comprehension \(how an interlocutor infers an intended meaning from fragmented utterances\) \(Beeke, [2012](https://arxiv.org/html/2605.15365#bib.bib4); Fedorenko et al\., [2022](https://arxiv.org/html/2605.15365#bib.bib22)\)\. Patients with aphasia may be less likely to use low\-frequency words like *sailboat* while being more likely to use syntactically rich fragments like *boat that is moved by the wind* to circumvent this limitation \(Rezaii et al\., [2022](https://arxiv.org/html/2605.15365#bib.bib67)\)\. Modeling how people generate language under constraint can shed light on the rapid, online inferences made by conversational partners in the presence of language disorders, and can potentially inform strategies and technologies for easing communication with vulnerable groups \(for example, by modifying existing language models to better infer intended messages from atypical inputs\)\.
This study has investigated how vocabulary constraints affect human\-generated language, with theoretically\-motivated comparisons to greedy and non\-greedy sampling from language models using state\-of\-the\-art algorithms\. Our results demonstrate both the flexibility and the limitations of human language generated under vocabulary constraints, and provide an approach for computationally characterizing its properties\.
## References
- Beeke \(2012\)13\. Aphasia: The pragmatics of everyday conversation\.In13\. Aphasia: The Pragmatics of Everyday Conversation,pp\. 345–372\.External Links:[Document](https://dx.doi.org/10.1515/9783110214215.345)Cited by:[Broader implications](https://arxiv.org/html/2605.15365#Sx4.SSx3.p3.1)\.
- H\. Bleakley and A\. Chin \(2004\)Language Skills and Earnings: Evidence from Childhood Immigrants\*\.The Review of Economics and Statistics86\(2\),pp\. 481–496\.External Links:ISSN 0034\-6535,[Document](https://dx.doi.org/10.1162/003465304323031067)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p2.1)\.
- K\. Bock and W\. Levelt \(1994\)Language production: Grammatical encoding\.InHandbook of Psycholinguistics,pp\. 945–984\.Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- W\. F\. Brewer, C\. A\. Chinn, and A\. Samarapungavan \(2000\)Explanation in scientists and children\.InExplanation and cognition,pp\. 279–298\.External Links:ISBN 978\-0\-262\-11249\-9Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- M\. Brysbaert, M\. Stevens, P\. Mandera, and E\. Keuleers \(2016\)How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age\.Frontiers in Psychology7,pp\. 1116\.External Links:ISSN 1664\-1078,[Document](https://dx.doi.org/10.3389/fpsyg.2016.01116)Cited by:[Constrained vocabulary definition](https://arxiv.org/html/2605.15365#Sx2.SSx1.p1.3)\.
- O\. M\. Bullock, D\. Colón Amill, H\. C\. Shulman, and G\. N\. Dixon \(2019\)Jargon as a barrier to effective science communication: Evidence from metacognition\.Public Understanding of Science28\(7\),pp\. 845–853\(EN\)\.External Links:ISSN 0963\-6625,[Link](https://doi.org/10.1177/0963662519865687),[Document](https://dx.doi.org/10.1177/0963662519865687)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- K\. Chandra, T\. Chen, T\. Li, J\. Ragan\-Kelley, and J\. Tenenbaum \(2024\)Cooperative explanation as rational communication\.InProceedings of the Annual Meeting of the Cognitive Science Society,Vol\.46\.Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- A\. M\. Chen, M\. Hofer, M\. Poliak, R\. Levy, and N\. Zaslavsky \(2025\)Discrete and systematic communication in a continuous signal\-meaning space\.Journal of Language Evolution10\(1\),pp\. lzaf003\.External Links:ISSN 2058\-458X,[Document](https://dx.doi.org/10.1093/jole/lzaf003)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- T\. H\. Clark, J\. H\. Vigly, E\. Gibson, and R\. P\. Levy \(2025a\)Resource\-Rational Noisy\-Channel Language Processing: Testing the Effect of Algorithmic Constraints on Inferences\.InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing,C\. Christodoulopoulos, T\. Chakraborty, C\. Rose, and V\. Peng \(Eds\.\),Suzhou, China,pp\. 23659–23672\.Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p2.1)\.
- T\. H\. Clark, J\. H\. Vigly, E\. Gibson, and R\. Levy \(2025b\)A Model of Approximate and Incremental Noisy\-Channel Language Processing\.Proceedings of the Annual Meeting of the Cognitive Science Society47\(0\)\.Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p2.1)\.
- F\. Cruz and T\. Lombrozo \(2025\)How laypeople evaluate scientific explanations containing jargon\.Nature Human Behaviour9\(10\),pp\. 2038–2053\.External Links:ISSN 2397\-3374,[Document](https://dx.doi.org/10.1038/s41562-025-02227-0)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- J\. Degen, R\. D\. Hawkins, C\. Graf, E\. Kreiss, and N\. D\. Goodman \(2020\)When redundancy is useful: A Bayesian approach to ”overinformative” referring expressions\.Psychological Review127\(4\),pp\. 591–621\.External Links:ISSN 1939\-1471,[Document](https://dx.doi.org/10.1037/rev0000186)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p4.1)\.
- Z\. Dörnyei and M\. L\. Scott \(1997\)Communication Strategies in a Second Language: Definitions and Taxonomies\.Language Learning47\(1\),pp\. 173–210\.External Links:ISSN 1467\-9922,[Document](https://dx.doi.org/10.1111/0023-8333.51997005)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p2.1)\.
- E\. Fedorenko, S\. T\. Piantadosi, and E\. A\. F\. Gibson \(2024\)Language is primarily a tool for communication rather than thought\.Nature630\(8017\),pp\. 575–586\.External Links:ISSN 1476\-4687,[Document](https://dx.doi.org/10.1038/s41586-024-07522-w)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- E\. Fedorenko, R\. Ryskin, and E\. Gibson \(2022\)Agrammatic output in non\-fluent, including Broca’s, aphasia as a rational behavior\.Aphasiology0\(0\),pp\. 1–20\.External Links:ISSN 0268\-7038,[Document](https://dx.doi.org/10.1080/02687038.2022.2143233)Cited by:[Broader implications](https://arxiv.org/html/2605.15365#Sx4.SSx3.p3.1)\.
- F\. Ferreira and B\. Swets \(2002\)How Incremental Is Language Production? Evidence from the Production of Utterances Requiring the Computation of Arithmetic Sums\.Journal of Memory and Language46\(1\),pp\. 57–84\.External Links:ISSN 0749\-596X,[Link](https://www.sciencedirect.com/science/article/pii/S0749596X01927974),[Document](https://dx.doi.org/10.1006/jmla.2001.2797)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- R\. Futrell \(2023\)Information\-theoretic principles in incremental language production\.Proceedings of the National Academy of Sciences120\(39\),pp\. e2220593120\.External Links:[Document](https://dx.doi.org/10.1073/pnas.2220593120)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- S\. Gershman and N\. Goodman \(2014\)Amortized Inference in Probabilistic Reasoning\.Proceedings of the Annual Meeting of the Cognitive Science Society36\(36\)\.Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p5.1)\.
- E\. Gibson, R\. Futrell, S\. P\. Piantadosi, I\. Dautriche, K\. Mahowald, L\. Bergen, and R\. Levy \(2019\)How Efficiency Shapes Human Language\.Trends in Cognitive Sciences23\(5\),pp\. 389–407\.External Links:ISSN 1879\-307X,[Document](https://dx.doi.org/10.1016/j.tics.2019.02.003)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- T\. L\. Griffiths, F\. Lieder, and N\. D\. Goodman \(2015\)Rational Use of Cognitive Resources: Levels of Analysis Between the Computational and the Algorithmic\.Topics in Cognitive Science7\(2\),pp\. 217–229\.External Links:ISSN 1756\-8757, 1756\-8765,[Document](https://dx.doi.org/10.1111/tops.12142)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- J\. Grogger, A\. Steinmayr, and J\. Winter \(2020\)The Wage Penalty of Regional Accents\.Working Paper,Working Paper Series,National Bureau of Economic Research\.External Links:26719,[Document](https://dx.doi.org/10.3386/w26719)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p2.1)\.
- J\. Gu, X\. Jiang, Z\. Shi, H\. Tan, X\. Zhai, C\. Xu, W\. Li, Y\. Shen, S\. Ma, H\. Liu, S\. Wang, K\. Zhang, Y\. Wang, W\. Gao, L\. Ni, and J\. Guo \(2025\)A Survey on LLM\-as\-a\-Judge\.arXiv\.External Links:2411\.15594,[Document](https://dx.doi.org/10.48550/arXiv.2411.15594)Cited by:[Automated evaluation of response quality](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px1.p1.5)\.
- A\. Hasselgren \(1994\)Lexical teddy bears and advanced learners: a study into the ways Norwegian students cope with English vocabulary\.International Journal of Applied Linguistics4\(2\),pp\. 237–258\.External Links:ISSN 1473\-4192,[Document](https://dx.doi.org/10.1111/j.1473-4192.1994.tb00065.x)Cited by:[Shifts in word frequency](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px2.p1.1),[Broader implications](https://arxiv.org/html/2605.15365#Sx4.SSx3.p2.1)\.
- M\. Honnibal, I\. Montani, S\. Van Landeghem, and A\. Boyd \(2020\)spaCy: Industrial\-strength natural language processing in Python\.External Links:[Document](https://dx.doi.org/10.5281/zenodo.1212303)Cited by:[Constrained vocabulary definition](https://arxiv.org/html/2605.15365#Sx2.SSx1.p1.3)\.
- D\. Kahneman and A\. Tversky \(1979\)Prospect Theory: An Analysis of Decision under Risk\.Econometrica47\(2\),pp\. 263–291\.External Links:1914185,ISSN 0012\-9682,[Document](https://dx.doi.org/10.2307/1914185)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- D\. Kahneman \(2003\)Maps of Bounded Rationality: Psychology for Behavioral Economics\.American Economic Review93\(5\),pp\. 1449–1475\.External Links:ISSN 0002\-8282,[Document](https://dx.doi.org/10.1257/000282803322655392)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- D\. Kelemen, J\. Rottman, and R\. Seston \(2013\)Professional physical scientists display tenacious teleological tendencies: Purpose\-based reasoning as a cognitive default\.Journal of Experimental Psychology: General142\(4\),pp\. 1074–1083\.External Links:ISSN 1939\-2222,[Document](https://dx.doi.org/10.1037/a0030399)Cited by:[Intrinsic differences in question difficulty](https://arxiv.org/html/2605.15365#Sx4.SSx2.p4.1)\.
- E\. Keuleers, M\. Stevens, P\. Mandera, and M\. Brysbaert \(2015\)Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment\.Quarterly Journal of Experimental Psychology68\(8\),pp\. 1665–1692\.External Links:ISSN 1747\-0226,[Document](https://dx.doi.org/10.1080/17470218.2015.1022560)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- W\. Klein and C\. Perdue \(1997\)The Basic Variety \(or: Couldn’t natural languages be much simpler?\)\.Second Language Research13\(4\),pp\. 301–347\.External Links:ISSN 0267\-6583,[Document](https://dx.doi.org/10.1191/026765897666879396)Cited by:[Shifts in word frequency](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px2.p1.1),[Broader implications](https://arxiv.org/html/2605.15365#Sx4.SSx3.p2.1)\.
- R\. M\. Krauss and S\. Weinheimer \(1964\)Changes in reference phrases as a function of frequency of usage in social interaction: a preliminary study\.Psychonomic Science1\(1\),pp\. 113–114\.External Links:ISSN 2197\-9952,[Document](https://dx.doi.org/10.3758/BF03342817)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- B\. Laufer \(1991\)The development of L2 lexis in the expression of the advanced learner\.Modern Language Journal75\(4\),pp\. 440–448\.External Links:ISSN 1540\-4781,[Document](https://dx.doi.org/10.2307/329493)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p2.1)\.
- S\. Leufkens \(2020\)A functionalist typology of redundancy\.Revista da ABRALIN,pp\. 79–103\.External Links:ISSN 0102\-7158,[Document](https://dx.doi.org/10.25189/rabralin.v19i3.1722)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p4.1)\.
- A\. K\. Lew, T\. Zhi\-Xuan, G\. Grand, and V\. K\. Mansinghka \(2023\)Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs\.arXiv\.External Links:2306\.03081,[Document](https://dx.doi.org/10.48550/arXiv.2306.03081)Cited by:[Revision as an index of greediness vs\. planning](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px3.p1.1)\.
- F\. Lieder and T\. L\. Griffiths \(2020\)Resource\-rational analysis: Understanding human cognition as the optimal use of limited computational resources\.Behavioral and Brain Sciences43,pp\. e1\.External Links:ISSN 0140\-525X, 1469\-1825,[Document](https://dx.doi.org/10.1017/S0140525X1900061X)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p1.1)\.
- B\. Lipkin, B\. LeBrun, J\. H\. Vigly, J\. Loula, D\. R\. MacIver, L\. Du, J\. Eisner, R\. Cotterell, V\. Mansinghka, T\. J\. O’Donnell, A\. K\. Lew, and T\. Vieira \(2025\)Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling\.arXiv\.External Links:2504\.05410,[Document](https://dx.doi.org/10.48550/arXiv.2504.05410)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p4.1),[Constrained LLM generation withAwrs](https://arxiv.org/html/2605.15365#Sx2.SSx4.p1.1),[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p5.1)\.
- T\. Lombrozo and S\. Carey \(2006\)Functional explanation and the function of explanation\.Cognition99\(2\),pp\. 167–204\.External Links:ISSN 0010\-0277,[Document](https://dx.doi.org/10.1016/j.cognition.2004.12.009)Cited by:[Intrinsic differences in question difficulty](https://arxiv.org/html/2605.15365#Sx4.SSx2.p4.1)\.
- T\. Lombrozo and D\. A\. Wilkenfeld \(2019\)Mechanistic versus functional understanding\.InVarieties of Understanding: New Perspectives from Philosophy, Psychology, and Theology,S\. R\. Grimm \(Ed\.\),pp\. 209–229\(eng\)\.Cited by:[Intrinsic differences in question difficulty](https://arxiv.org/html/2605.15365#Sx4.SSx2.p4.1)\.
- J\. Loula, B\. LeBrun, L\. Du, B\. Lipkin, C\. Pasti, G\. Grand, T\. Liu, Y\. Emara, M\. Freedman, J\. Eisner, R\. Cotterell, V\. Mansinghka, A\. K\. Lew, T\. Vieira, and T\. J\. O’Donnell \(2025\)Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo\.arXiv\.External Links:2504\.13139,[Document](https://dx.doi.org/10.48550/arXiv.2504.13139)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p4.1)\.
- K\. Mahowald, E\. Diachek, E\. Gibson, E\. Fedorenko, and R\. Futrell \(2023\)Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages\.Cognition241,pp\. 105543\.External Links:ISSN 0010\-0277,[Document](https://dx.doi.org/10.1016/j.cognition.2023.105543)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p4.1)\.
- D\. Marr \(1982\)Vision: A Computational Investigation into the Human Representation and Processing of Visual Information\.Henry Holt and Co\., Inc\.\.Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1)\.
- A\. M\. McCarthy and F\. C\. Keil \(2023\)A right way to explain? Function, mechanism, and the order of explanations\.Cognition238,pp\. 105494\.External Links:ISSN 0010\-0277,[Document](https://dx.doi.org/10.1016/j.cognition.2023.105494)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1),[Intrinsic differences in question difficulty](https://arxiv.org/html/2605.15365#Sx4.SSx2.p4.1)\.
- M\. J\. Pickering and S\. Garrod \(2013\)An integrated theory of language production and comprehension\.The Behavioral and Brain Sciences36\(4\),pp\. 329–347\.External Links:ISSN 1469\-1825,[Document](https://dx.doi.org/10.1017/S0140525X12001495)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- N\. Poulisse \(2011\)A Theoretical Account of Lexical Communication Strategies\.InThe Bilingual Lexicon,pp\. 157–190\.Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p2.1)\.
- N\. Rezaii, K\. Mahowald, R\. Ryskin, B\. Dickerson, and E\. Gibson \(2022\)A syntax–lexicon trade\-off in language production\.Proceedings of the National Academy of Sciences of the United States of America119\(25\)\.External Links:[Document](https://dx.doi.org/10.1073/pnas.2120203119)Cited by:[Broader implications](https://arxiv.org/html/2605.15365#Sx4.SSx3.p3.1)\.
- C\. E\. Shannon \(1948\)A mathematical theory of communication\.pp\. 55\.Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p1.1)\.
- R\. Speer \(2022\)Rspeer/wordfreq: v3\.0\.Note:ZenodoExternal Links:[Document](https://dx.doi.org/10.5281/zenodo.7199437)Cited by:[Constrained vocabulary definition](https://arxiv.org/html/2605.15365#Sx2.SSx1.p1.3)\.
- J\. Sulik, J\. van Paridon, and G\. Lupyan \(2023\)Explanations in the wild\.Cognition237,pp\. 105464\.External Links:ISSN 0010\-0277,[Document](https://dx.doi.org/10.1016/j.cognition.2023.105464)Cited by:[Introduction](https://arxiv.org/html/2605.15365#Sx1.p3.1),[Questions dataset](https://arxiv.org/html/2605.15365#Sx2.SSx2.p1.1)\.
- E\. N\. Tourtouri, F\. Delogu, and M\. W\. Crocker \(2021\)Rational Redundancy in Referring Expressions: Evidence from Event\-related Potentials\.Cognitive Science45\(12\),pp\. e13071\.External Links:ISSN 1551\-6709,[Document](https://dx.doi.org/10.1111/cogs.13071)Cited by:[Resource\-rational inference in language production](https://arxiv.org/html/2605.15365#Sx4.SSx1.p4.1)\.
- H\. R\. Varian \(2010\)Intermediate microeconomics: a modern approach\.8th edition,W\.W\. Norton & Co\.,New York\.External Links:ISBN 978\-0\-393\-93424\-3 0\-393\-93424\-1 978\-0\-393\-93533\-2 0\-393\-93533\-7,[Link](http://www.worldcat.org/search?qt=worldcat_org_all&q=0393934241)Cited by:[Shifts in word frequency](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px2.p1.1)\.
- L\. Zheng, W\. Chiang, Y\. Sheng, S\. Zhuang, Z\. Wu, Y\. Zhuang, Z\. Lin, Z\. Li, D\. Li, E\. P\. Xing, H\. Zhang, J\. E\. Gonzalez, and I\. Stoica \(2023\)Judging LLM\-as\-a\-Judge with MT\-Bench and Chatbot Arena\.arXiv\.External Links:2306\.05685,[Document](https://dx.doi.org/10.48550/arXiv.2306.05685)Cited by:[Automated evaluation of response quality](https://arxiv.org/html/2605.15365#Sx2.SSx5.SSS0.Px1.p1.5)\.