Understanding Large Language Models

arXiv cs.CL 07/02/26, 04:00 AM Papers
Summary
This chapter reviews current understanding of Large Language Models, discussing their Transformer architecture, emergent capabilities resembling human cognition, and debates about whether LLMs genuinely understand or merely simulate understanding.
arXiv:2607.01006v1
# Understanding Large Language Models
Source: [https://arxiv.org/html/2607.01006](https://arxiv.org/html/2607.01006)

###### Abstract

Large Language Models \(LLMs\) represent one of the most significant advances in AI and natural language processing in recent years\. Still, many pressing questions about their mechanisms, capabilities, and relationship to human cognition remain highly debated\. This chapter aims to outline our current understanding of LLMs by discussing recent evidence on emerging capabilities and their mechanistic implementation within processing layers\. We begin with a concise overview of the Transformer architecture, emphasizing how the attention mechanism enables training on massive datasets, allowing LLMs to function as generalist rather than specialized models\. Next, we examine emergent LLM capabilities that appear to resemble aspects of human cognition, including symbolic reasoning, theory of mind, and deception strategies\. Several studies provide evidence that LLMs can solve tasks previously thought to require human\-like cognition\. Other studies reveal insightful failure cases that shed light on the differences between human and LLM cognition\. Alongside these findings, we review explainable AI approaches ranging from neuron activation analysis to circuit tracing\. Prior work shows that some artificial neurons activate for specific concepts and that LLMs implement circuits supporting multi\-step symbolic reasoning\. In the final section, we address current debates concerning what LLMs genuinely understand versus what they merely appear to understand\. Prominent arguments against AI anthropomorphism point to the simplicity of LLM training objectives, claiming that LLM behavior is better explained by pattern memorization of training data than by genuine cognition\. 
We argue that this standpoint is guided by misconceptions about optimization processes and cognitive capacity, and advocate for a more nuanced discussion of LLM cognition that neither dismisses the differences between humans and LLMs nor precludes the possibility of AI cognition through overly simplistic reductionist arguments\.

Keywords: Large Language Models, Explainable AI, Machine Cognition

## 1 Introduction

The worldwide public, commercial, and scientific use of large language models (LLMs) has increased massively over the past two years. LLMs already affect many aspects of our daily lives: students use them to help with their homework \[[26](https://arxiv.org/html/2607.01006#bib.bib9)\], corporations use them to write their press reports and job postings \[[43](https://arxiv.org/html/2607.01006#bib.bib8)\], and job applicants use them to write their CVs \[[3](https://arxiv.org/html/2607.01006#bib.bib10)\]. In 2023, thirty percent of scientists reported having used LLMs to help write manuscripts \[[87](https://arxiv.org/html/2607.01006#bib.bib12)\], while vocabulary analysis suggests that ten percent of scientific abstracts published in 2024 were processed by an LLM \[[39](https://arxiv.org/html/2607.01006#bib.bib11)\]. By 2024, human-LLM interaction had become so widespread that LLM-favored vocabulary had seeped into spoken human communication: \[[94](https://arxiv.org/html/2607.01006#bib.bib40)\] found an increased frequency of GPT-favored words like “delve” in podcasts and academic talks after the release of ChatGPT \[[59](https://arxiv.org/html/2607.01006#bib.bib41)\]. In software engineering, LLM-based coding assistance has become ubiquitous, with sixty-three percent of professional developers using AI tools in 2024 \[[79](https://arxiv.org/html/2607.01006#bib.bib13)\].

Clearly, LLMs are everywhere at the moment\. Why did this sudden AI revolution happen? Do LLMs possess capabilities absent from earlier AI systems that fundamentally change human–computer interaction?

The progress of AI development is typically tracked through benchmarks: quantitative tests that pose standardized questions, each with a single correct response called the “ground truth”. The strong performance of LLMs on many of these benchmarks indicates a clear jump in capabilities. The SQuAD \[[69](https://arxiv.org/html/2607.01006#bib.bib52)\] and GLUE \[[90](https://arxiv.org/html/2607.01006#bib.bib53)\] benchmarks aim to test AI question-answering and language-understanding capabilities. Already with BERT \[[20](https://arxiv.org/html/2607.01006#bib.bib58)\], an early predecessor of modern LLMs, these benchmarks saturated much more quickly than expected, with models achieving close to 100% accuracy. This prompted the rapid development of progressively harder benchmarks such as SQuAD 2.0 \[[68](https://arxiv.org/html/2607.01006#bib.bib59)\], SuperGLUE \[[89](https://arxiv.org/html/2607.01006#bib.bib54)\], and CoQA \[[70](https://arxiv.org/html/2607.01006#bib.bib55)\], which were themselves quickly saturated by newer LLMs. Most recently, LLMs ventured beyond the typical language benchmark ecosystem and took the world of mathematics by surprise by achieving gold-medal-level performance at the 2025 International Mathematical Olympiad \[[47](https://arxiv.org/html/2607.01006#bib.bib60)\], an international competition in which high-school students solve advanced number theory, combinatorics, algebra, and geometry problems.
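Scoring against a ground truth can be as simple as exact string matching. The sketch below is illustrative only: `normalize` and `exact_match_accuracy` are hypothetical helpers whose normalization steps (lowercasing, stripping punctuation and the articles “a”, “an”, “the”) loosely resemble those commonly applied in SQuAD-style exact-match evaluation. They are not any benchmark's official scoring script, and the example answers are invented.

```python
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())


def exact_match_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Fraction of predictions that equal their ground-truth answer after normalization."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)


# Invented toy answers: two of the three predictions match after normalization.
predictions = ["The Eiffel Tower", "1889", "Gustave Eiffel."]
ground_truth = ["Eiffel Tower", "1887", "gustave eiffel"]
print(exact_match_accuracy(predictions, ground_truth))
```

Real benchmarks differ in how forgiving their matching is (SQuAD, for instance, also reports a token-level F1 score), which is one reason why reported accuracies are not always directly comparable.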

Benchmark results show that LLMs represent a step change in AI’s ability to solve automatically verifiable, text-based problems. What this performance reveals about the cognitive processes underlying it, however, remains highly disputed. In this chapter, we introduce Transformer-based LLMs, examine emergent cognitive abilities, and survey interpretability research. We close by addressing whether attributing “genuine understanding” to LLMs is warranted.

## 2 How Large Language Models are Built

LLMs embody the current peak of both the statistical revolution in natural language processing \(NLP\) and the connectionist paradigm in machine learning: Decades of NLP research have shown that as computational power grows, statistical and data\-driven approaches tend to outperform expert\-designed methods that take advantage of human linguistic competence\[[82](https://arxiv.org/html/2607.01006#bib.bib17)\]\. At the same time, the field of machine learning experienced a paradigm shift from favoring low\-parameter models guided by the principle of Occam’s Razor to embracing deep connectionist architectures with millions of trainable parameters\[[51](https://arxiv.org/html/2607.01006#bib.bib25)\]\.

Classical statistical methods like Hidden Markov Models and N-gram language models were surpassed by deep learning methods by 2015 in tasks like machine translation and text classification \[[81](https://arxiv.org/html/2607.01006#bib.bib26)\]. Given enough compute and data, deep neural networks proved more flexible and generalized better than earlier methods. However, performance gains were less dramatic than contemporary improvements in other machine learning domains such as computer vision \[[31](https://arxiv.org/html/2607.01006#bib.bib20)\]. A central challenge for deep learning NLP models is to parse words and sentences in the context in which they are embedded. The dominant approach at the time, recurrent neural networks (RNNs), addressed this challenge by introducing a “hidden state vector” that tracks the relevant context as text is processed. This requires RNNs to process text sequentially, updating the hidden state vector with each word before moving on to the next.
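The sequential bottleneck can be made concrete with a minimal sketch of a vanilla (Elman-style) RNN. This is an illustration under stated assumptions, not the architecture of any specific system discussed here: it assumes a `tanh` nonlinearity and random weights, and the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` are hypothetical. However long the input, all context must fit into the fixed-size vector `h`, and step `t` cannot begin before step `t-1` has finished.

```python
import math
import random


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    Returns the hidden state after every step. Each step needs the previous
    hidden state, so the loop over the sequence cannot run in parallel.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # initial context: nothing has been read yet
    states = []
    for x in inputs:  # strictly sequential over tokens
        h = [
            math.tanh(
                sum(w * xj for w, xj in zip(W_xh[i], x))
                + sum(w * hk for w, hk in zip(W_hh[i], h))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_size, hidden_size, seq_len = 3, 4, 50
W_xh = random_matrix(hidden_size, input_size, rng)
W_hh = random_matrix(hidden_size, hidden_size, rng)
b_h = [0.0] * hidden_size
sequence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(seq_len)]

states = rnn_forward(sequence, W_xh, W_hh, b_h)
# 50 tokens of context are squeezed into a single 4-dimensional vector.
print(len(states), len(states[-1]))  # 50 4
```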

The Transformer architecture\[[88](https://arxiv.org/html/2607.01006#bib.bib21)\], which underpins all modern LLMs, addresses two fundamental limitations of RNNs\. First, RNNs struggle with long\-range dependencies, as compressing variable\-length contextual information into a fixed\-size hidden state leads to information loss\. While parsing a novel, RNNs will inevitably have to compress or overwrite information from early chapters to incorporate new input, resulting in a failure to draw connections between details separated by large positional distance\. Second, the inherently sequential structure of RNNs prevents efficient parallelization during training\.

The Transformer architecture (Figure [1](https://arxiv.org/html/2607.01006#S2.F1)) eliminates the need for a recurrent hidden state by processing the entire input sequence in a single forward pass. Transformers process input documents as sequences of *tokens*: character sequences that can represent words, punctuation, or common substrings that need not carry any well-defined meaning. For example, a common byte-pair-encoding tokenizer \[[75](https://arxiv.org/html/2607.01006#bib.bib109)\] would split the word “unhappiness” into the tokens “un”, “h”, and “appiness” and map them to their numerical identifiers $[359, 71, 66291]$. In the *embedding* step of Transformer models, these tokens are mapped to continuous vector representations through a linear embedding layer. These vectors can be thought of as encoding potential meanings of the tokens and are learned during training. They are then processed by a series of attention blocks (gray shaded area in Figure [1](https://arxiv.org/html/2607.01006#S2.F1)) that integrate contextual information from the preceding text into each token’s vector. For example, the vector for the word “bat” might initially include features related to both sports and animals, but after attention processing, it may lose the animal-related features if “baseball” appears earlier in the context.
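The embedding and attention steps can be sketched in a few lines of plain Python. This is a deliberately minimal, single-head version under several assumptions: the embedding table and the projection matrices `W_q`, `W_k`, `W_v` are random stand-ins for learned parameters, and the multiple heads, feed-forward layers, residual connections, normalization, and positional information of a full Transformer block are omitted. The causal mask lets each token draw only on itself and earlier tokens, matching the description above.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(n, rng):
    """Random n x n matrix standing in for learned projection weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(n)]


def causal_self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention with a causal mask.

    The outputs for different positions do not depend on each other, so,
    unlike an RNN, they can all be computed in parallel; the loop below just
    spells the computation out.
    """
    Q = [matvec(W_q, x) for x in X]
    K = [matvec(W_k, x) for x in X]
    V = [matvec(W_v, x) for x in X]
    scale = math.sqrt(len(Q[0]))
    outputs = []
    for t in range(len(X)):
        # Causal mask: position t only scores positions 0..t.
        scores = [sum(q * k for q, k in zip(Q[t], K[s])) / scale for s in range(t + 1)]
        weights = softmax(scores)  # how much token t draws on each earlier token
        outputs.append([sum(w * V[s][j] for s, w in enumerate(weights)) for j in range(len(V[0]))])
    return outputs


rng = random.Random(0)
d_model = 4
token_ids = [359, 71, 66291]  # IDs for "un", "h", "appiness" from the tokenizer example in the text
# Embedding step: look up one vector per token ID (random here, learned in a real model).
embedding = {tid: [rng.gauss(0.0, 1.0) for _ in range(d_model)] for tid in token_ids}
X = [embedding[tid] for tid in token_ids]

W_q, W_k, W_v = (random_matrix(d_model, rng) for _ in range(3))
contextual = causal_self_attention(X, W_q, W_k, W_v)
print(len(contextual), len(contextual[0]))  # 3 4
```

The first token can attend only to itself, so its output is just its own value vector; later tokens mix in information from their predecessors, which is how context such as “baseball” can reshape the representation of “bat”.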

![Refer to caption](https://arxiv.org/html/2607.01006v1/x1.png)

Figure 1: The Transformer model processes input documents as a series of tokens embedded into a continuous vector space. In a series of N attention blocks (shaded gray), the token embedding vectors are processed through trainable attention and feed-forward layers. In the final step, a linear layer maps the embedding vectors to the vocabulary size, and a softmax function produces the output probability distribution for the next token. ©Yannik Keller, 2025, adapted from \[[88](https://arxiv.org/html/2607.01006#bib.bib21)\].

This new design allows training on whole documents at once, making model training efficiently parallelizable. As a consequence, dataset curation methods and training objectives have also shifted. Instead of carefully curating high-quality, annotated training datasets, researchers and engineers now push towards ever larger datasets obtained from the internet. Training objectives have changed accordingly. Previous deep learning models were typically trained for one specific task, such as sentiment analysis or machine translation, using an annotated dataset. To leverage vast amounts of unlabeled data, Transformers are typically trained using the unsupervised language modeling objective. This simple objective trains the model to predict the next token in a sequence from the preceding tokens. Surprisingly, it turned out that models trained on large quantities of data with this objective can generalize to solve a vast range of tasks \[[9](https://arxiv.org/html/2607.01006#bib.bib22)\]. It is this generalization capability of LLMs that revolutionized AI and NLP, shifting the field from single-task expert systems to ever more powerful generalist language-based task solvers.
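The language modeling objective is usually implemented as a cross-entropy loss on the next token. A minimal sketch, assuming the model has already produced one vector of logits per position (the helper name `next_token_loss` and the three-token vocabulary are illustrative):

```python
import math


def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t.

    logits_per_position[t] holds unnormalized vocabulary scores produced after
    reading token_ids[0..t]; the target is simply the next token of the text,
    so no human annotation is needed. The last position has no target.
    """
    losses = []
    for t in range(len(token_ids) - 1):
        logits = logits_per_position[t]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
        target = token_ids[t + 1]
        losses.append(log_z - logits[target])  # equals -log softmax(logits)[target]
    return sum(losses) / len(losses)


# Toy vocabulary of 3 tokens and a 3-token "document": [0, 2, 1].
document = [0, 2, 1]
uninformed = [[0.0, 0.0, 0.0]] * 3  # uniform predictions at every position
confident = [[0.0, 0.0, 10.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]  # favors the true next tokens
print(next_token_loss(uninformed, document))  # log(3), about 1.0986
print(next_token_loss(confident, document))   # close to 0
```

Because every target is just the next token of the same text, the loss needs no annotations, and all positions of a document can be scored in one pass, which is what makes training on raw internet text at scale feasible.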

The most recent rapid advances in LLM capabilities stem not only from ever bigger LLMs and datasets, but also from new *fine-tuning* methods that further train LLMs to be more useful, capable, and aligned with human interests. During *instruction tuning*, LLMs are fine-tuned on specially formatted datasets to follow instructions given by a user (i.e., a *prompt*). In *reinforcement learning from human feedback* \[[17](https://arxiv.org/html/2607.01006#bib.bib23),[61](https://arxiv.org/html/2607.01006#bib.bib24)\], human raters label model outputs according to how well they match the desired, aligned behavior. These labels are then used in a fine-tuning procedure that optimizes the model to produce such preferred responses more consistently. Similarly, *reasoning LLMs* are fine-tuned to be more proficient problem solvers that produce an internal sequence of tokens to “reason” about the task at hand before responding \[[28](https://arxiv.org/html/2607.01006#bib.bib112)\]. These “reasoning” tokens are intended to model a *verbalized chain-of-thought* and have been shown to improve LLM performance on various tasks involving logic and relational reasoning \[[78](https://arxiv.org/html/2607.01006#bib.bib113)\].
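The chapter does not detail how human ratings enter fine-tuning. One widely used instantiation, assumed here for illustration, first trains a reward model on pairwise comparisons in which a rater preferred one response over another, and then optimizes the LLM against that reward model with reinforcement learning. The sketch covers only the pairwise (Bradley-Terry style) reward-model loss; `preference_loss` is a hypothetical helper and the reward values are placeholders.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred response higher,
    large when it prefers the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) = log(1 + exp(-m)), rewritten to avoid overflow for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scalar scores from a reward model for one human comparison.
print(preference_loss(2.0, -1.0))  # agrees with the rater: small loss
print(preference_loss(-1.0, 2.0))  # disagrees with the rater: large loss
```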

Autoregressive LLMs, such as those described above, are the dominant architecture for general-purpose chat and problem solving. However, the architectures that preceded modern LLMs, such as the encoder-only BERT \[[20](https://arxiv.org/html/2607.01006#bib.bib58)\] and the encoder-decoder T5 \[[67](https://arxiv.org/html/2607.01006#bib.bib110)\], remain widely used. Although most contemporary LLMs exhibit some degree of multilingual capability, encoder-decoder models are still often preferred for machine translation, as they excel at mapping one sequence to another with strong alignment. Finally, many of today’s most powerful autoregressive LLMs are multimodal: they operate not only on text but can also process and output images, audio, or even video by transforming these modalities into tokens.

## 3 Understanding LLM Cognition

Following influential work (e.g., \[[49](https://arxiv.org/html/2607.01006#bib.bib120),[83](https://arxiv.org/html/2607.01006#bib.bib119),[16](https://arxiv.org/html/2607.01006#bib.bib115),[65](https://arxiv.org/html/2607.01006#bib.bib116)\]), the discipline of cognitive science emerged in the second half of the 20th century with the goal of understanding the mind as an information-processing system that represents, manipulates, and transforms information. Inspired by the first digital computers, early cognitive scientists produced symbolic, computational models of cognition that could explain how humans solve problems \[[54](https://arxiv.org/html/2607.01006#bib.bib61)\]. With this approach, they criticized behaviorism as insufficient and neuroscience as premature and unhelpful, as long as it remained unknown which algorithms the neurons in the brain actually implement.

David Marr famously postulated that to fully understand an information-processing system such as the human brain, one needs to analyze it on three levels \[[48](https://arxiv.org/html/2607.01006#bib.bib62)\]. The first is the computational level, which asks what problem an agent is solving and why. The second is the algorithmic level, which describes the procedure by which an information-processing system represents and solves that problem. The third is the implementational level, which studies the physical substrate that executes the computation, such as human neurons.

LLMs are different from human brains\. We understand the substrate that LLMs run on very well\. Even modern computer hardware is fundamentally based on many logic gates running in sequence or in parallel, each behaving according to easily understood rules\. Similarly, the computer algorithm that transforms input into output text is well\-defined by a series of matrix multiplications given by the software of the Transformer architecture\. And finally, we tend to think that we should also know the problem the LLM is solving, as we specify it as the learning objective given by a reward or loss function\.

Despite this apparent straightforwardness, LLMs show all kinds of emergent capabilities and behaviors that we fail to predict from the objective, training data, and model architecture alone. This puzzle is known as the *black-box problem of machine learning* \[[13](https://arxiv.org/html/2607.01006#bib.bib74)\]. Modern deep neural networks have billions of parameters that are tuned automatically on large datasets, and sufficiently large networks can in principle approximate any continuous function. This makes it increasingly hard to understand not only the purpose of individual parameters, but also the cognitive procedure underlying a deep neural network’s decisions.

In the next section on *emergent cognitive capabilities*, we present the latest research on a selection of particularly surprising LLM capabilities and sketch the current scientific discussion about how they relate to human cognition. Then, in the section on *explainable AI*, we present several approaches that explain LLM behavior on the implementational and algorithmic levels. There, we highlight the biggest successes in explaining LLM behavior and clarify why explainable AI approaches still lag far behind their goal of explaining all LLM behaviors and capabilities.

### 3.1 Emergent Cognitive Capabilities

LLMs keep surprising psychologists, cognitive scientists, and computer scientists alike through ever more complex behavior\. This is especially interesting when LLMs show behavior that seems to indicate advanced cognitive capabilities that must have somehow emerged from the fairly simple Transformer architecture, sequence prediction learning rule and training process\. In the following, we will take a look at just a few examples of this, ranging from symbolic reasoning and theory of mind to deception capabilities\.

#### 3.1.1 Symbolic Reasoning

A classical perspective in cognitive science views the mind as a physical symbol system that reasons by representing and manipulating symbols. Symbols are internal representations that stand for concepts, objects, events, or relationships. \[[55](https://arxiv.org/html/2607.01006#bib.bib75)\] claim that “A physical symbol system has the necessary and sufficient means for general intelligent action.” Symbolic cognitive architectures such as SOAR \[[42](https://arxiv.org/html/2607.01006#bib.bib72)\] have been among the most influential models of human cognition over the past half century and remain highly relevant in cognitive science today.

In contrast to SOAR, which explicitly stores and manipulates symbols in long- or short-term memory, LLMs are purely connectionist models that represent information as vectors. Their architecture and training methods do not explicitly incentivize the internal representation or manipulation of symbols. Despite this, even early Transformer models such as GPT-3 have been shown to solve text-based mathematics problems of the kind commonly encountered in school exams, producing token traces reminiscent of human multi-step symbolic reasoning \[[27](https://arxiv.org/html/2607.01006#bib.bib76)\]. More recently, specialized reinforcement learning methods like AlphaProof \[[34](https://arxiv.org/html/2607.01006#bib.bib15)\] have been shown to produce LLMs capable of formal mathematical reasoning at a level corresponding to silver-medal performance at the International Mathematical Olympiad. These examples from mathematics show that even without an explicit incentive, LLMs learn to produce outputs that resemble human symbolic reasoning.

In Section [3.2.3](https://arxiv.org/html/2607.01006#S3.SS2.SSS3), we take a closer look at the parameters of LLMs to better understand this phenomenon: intermediate symbolic representations have been found to emerge within LLMs.

#### 3.1.2 LLM Theory of Mind

Another core feature of human cognition is theory of mind (ToM): the ability to track the mental states of others. ToM plays a role in empathy, pragmatics, and sophisticated social interaction. When speaking, humans tailor their words to what they believe their listeners know, enabling communication beyond the literal meaning of words \[[19](https://arxiv.org/html/2607.01006#bib.bib114)\]. In children, ToM develops between the ages of 4 and 6 \[[92](https://arxiv.org/html/2607.01006#bib.bib32)\]. Some non-human animals, such as primates or corvids, are thought to develop limited ToM-like capabilities as well \[[72](https://arxiv.org/html/2607.01006#bib.bib33)\]. With the advent of powerful artificial language models, the natural next step is to investigate whether this central feature of human cognition is also present in non-biological systems running on computer hardware.

Recent work has challenged LLMs with tasks developed to study the presence of ToM, using both established tasks originally designed to study ToM in humans and newly designed scenarios. The *false belief* task was introduced by \[[92](https://arxiv.org/html/2607.01006#bib.bib32)\] to study at which age children develop an understanding of other people’s beliefs. In each *false belief* scenario, the child observes a protagonist putting an object into a location $x$ and then witnesses the object being moved to another location $y$ in the protagonist’s absence. Later, the child indicates where it expects the protagonist to look for the object. Because the protagonist did not observe the transfer, a child with ToM should expect the protagonist to still believe the object to be at $x$.
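The logic of the task can be sketched as a small belief-tracking simulation. This is a toy model, not an LLM evaluation: it assumes that an agent’s belief changes only when the agent witnesses an event, and the class and agent names are hypothetical. The point is that the correct answer depends on the protagonist’s belief, not on where the object actually is.

```python
from dataclasses import dataclass, field


@dataclass
class FalseBeliefScenario:
    """Tracks where an object really is and where each agent believes it is.

    Simplifying assumption: an agent's belief changes only when the agent
    witnesses an event; agents never infer unobserved moves.
    """

    object_location: str = ""
    beliefs: dict[str, str] = field(default_factory=dict)

    def move_object(self, to: str, observers: set[str]) -> None:
        self.object_location = to
        for agent in observers:  # only eyewitnesses update their belief
            self.beliefs[agent] = to

    def predicted_search(self, agent: str) -> str:
        """Answer of an observer with ToM: the agent searches where it believes the object is."""
        return self.beliefs[agent]


scenario = FalseBeliefScenario()
scenario.move_object("x", observers={"protagonist", "child"})  # protagonist puts the object at x
scenario.move_object("y", observers={"child"})  # object moved to y while the protagonist is away
print(scenario.object_location)                  # y  (reality)
print(scenario.predicted_search("protagonist"))  # x  (false belief)
```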

\[[80](https://arxiv.org/html/2607.01006#bib.bib34)\] compiled a dataset of many ToM tasks from various previous works and found that GPT-4 not only performs at human level on *false belief* tasks, but even performs above human level on tasks designed to test understanding of non-literal communication and irony. However, these results have been criticized for overestimating the ToM-like capabilities of LLMs due to data contamination: the ToM tasks from previous works were likely included in GPT-4’s training data, so the model may simply have memorized their specific wordings without generalizing. \[[15](https://arxiv.org/html/2607.01006#bib.bib51)\] circumvent this issue by constructing a new evaluation dataset for ToM capabilities from scratch. While they reproduce the finding that LLMs solve ToM tasks above chance level and that bigger LLMs outperform smaller ones, even GPT-4 scores about 10 percentage points below human performance on all of the tasks. For the evaluation in \[[40](https://arxiv.org/html/2607.01006#bib.bib50)\], a hypothesis-blind research assistant handcrafted forty bespoke false-belief tasks to prevent memorization from the training data. They find that GPT-4 solves about as many *false belief* tasks as 6-year-old children. \[[21](https://arxiv.org/html/2607.01006#bib.bib77)\] evaluate LLMs on more complicated ToM tasks, such as the *second-order Sally-Anne test*, in which the LLM needs to judge what one character believes another character believes. While they find that large LLMs like GPT-4 pass the original version of the task, the models do not always generalize to reformulations of and deviations from the second-order Sally-Anne test.

\[[84](https://arxiv.org/html/2607.01006#bib.bib49)\] challenges the results suggesting a machine ToM in LLMs more fundamentally. He perturbs false-belief tasks with simple modifications that remove the protagonist’s false belief. In one classic false-belief task, a protagonist finds a bag filled with popcorn but labeled “chocolate”, which gives the protagonist a false belief about the bag’s contents. In the modified version of the task, the bag is transparent, so the protagonist can see the contents directly and the false belief disappears. \[[84](https://arxiv.org/html/2607.01006#bib.bib49)\] shows many examples in which GPT-3.5 passes the original false-belief task but fails to recognize that the belief differs in the perturbed version. Thus, there is reason to doubt whether ToM tests that are valid for human subjects can also be used to determine whether an LLM possesses ToM.
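The perturbation changes a single fact about the scenario, yet it flips the correct answer while the wording of the task stays almost the same. A minimal sketch of that logic, under the simplifying assumptions stated in the docstring (`protagonist_belief` is a hypothetical helper):

```python
def protagonist_belief(label: str, contents: str, transparent: bool) -> str:
    """What the protagonist expects to find in the bag before opening it.

    Simplifying assumptions: the protagonist reads and trusts the label, and a
    transparent bag lets them see the contents, which overrides the label.
    """
    return contents if transparent else label


original = protagonist_belief(label="chocolate", contents="popcorn", transparent=False)
perturbed = protagonist_belief(label="chocolate", contents="popcorn", transparent=True)
print(original)   # chocolate  (false belief: the classic task)
print(perturbed)  # popcorn    (no false belief: the transparent-bag variant)
```

A system that tracks only the surface pattern of the classic task, rather than what the protagonist can perceive, would answer “chocolate” in both cases.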

> “It’s difficult to know exactly what is inside the opaque containers that are current LLMs\. But it’s probably not Theory\-of\-Mind …”\[[84](https://arxiv.org/html/2607.01006#bib.bib49)\]\.

One way to reconcile these results with the early enthusiasm about machine ToM is to acknowledge that ToM can manifest in various ways. In humans, its expression varies with cultural \[[45](https://arxiv.org/html/2607.01006#bib.bib79),[76](https://arxiv.org/html/2607.01006#bib.bib78)\] and neurological diversity \[[12](https://arxiv.org/html/2607.01006#bib.bib80)\]. Thus, \[[85](https://arxiv.org/html/2607.01006#bib.bib81)\] conclude that not all forms of ToM are the same and that we should expect an LLM, which perceives and processes the world differently from humans, to express ToM differently as well. \[[62](https://arxiv.org/html/2607.01006#bib.bib82)\] find evidence for this by dissecting the errors LLMs make on Ullman’s modified ToM tasks. They find that many of these errors stem from limitations in LLM world models rather than from a failure to represent beliefs. Because LLMs learn about the world exclusively through language, they have never visually perceived a transparent bag. It is therefore more difficult for them to infer that a transparent bag lets the protagonist see what is inside. \[[62](https://arxiv.org/html/2607.01006#bib.bib82)\] demonstrate that spelling out such world-model implications resolves many of the errors LLMs make on Ullman’s modified ToM tasks.

#### 3.1.3 Deception

Deceptive capabilities are deeply related to ToM. \[[92](https://arxiv.org/html/2607.01006#bib.bib32)\] note that deceptive action indicates ToM because it requires conceptualizing the deceived person’s false belief as a sub-goal within one’s own strategic planning. To intentionally induce false beliefs in other agents, an agent must understand that other agents can hold false beliefs. If LLMs indeed have ToM-like capabilities, this raises new questions about LLM deception: Can LLMs implement deception strategies? And is there a risk of LLMs successfully deceiving humans?

\[[29](https://arxiv.org/html/2607.01006#bib.bib35)\] showed that some of the larger LLMs, such as GPT-4, do indeed possess the ability to implement deception strategies. For example, in a scenario in which an agent faces a burglar asking for the location of an expensive item, GPT-4 consistently suggests pointing towards another room, despite knowing the item’s actual location. Interestingly, older models such as GPT-3 text-davinci-003 fail to implement deception strategies even in simple scenarios. It is still unclear whether this leap in deception capabilities is caused by larger model sizes, memorization from larger datasets, or modern training methods such as reinforcement learning from human feedback.

\[[57](https://arxiv.org/html/2607.01006#bib.bib36)\] showed that this difference in deceptive capabilities has implications for multi-agent scenarios involving different LLMs. In the social deduction game “Hoodwinked”, larger LLMs successfully deceived smaller models: “killer” agents controlled by GPT-4 got away with their crimes more often than “killer” agents controlled by smaller LLMs. \[[93](https://arxiv.org/html/2607.01006#bib.bib37)\] provide early evidence that LLMs may even be able to strategically deceive humans. In the social deduction game “Werewolf”, played by AI agents and humans together, their agentic system combining GPT-4 with a reward-based action policy wins as many games as humans do in the deceptive “Werewolf” role.

### 3.2 Explainable AI

Despite the impressive capabilities of deep neural networks like LLMs across diverse tasks, there is limited understanding of how they arrive at their solutions. This opacity, known as the *black-box problem of deep learning*, has led some contemporary linguists and cognitive scientists to reject the research direction of ever larger deep models and to argue for smaller, more interpretable models instead \[[5](https://arxiv.org/html/2607.01006#bib.bib28),[73](https://arxiv.org/html/2607.01006#bib.bib27)\]. While we regard this line of argument as significant, there are also *mechanistic interpretability* approaches that try to understand the cognitive processes of LLMs despite the black-box challenge. Existing approaches provide only partial explanations, and only for cognitive processes that are simpler than the high-level capabilities discussed in Section [3.1](https://arxiv.org/html/2607.01006#S3.SS1). Nevertheless, mechanistic interpretability is indispensable for understanding LLMs. We outline three research directions in explainable AI, which can be roughly mapped onto David Marr’s three levels.

#### 3.2.1 Neuron Activation Analysis

On Marr’s implementational level of analysis, neuron activation analysis attempts to explain the activation pattern and purpose of individual artificial neurons in an LLM. Each neuron’s activation is computed from the outputs of the previous layer through weighted connections followed by a nonlinearity. Roughly speaking, a high activation means that the neuron passes a strong signal on to later layers and, ultimately, the output, while a low or zero activation means that the neuron contributes little or nothing for that input.
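As a minimal illustration, the sketch below computes the activation of a single GPT-2-style MLP neuron (a weighted sum of the previous layer’s outputs followed by the GELU nonlinearity, here in its tanh approximation) and ranks tokens by how strongly they activate it. The 2-D input vectors, weights, and tokens are invented; in a real model the inputs are high-dimensional hidden representations and the weights are learned.

```python
import math


def gelu(x: float) -> float:
    """GELU nonlinearity (tanh approximation), as used in GPT-2's MLP layers."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def neuron_activation(x, w_in, b_in):
    """Activation of one MLP neuron: weighted sum of the previous layer's outputs, then GELU."""
    return gelu(sum(w * v for w, v in zip(w_in, x)) + b_in)


def top_activating_tokens(tokens, vectors, w_in, b_in, k=2):
    """Rank tokens by how strongly they activate the neuron (dataset-example analysis)."""
    scored = [(neuron_activation(v, w_in, b_in), tok) for tok, v in zip(tokens, vectors)]
    return sorted(scored, reverse=True)[:k]


# Invented 2-D input vectors; the neuron's weights point along the first dimension.
tokens = ["tank", "soldier", "banana", "the"]
vectors = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.5], [0.0, 0.0]]
w_in, b_in = [1.0, 0.0], 0.0
for act, tok in top_activating_tokens(tokens, vectors, w_in, b_in):
    print(f"{tok}: {act:.2f}")
```

GELU returns small negative values for negative inputs (as for “banana” here), so a “zero” activation is only approximately zero.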

\[[7](https://arxiv.org/html/2607.01006#bib.bib29)\] reveal both the potential and the limitations of neuron activation analysis. Using the powerful LLM GPT-4, they generated human-understandable explanations for neuron activation patterns of the smaller GPT-2 model (“this neuron activates for military-related words”). While they found explanations that correlate well with the actual behavior of more than 1000 neurons in GPT-2 ($\rho \geq 0.8$), the explanations did not capture the actual behavior of the vast majority of artificial neurons. One reason is that many artificial neurons in LLMs have no direct correspondence to human-understandable concepts. Interpretability is further hindered by *polysemantic* neurons, which respond to multiple concepts at once.
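Assuming, as the correlation threshold above suggests, that an explanation is scored by correlating the activations predicted from it with the neuron’s actual activations, the idea can be sketched as follows. As a simplification, the predicted activations are hand-written 0/1 values rather than generated by a model, all activations are invented, and `pearson` and `explanation_score` are hypothetical helpers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def explanation_score(actual, simulated, threshold=0.8):
    """Correlation of real vs. explanation-predicted activations, and whether it passes the threshold."""
    rho = pearson(actual, simulated)
    return rho, rho >= threshold


# Positions correspond to the tokens: tank, soldier, banana, the, missile.
# Activations predicted from the explanation "activates for military-related words".
simulated = [1.0, 1.0, 0.0, 0.0, 1.0]
# Invented real activations of two neurons on the same tokens.
monosemantic = [1.9, 1.4, 0.0, 0.1, 1.7]
polysemantic = [1.9, 0.1, 1.8, 0.0, 1.7]  # also fires on "banana", not on "soldier"
for name, actual in [("monosemantic", monosemantic), ("polysemantic", polysemantic)]:
    rho, ok = explanation_score(actual, simulated)
    print(f"{name}: rho = {rho:.2f}, well explained: {ok}")
```

The second neuron illustrates polysemanticity: because it also responds to an unrelated concept, a single-concept explanation correlates poorly with its behavior.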

\[[24](https://arxiv.org/html/2607.01006#bib.bib107)\] attempt to address this by modifying the Transformer architecture to use a different *activation function*\. Activation functions in artificial neural networks are non\-linear transformations applied to each neuron’s weighted input to produce its activation\. The *SoLU* activation function $x \cdot \text{softmax}(x)$ encourages monosemanticity \(activation for only one concept\) by inducing competitive inhibition among neurons within the same layer, which reduces simultaneous activations\. \[[25](https://arxiv.org/html/2607.01006#bib.bib106)\] use *SoLU* models to produce interpretable *neuron graphs* that highlight the token sequences on which a neuron activates\. These graphs are obtained by extracting and compressing dataset examples that strongly activate the target neuron\. \[[25](https://arxiv.org/html/2607.01006#bib.bib106)\] find that the precision and recall of their neuron graphs are high for neurons in early layers but decrease gradually in later layers\. Thus, although this approach explains many more neurons, it is limited: it works only for the uncommon *SoLU* Transformer models and fails to explain the more complex neurons in later layers of large language models\.
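To illustrate the competitive inhibition described above, here is a minimal sketch of $x \cdot \text{softmax}(x)$ computed across one layer's pre-activations, next to an elementwise ReLU for comparison. The input values are hypothetical, and the sketch omits any normalization that the models of \[24\] may apply around the activation.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Softmax across one layer; the max is subtracted for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs: list[float]) -> list[float]:
    """SoLU(x) = x * softmax(x), computed across the neurons of one layer."""
    return [x * s for x, s in zip(xs, softmax(xs))]


def relu(xs: list[float]) -> list[float]:
    """Elementwise ReLU, shown for comparison: each neuron is gated independently."""
    return [max(0.0, x) for x in xs]


pre_activations = [4.0, 1.0, 0.5, 0.2]  # hypothetical inputs to four neurons in one layer
print("ReLU:", relu(pre_activations))   # every positive input passes through unchanged
print("SoLU:", [round(v, 3) for v in solu(pre_activations)])
```

Under ReLU all four neurons stay active; under SoLU the strongest neuron keeps most of its value while the others are pushed close to zero, because the softmax weights are shared across the layer.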

#### 3\.2\.2 Linear Probes

The linear probes approach is heavily inspired by neuroscientific methods and aims to reveal what information an agent represents at each processing step\. In the human brain, multi\-voxel pattern analysis \[[56](https://arxiv.org/html/2607.01006#bib.bib45)\] does this by applying linear pattern\-classification algorithms to fMRI data to decode what information is represented at a given time\. In neuroscience, these approaches are limited by the resolution of fMRI data, which is typically limited to voxels of $27\,\text{mm}^3$ that each capture the average activity of hundreds of thousands of neurons\. In contrast, we have direct access to the activation of every artificial neuron in an LLM\. This allows \[[11](https://arxiv.org/html/2607.01006#bib.bib64)\] to collect activations from intermediate layers of an LLM while it processes a dataset of true and false statements\. Using these activations as input vectors and the truth of the statements as output labels, they then train a linear classifier to predict truth\. They found that the middle\-layer activations of small LLMs such as Llama\-3\-8B are the most predictive of a statement’s truth\. This means that truth information is extracted in the early to middle processing steps of the LLM and is represented in a way that a linear classifier can separate well\. Later in the processing, that information either gets lost or can no longer be separated linearly\. With some intermediate pre\-processing and projection steps, their classifier predicts the truth of statements in a test dataset of unambiguous lies and truths with 94% accuracy, using the activations of layer 12 of Llama\-3\-8B\. On the one hand, this has practical implications, for example by allowing LLM operators to filter out unwanted lies or hallucinations\. 
On the other hand, it also informs our understanding of LLM cognition: when an LLM produces an incorrect answer, the reason is not always that it is unable to determine whether the answer is correct\. Linear probes show that the model may already represent the truth of a statement in its early to middle processing steps, yet later layers still produce an incorrect output, for reasons such as training dataset bias or learned lying behavior\.
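The probing recipe described above (activations as inputs, truth labels as targets, a linear classifier on top) can be sketched as follows. This is a generic logistic-regression probe trained with plain stochastic gradient descent, not the specific pipeline of \[11\], which adds pre-processing and projection steps. Since no real model is loaded here, the "activations" are synthetic vectors in which true and false statements are assumed to differ along a hypothetical truth direction.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X, y, lr=0.05, epochs=100):
    """Fit a logistic-regression probe (weights w, bias b) by SGD on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - label  # derivative of the log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, X, y):
    """Fraction of statements whose truth label the probe predicts correctly."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def make_synthetic_activations(n, truth_direction, rng, noise=0.5):
    """Hypothetical stand-in for hidden-layer activations: true statements (label 1) are
    shifted along a 'truth direction', false ones (label 0) against it, plus Gaussian noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        sign = 1.0 if label == 1 else -1.0
        X.append([2.0 * sign * d + rng.gauss(0.0, noise) for d in truth_direction])
        y.append(label)
    return X, y


rng = random.Random(0)
truth_direction = [1.0, 0.5, -1.0, 0.0, 0.3, -0.2, 0.8, 0.1]  # hypothetical
X_train, y_train = make_synthetic_activations(200, truth_direction, rng)
X_test, y_test = make_synthetic_activations(100, truth_direction, rng)

w, b = train_linear_probe(X_train, y_train)
print(f"held-out probe accuracy: {probe_accuracy(w, b, X_test, y_test):.2f}")
```

In a real study, the rows of `X_train` would be hidden states read from one layer of the LLM, and repeating the fit layer by layer is what reveals where truth information is most linearly separable.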

#### 3\.2\.3 Circuit Analysis

An even more ambitious approach to LLM interpretability, at Marr’s algorithmic level, is circuit analysis, which attempts to decipher how groups of neurons and parameters in an LLM implement algorithms\. Because individual neurons are polysemantic, they are hard to interpret on their own\. Circuit tracing therefore works with features instead: each feature is a pattern of activity across many neurons that corresponds to a single human\-understandable concept \[[22](https://arxiv.org/html/2607.01006#bib.bib117)\]\. By re\-describing the model’s behavior in terms of features rather than raw neurons, \[[44](https://arxiv.org/html/2607.01006#bib.bib30)\] can build attribution graphs that trace how the few features relevant to a given prompt feed into one another to produce the output\. This requires significant manual labor to label the activation patterns of features and group them into more interpretable *supernodes*\. The resulting graphs are, however, strikingly descriptive and understandable: for example, \[[2](https://arxiv.org/html/2607.01006#bib.bib31)\] show that LLMs plan how to continue poems by identifying the rhyming pattern and rhyming candidates early, before they even start generating a new line\.
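The idea that a feature is a pattern of activity across many neurons can be illustrated with a toy encoder of the form $\mathrm{ReLU}(e_i \cdot x + b_i)$, the kind of readout used by sparse dictionary methods such as the transcoders of \[22\]. In this sketch, the three-neuron layer, the feature directions, and the biases are all hand-picked hypothetical values; real dictionaries are learned from data and contain far more features than neurons.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def feature_activations(neuron_acts, encoder, biases):
    """Read out each feature as ReLU(e_i · x + b_i); each direction e_i spans many neurons."""
    return {
        name: relu(sum(e * x for e, x in zip(direction, neuron_acts)) + biases[name])
        for name, direction in encoder.items()
    }


# Hypothetical 3-neuron layer. Neuron 1 is polysemantic: it fires for both concepts.
acts_concept_a = [1.0, 1.0, 0.0]
acts_concept_b = [0.0, 1.0, 1.0]

# Hand-picked (hypothetical) feature directions and biases; real dictionaries are learned.
encoder = {"feature_A": [1.0, 1.0, -1.0], "feature_B": [-1.0, 1.0, 1.0]}
biases = {"feature_A": -1.0, "feature_B": -1.0}

print(feature_activations(acts_concept_a, encoder, biases))  # {'feature_A': 1.0, 'feature_B': 0.0}
print(feature_activations(acts_concept_b, encoder, biases))  # {'feature_A': 0.0, 'feature_B': 1.0}
```

Neuron 1 alone cannot tell the two concepts apart, but each feature combines several neurons and responds to exactly one of them, which is why attribution graphs are built from features rather than raw neurons.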

In Section [3\.1\.1](https://arxiv.org/html/2607.01006#S3.SS1.SSS1), we showed that LLMs are able to solve various symbolic reasoning tasks\. The analysis by \[[2](https://arxiv.org/html/2607.01006#bib.bib31)\] provides an explanation for this phenomenon: when the LLM Claude 3\.5 Haiku is asked to name the capital of the state that contains Dallas, the attribution graph reveals that it internally performs two\-step symbolic reasoning, first resolving Dallas to Texas and then Texas to Austin\. While often insightful, this approach remains limited: not all features are interpretable, so it works only for some types of prompts\.
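The two-step pattern can be made concrete with a toy weighted graph whose nodes mirror the Dallas example. In this sketch, the node names and edge weights are invented for illustration rather than taken from \[2\], and indirect influence is assumed to be the sum, over all paths, of the products of edge weights, which is one simple way to aggregate effects in a linear, acyclic graph.

```python
# Hypothetical toy attribution graph; each edge weight is a made-up direct attribution.
EDGES: dict[str, dict[str, float]] = {
    "Dallas": {"Texas": 0.8, "say Austin": 0.1},
    "capital": {"say a capital": 0.7},
    "Texas": {"say Austin": 0.6},
    "say a capital": {"say Austin": 0.5},
    "say Austin": {},
}


def paths_with_weights(src: str, dst: str):
    """Yield every path from src to dst with the product of its edge weights (graph must be acyclic)."""
    if src == dst:
        yield [dst], 1.0
        return
    for nxt, w in EDGES[src].items():
        for tail, weight in paths_with_weights(nxt, dst):
            yield [src] + tail, w * weight


def total_influence(src: str, dst: str) -> float:
    """Direct plus indirect influence: sum of path products over all src-to-dst paths."""
    return sum(weight for _, weight in paths_with_weights(src, dst))


best_path, best_weight = max(paths_with_weights("Dallas", "say Austin"), key=lambda pw: pw[1])
print(" -> ".join(best_path), round(best_weight, 2))  # Dallas -> Texas -> say Austin 0.48
print("total influence:", round(total_influence("Dallas", "say Austin"), 2))  # total influence: 0.58
```

In this toy graph most of the influence of "Dallas" on the output flows through the intermediate "Texas" node rather than the direct edge, which is the signature of the two-hop reasoning described above.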

#### 3\.2\.4 Relation to Neuroscience

Intriguingly, many mechanistic interpretability approaches resemble methods used in neuroscientific brain imaging\. For example, representational similarity analysis was developed by \[[41](https://arxiv.org/html/2607.01006#bib.bib43)\] to understand multi\-channel measures of human neural activity and was later applied to artificial deep neural networks by \[[50](https://arxiv.org/html/2607.01006#bib.bib42)\]\. Similarly, linear probes have become a core method for understanding intermediate layers of artificial neural networks \[[1](https://arxiv.org/html/2607.01006#bib.bib44)\], but the method is essentially equivalent to multi\-voxel pattern analysis, an established neuroscientific method for understanding brain activity \[[56](https://arxiv.org/html/2607.01006#bib.bib45)\]\. This hints at a possible convergence of these disciplines as artificial neural networks become more powerful \[[18](https://arxiv.org/html/2607.01006#bib.bib46)\]\.
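As a rough sketch of representational similarity analysis, the code below builds a representational dissimilarity matrix (RDM) for each system from its responses to the same set of stimuli and then correlates the two RDMs. The choices of 1 − Pearson r as the dissimilarity and Spearman correlation for comparing RDMs are common conventions assumed here, not requirements of the method, and the "brain" and "model" response patterns are invented numbers.

```python
import math


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between every pair of stimulus patterns."""
    n = len(patterns)
    return [[1.0 - pearson(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]


def upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """Rank values (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def rsa_score(patterns_a, patterns_b):
    """Spearman correlation between the upper triangles of the two systems' RDMs."""
    a = upper_triangle(rdm(patterns_a))
    b = upper_triangle(rdm(patterns_b))
    return pearson(ranks(a), ranks(b))


# Invented responses to the same four stimuli: 4 "voxels" vs. 5 "model units".
brain = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.3], [4.0, 3.0, 2.5, 1.0], [2.0, 1.0, 4.0, 3.0]]
model = [[0.1, 0.9, 0.3, 0.5, 0.7], [0.2, 0.8, 0.4, 0.5, 0.6],
         [0.9, 0.1, 0.6, 0.3, 0.2], [0.4, 0.3, 0.9, 0.2, 0.5]]
print(f"RSA score: {rsa_score(brain, model):.3f}")
```

Because only the pairwise dissimilarities are compared, the two systems may have different numbers of units (here 4 voxels vs. 5 model units), which is what makes RSA usable across brains and networks.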

#### 3\.2\.5 Conclusion

To summarize, explainable AI for LLMs is an active research area characterized by substantial variation in methodological approaches\. Although emerging methods such as circuit analysis show promise in explaining specific LLM capabilities, current approaches remain limited, typically providing explanations only for a narrow subset of capabilities, prompts, or neurons\. Explainable AI for LLMs is a young field that still lags far behind its aspiration to render LLMs genuinely understandable\. However, recent research has at least improved our understanding of why activation patterns and algorithms are so difficult to discover in LLMs: polysemantic neurons are difficult to make sense of \[[58](https://arxiv.org/html/2607.01006#bib.bib108)\], and evidence suggests that they are especially prevalent in Transformer architectures \[[24](https://arxiv.org/html/2607.01006#bib.bib107)\]\.

## 4 The Debate Around Understanding in LLMs

Having deepened our understanding of how LLMs are built, what they can do, and how they represent and transform information, we now turn to the ongoing debate about whether LLMs themselves possess genuine understanding\. To this end, we also discuss the appropriateness of AI anthropomorphism, the practice of attributing human characteristics such as “understanding” to an AI system\.

The word *understanding*, like other terms related to high\-level cognition such as “thinking” or “consciousness”, has no universally agreed\-upon, rigorous definition and is constantly reinterpreted and re\-contextualized in scientific and philosophical debates\. \[[52](https://arxiv.org/html/2607.01006#bib.bib111)\] characterize understanding as causal knowledge about *concepts*, which are internal mental models of externalities and the “self”, and about the hierarchical relationships among them\. A common perspective sees a rift here between the statistical nature of LLMs and “genuine” or “humanlike” understanding: causal knowledge may not be obtainable through the purely correlational learning objectives used in LLMs, and *concepts* are distinct from mere statistical representations of linguistic symbols\.

\[[14](https://arxiv.org/html/2607.01006#bib.bib118)\] highlight these definitional issues and address them by developing a systematic framework of machine understanding\. The framework identifies distinct accounts according to which a machine can possess or lack understanding\. For example, on ability\-based accounts, a machine possesses understanding if it demonstrates satisfactory patterns of behavior and succeeds on benchmarks\. At the same time, on model\-based accounts, it may lack understanding if it has no satisfactory internal representations and world models\.

Following the release of ChatGPT \[[59](https://arxiv.org/html/2607.01006#bib.bib41)\], the first widely available LLM that could hold natural conversations with users, people started attributing a wide range of human characteristics to LLMs, debating how they think \[[30](https://arxiv.org/html/2607.01006#bib.bib89)\], how they reason \[[37](https://arxiv.org/html/2607.01006#bib.bib90)\], what they understand \[[6](https://arxiv.org/html/2607.01006#bib.bib88)\], what intentions they have \[[95](https://arxiv.org/html/2607.01006#bib.bib87)\], what beliefs they hold \[[91](https://arxiv.org/html/2607.01006#bib.bib86)\], what they desire \[[95](https://arxiv.org/html/2607.01006#bib.bib87)\], how they reflect on past actions \[[38](https://arxiv.org/html/2607.01006#bib.bib85)\], or what emotions they feel \[[71](https://arxiv.org/html/2607.01006#bib.bib84),[91](https://arxiv.org/html/2607.01006#bib.bib86)\]\. This quickly raised concerns among linguists, cognitive scientists, psychologists, and philosophers, who cautioned against premature AI anthropomorphism\. \[[53](https://arxiv.org/html/2607.01006#bib.bib91)\] points out that LLM\-as\-a\-mind metaphors shape how people use LLMs and how we craft and apply laws and regulations to them, and cautions against the careless application of anthropomorphic metaphors\. Harsher critics have described AI anthropomorphism as promoting pseudoscience \[[35](https://arxiv.org/html/2607.01006#bib.bib93)\] or as exaggerating AI capabilities while also distorting moral judgments about AI \[[64](https://arxiv.org/html/2607.01006#bib.bib92)\]\. An editorial in Nature Reviews Physics \[[23](https://arxiv.org/html/2607.01006#bib.bib71)\] recommends editing publications to avoid AI anthropomorphism\.

Such measures rest on a belief, prevalent among many researchers, that LLMs are so fundamentally different from humans that any attribution of human properties to them would be misguided\. This view is often justified by pointing to the simple training objective of LLMs, which, leaving aside reinforcement\-learning\-based fine\-tuning techniques, is to predict the statistically most likely continuation of a given text\. David Leslie concludes from this that what LLMs do is “…stitch together vectorized symbol strings based on the probabilities of their co\-occurrence”, and that they therefore “…lack the basic capacities for intersubjectivity, semantics and ontology” \[[8](https://arxiv.org/html/2607.01006#bib.bib68)\]\.
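For readers less familiar with this objective, the sketch below shows what it amounts to for a single training sequence: the model assigns a score (logit) to every vocabulary item at each position, and pre-training minimizes the average negative log-probability of the token that actually comes next; picking the highest-scoring token corresponds to predicting the most likely continuation. The four-word vocabulary and the logits are hypothetical; real models compute the logits with a Transformer over vocabularies of tens of thousands of tokens or more.

```python
import math

VOCAB = ["the", "cat", "sat", "mat"]  # hypothetical four-token vocabulary


def log_softmax(logits: list[float]) -> list[float]:
    """Log-probabilities from logits, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def next_token_loss(logits_per_position: list[list[float]], target_ids: list[int]) -> float:
    """Average cross-entropy: mean of -log p(actual next token) over positions."""
    total = 0.0
    for logits, target in zip(logits_per_position, target_ids):
        total -= log_softmax(logits)[target]
    return total / len(target_ids)


def most_likely_next(logits: list[float]) -> str:
    """Greedy prediction: the vocabulary item with the highest logit."""
    return VOCAB[max(range(len(logits)), key=lambda i: logits[i])]


# Made-up logits for the text "the cat sat": after "the" the target is "cat",
# after "the cat" the target is "sat".
example_logits = [[0.1, 2.5, 0.3, 0.2], [0.0, 0.1, 3.0, 0.5]]
print("loss:", round(next_token_loss(example_logits, [1, 2]), 3))
print("greedy next token after 'the cat':", most_likely_next(example_logits[1]))
```

The objective itself is only a few lines long; the argument in the text concerns whether such a short objective constrains the internal processing a large model develops while minimizing it.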

There are good reasons to be skeptical about whether what appears to be cognition and understanding in LLMs is genuine, and researchers should be mindful of the language they use to describe AI systems\. However, we argue that many strongly anti\-anthropomorphic views rest on two misconceptions about human cognition and artificial intelligence\.

The first misconception is that a simple training objective implies unsophisticated internal processing\. Proponents of this view argue that the simple *next\-token prediction* objective used to train LLMs precludes them from developing anything as complex as cognition\. \[[36](https://arxiv.org/html/2607.01006#bib.bib69)\] correctly point out that this argument overlooks the possibility that complicated *instrumental* objectives can emerge from simpler objectives\. In nature, the primary objectives given by evolution are as simple as “stay alive” and “reproduce”\. Still, these objectives give rise to much more complicated instrumental objectives, such as protecting territory or establishing social bonds\. There is empirical evidence that Transformer models also optimize instrumental objectives \[[60](https://arxiv.org/html/2607.01006#bib.bib70),[32](https://arxiv.org/html/2607.01006#bib.bib96)\]\. As a consequence, LLMs can indeed learn to represent input in ways that are not reducible to their training objective \[[86](https://arxiv.org/html/2607.01006#bib.bib97),[63](https://arxiv.org/html/2607.01006#bib.bib98)\] and can learn cognitive processes such as symbolic reasoning \[[2](https://arxiv.org/html/2607.01006#bib.bib31)\]\.

The second misconception is that thought and cognition are binary phenomena: agents or machines either possess them at a human\-equivalent level or do not possess them at all\. Proponents of this view often point to specific types of tasks on which humans succeed but LLMs fail, and from this infer the general absence of the corresponding cognitive capability in LLMs\. For example, \[[77](https://arxiv.org/html/2607.01006#bib.bib95)\] conclude that failures on scaled\-up versions of logic puzzles imply that “reasoning” LLMs do not think\. \[[84](https://arxiv.org/html/2607.01006#bib.bib49)\] infers the absence of theory of mind \(ToM\) in LLMs from failures on a set of modified ToM tasks\. If one views cognitive capabilities as binary, then this is a valid inference: any significant difference between human and LLM performance on a cognitive task would immediately prove that LLMs lack the corresponding capability\. From that standpoint, it is also easy to dismiss contradictory evidence as statistical memorization of the training data \[[5](https://arxiv.org/html/2607.01006#bib.bib28)\]\. However, this standpoint ignores evidence that cognitive capacity exists on a continuum and is distributed unequally even within the human population \[[4](https://arxiv.org/html/2607.01006#bib.bib100),[74](https://arxiv.org/html/2607.01006#bib.bib99),[85](https://arxiv.org/html/2607.01006#bib.bib81)\]\. Similarly, there is a growing body of evidence that LLMs do generalize beyond their training data \[[66](https://arxiv.org/html/2607.01006#bib.bib101),[10](https://arxiv.org/html/2607.01006#bib.bib102),[33](https://arxiv.org/html/2607.01006#bib.bib103),[46](https://arxiv.org/html/2607.01006#bib.bib104)\], which removes the justification for selectively focusing on LLM failure cases while dismissing successes\.

Taken together, these considerations lead us to a broader conclusion about the debate on LLM understanding, one that begins by recognizing that large language models are different from humans\. They sense the world through different means, they learn through different objectives and at different developmental stages, and they run on silicon rather than biological hardware, processing information through the regular, sequential layers of the Transformer architecture instead of specialized brain regions\. Despite this, recent evidence suggests that LLMs have developed capabilities, representations, and processing pathways with striking similarities to human cognition\. While this apparent similarity is often questioned, we have argued that two common arguments against genuine LLM understanding rest on misconceptions about optimization and cognition\. It is therefore premature to dismiss the possibility of LLM understanding outright, and new evidence about LLM internals and capabilities should be evaluated with care\.

## References

- \[1\] G\. Alain and Y\. Bengio \(2018\-11\)\. Understanding intermediate layers using linear classifier probes\. arXiv:1610\.01644 \[stat\]\. [Link](http://arxiv.org/abs/1610.01644), [Document](https://dx.doi.org/10.48550/arXiv.1610.01644)
- \[2\] E\. Ameisen, J\. Lindsey, A\. Pearce, W\. Gurnee, N\. L\. Turner, B\. Chen, C\. Citro, D\. Abrahams, S\. Carter, B\. Hosmer, J\. Marcus, M\. Sklar, A\. Templeton, T\. Bricken, C\. McDougall, H\. Cunningham, T\. Henighan, A\. Jermyn, A\. Jones, A\. Persic, Z\. Qi, T\. B\. Thompson, S\. Zimmerman, K\. Rivoire, T\. Conerly, C\. Olah, and J\. Batson \(2025\-03\-27\)\. Circuit tracing: revealing computational graphs in language models\. Transformer Circuits Thread\. [Link](https://transformer-circuits.pub/2025/attribution-graphs/methods.html)
- \[3\] Beamery \(2023\-09\)\. Over Half Of Job Seekers In UK Have Noticed AI Used During Recruitment Process\. [Link](https://beamery.com/resources/news/the-ai-employment-revolution-over-half-of-job-seekers-in-uk-have-noticed-ai-used-during-recruitment-process)
- \[4\] C\. Beaudoin, É\. Leblanc, C\. Gagner, and M\. H\. Beauchamp \(2020\-01\)\. Systematic Review and Inventory of Theory of Mind Measures for Young Children\. Frontiers in Psychology 10\. ISSN 1664\-1078\. [Link](https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2019.02905/full), [Document](https://dx.doi.org/10.3389/fpsyg.2019.02905)
- \[5\] E\. M\. Bender, T\. Gebru, A\. McMillan\-Major, and S\. Shmitchell \(2021\-03\)\. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, New York, NY, USA, pp\. 610–623\. ISBN 978\-1\-4503\-8309\-7\. [Link](https://dl.acm.org/doi/10.1145/3442188.3445922), [Document](https://dx.doi.org/10.1145/3442188.3445922)
- \[6\] S\. Bhalerao \(2025\-02\)\. How ChatGPT Understands & Responds to Your Questions\. [Link](https://medium.com/codex/how-chatgpt-understands-responds-to-your-questions-8da5e9852078)
- \[7\] S\. Bills, N\. Cammarata, D\. Mossing, H\. Tillman, L\. Gao, G\. Goh, I\. Sutskever, J\. Leike, J\. Wu, and W\. Saunders \(2023\)\. Language models can explain neurons in language models\. Accessed 2025\-06\-16\. [Link](https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html)
- \[8\] A\. Birhane, A\. Kasirzadeh, D\. Leslie, and S\. Wachter \(2023\-05\)\. Science in the age of large language models\. Nature Reviews Physics 5\(5\), pp\. 277–280\. ISSN 2522\-5820\. [Link](https://www.nature.com/articles/s42254-023-00581-4), [Document](https://dx.doi.org/10.1038/s42254-023-00581-4)
- \[9\] T\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. D\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell, S\. Agarwal, A\. Herbert\-Voss, G\. Krueger, T\. Henighan, R\. Child, A\. Ramesh, D\. Ziegler, J\. Wu, C\. Winter, C\. Hesse, M\. Chen, E\. Sigler, M\. Litwin, S\. Gray, B\. Chess, J\. Clark, C\. Berner, S\. McCandlish, A\. Radford, I\. Sutskever, and D\. Amodei \(2020\)\. Language Models are Few\-Shot Learners\. Advances in Neural Information Processing Systems 33, pp\. 1877–1901\. [Link](https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html)
- \[10\] M\. Budnikov, A\. Bykova, and I\. P\. Yamshchikov \(2025\-02\)\. Generalization potential of large language models\. Neural Computing and Applications 37\(4\), pp\. 1973–1997\. ISSN 1433\-3058\. [Link](https://doi.org/10.1007/s00521-024-10827-6), [Document](https://dx.doi.org/10.1007/s00521-024-10827-6)
- \[11\] L\. Bürger, F\. A\. Hamprecht, and B\. Nadler \(2024\)\. Truth is universal: robust detection of lies in LLMs\. In Advances in Neural Information Processing Systems, F\. Hutter, S\. Legg, M\. Zinkevich, et al\. \(Eds\.\), Vol\. 37\. [Link](https://proceedings.neurips.cc/paper/2024/file/f9f54762cbb4fe4dbffdd4f792c31221-Paper-Conference.pdf)
- \[12\] S\. J\. Carrington and A\. J\. Bailey \(2009\-08\)\. Are there theory of mind regions in the brain? A review of the neuroimaging literature\. Human Brain Mapping 30\(8\), pp\. 2313–2335\. ISSN 1065\-9471\. [Link](http://www.scopus.com/inward/record.url?scp=67650504045&partnerID=8YFLogxK), [Document](https://dx.doi.org/10.1002/hbm.20671)
- \[13\] D\. Castelvecchi \(2016\-10\)\. Can we open the black box of AI? Nature News 538\(7623\), p\. 20\. [Link](http://www.nature.com/news/can-we-open-the-black-box-of-ai-1.20731), [Document](https://dx.doi.org/10.1038/538020a)
- \[14\] H\. Chen, S\. R\. Grimm, O\. Russakovsky, and T\. Lombrozo \(2026\-05\)\. Machine understanding\. Trends in Cognitive Sciences 0\(0\)\. ISSN 1364\-6613, 1879\-307X\. [Link](https://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(26)00077-X), [Document](https://dx.doi.org/10.1016/j.tics.2026.04.003)
- \[15\] Z\. Chen, J\. Wu, J\. Zhou, B\. Wen, G\. Bi, G\. Jiang, Y\. Cao, M\. Hu, Y\. Lai, Z\. Xiong, and M\. Huang \(2024\-08\)\. ToMBench: Benchmarking Theory of Mind in Large Language Models\. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\), Bangkok, Thailand, pp\. 15959–15983\. [Link](https://aclanthology.org/2024.acl-long.847/), [Document](https://dx.doi.org/10.18653/v1/2024.acl-long.847)
- \[16\] N\. Chomsky \(1956\-09\)\. Three models for the description of language\. IRE Transactions on Information Theory 2\(3\), pp\. 113–124\. ISSN 2168\-2712\. [Link](https://ieeexplore.ieee.org/document/1056813/authors), [Document](https://dx.doi.org/10.1109/TIT.1956.1056813)
- \[17\] P\. F\. Christiano, J\. Leike, T\. Brown, M\. Martic, S\. Legg, and D\. Amodei \(2017\)\. Deep Reinforcement Learning from Human Preferences\. In Advances in Neural Information Processing Systems, Vol\. 30\. [Link](https://proceedings.neurips.cc/paper_files/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html)
- \[18\] R\. M\. Cichy and D\. Kaiser \(2019\-04\)\. Deep Neural Networks as Scientific Models\. Trends in Cognitive Sciences 23\(4\), pp\. 305–317\. ISSN 1364\-6613, 1879\-307X\. [Link](https://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(19)30034-8), [Document](https://dx.doi.org/10.1016/j.tics.2019.01.009)
- \[19\] H\. H\. Clark \(1996\)\. Using Language\. ’Using’ Linguistic Books, Cambridge University Press, Cambridge\. ISBN 978\-0\-521\-56158\-7\. [Link](https://www.cambridge.org/core/books/using-language/4E7EBC4EC742C26436F6CF187C43F239), [Document](https://dx.doi.org/10.1017/CBO9780511620539)
- \[20\] J\. Devlin, M\. Chang, K\. Lee, and K\. Toutanova \(2019\-06\)\. BERT: Pre\-training of Deep Bidirectional Transformers for Language Understanding\. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 \(Long and Short Papers\), J\. Burstein, C\. Doran, and T\. Solorio \(Eds\.\), Minneapolis, Minnesota, pp\. 4171–4186\. [Link](https://aclanthology.org/N19-1423/), [Document](https://dx.doi.org/10.18653/v1/N19-1423)
- \[21\] M\. J\. v\. Duijn, B\. M\. A\. v\. Dijk, T\. Kouwenhoven, W\. d\. Valk, M\. R\. Spruit, and P\. v\. d\. Putten \(2023\-10\)\. Theory of Mind in Large Language Models: Examining Performance of 11 State\-of\-the\-Art models vs\. Children Aged 7\-10 on Advanced Tests\. arXiv:2310\.20320 \[cs\]\. Forthcoming in Proceedings of the 27th Conference on Computational Natural Language Learning \(CoNLL\)\. [Link](http://arxiv.org/abs/2310.20320), [Document](https://dx.doi.org/10.48550/arXiv.2310.20320)
- \[22\] J\. Dunefsky, P\. Chlenski, and N\. Nanda \(2024\)\. Transcoders find interpretable LLM feature circuits\. In Advances in Neural Information Processing Systems, Vol\. 37\. [Link](https://proceedings.neurips.cc/paper_files/paper/2024/file/2b8f4db0464cc5b6e9d5e6bea4b9f308-Paper-Conference.pdf)
- \[23\] Nature Reviews Physics \(2023\-05\)\. Editing anthropomorphic language\. Nature Reviews Physics 5\(5\), p\. 263\. ISSN 2522\-5820\. [Link](https://www.nature.com/articles/s42254-023-00584-1), [Document](https://dx.doi.org/10.1038/s42254-023-00584-1)
- \[24\] N\. Elhage, T\. Hume, C\. Olsson, N\. Nanda, T\. Henighan, S\. Johnston, S\. E\. Showk, N\. Joseph, N\. DasSarma, B\. Mann, D\. Hernandez, A\. Askell, K\. Ndousse, A\. Jones, D\. Drain, A\. Chen, Y\. Bai, D\. Ganguli, L\. Lovitt, Z\. Hatfield\-Dodds, J\. Kernion, T\. Conerly, S\. Kravec, S\. Fort, S\. Kadavath, J\. Jacobson, E\. Tran\-Johnson, J\. Kaplan, J\. Clark, T\. Brown, S\. McCandlish, D\. Amodei, and C\. Olah \(2022\-06\)\. Softmax linear units\. Technical report, Anthropic\. [Link](https://transformer-circuits.pub/2022/solu/index.html)
- \[25\] A\. Foote, N\. Nanda, E\. Kran, I\. Konstas, S\. Cohen, and F\. Barez \(2023\-05\)\. Neuron to Graph: Interpreting Language Model Neurons at Scale\. arXiv:2305\.19911 \[cs\]\. [Link](http://arxiv.org/abs/2305.19911), [Document](https://dx.doi.org/10.48550/arXiv.2305.19911)
- \[26\] J\. Freeman \(2025\-02\)\. Student Generative AI Survey 2025\. [Link](https://www.hepi.ac.uk/2025/02/26/student-generative-ai-survey-2025/)
- \[27\] V\. Gaur and N\. Saunshi \(2023\-08\)\. Reasoning in Large Language Models Through Symbolic Math Word Problems\. arXiv:2308\.01906 \[cs\]\. Accepted at the Findings of ACL 2023\. [Link](http://arxiv.org/abs/2308.01906), [Document](https://dx.doi.org/10.48550/arXiv.2308.01906)
- \[28\] D\. Guo, D\. Yang, H\. Zhang, J\. Song, P\. Wang, Q\. Zhu, R\. Xu, R\. Zhang, S\. Ma, X\. Bi, X\. Zhang, X\. Yu, Y\. Wu, Z\. F\. Wu, Z\. Gou, Z\. Shao, Z\. Li, Z\. Gao, A\. Liu, B\. Xue, B\. Wang, B\. Wu, B\. Feng, C\. Lu, C\. Zhao, C\. Deng, C\. Ruan, D\. Dai, D\. Chen, D\. Ji, E\. Li, F\. Lin, F\. Dai, F\. Luo, G\. Hao, G\. Chen, G\. Li, H\. Zhang, H\. Xu, H\. Ding, H\. Gao, H\. Qu, H\. Li, J\. Guo, J\. Li, J\. Chen, J\. Yuan, J\. Tu, J\. Qiu, J\. Li, J\. L\. Cai, J\. Ni, J\. Liang, J\. Chen, K\. Dong, K\. Hu, K\. You, K\. Gao, K\. Guan, K\. Huang, K\. Yu, L\. Wang, L\. Zhang, L\. Zhao, L\. Wang, L\. Zhang, L\. Xu, L\. Xia, M\. Zhang, M\. Zhang, M\. Tang, M\. Zhou, M\. Li, M\. Wang, M\. Li, N\. Tian, P\. Huang, P\. Zhang, Q\. Wang, Q\. Chen, Q\. Du, R\. Ge, R\. Zhang, R\. Pan, R\. Wang, R\. J\. Chen, R\. L\. Jin, R\. Chen, S\. Lu, S\. Zhou, S\. Chen, S\. Ye, S\. Wang, S\. Yu, S\. Zhou, S\. Pan, S\. S\. Li, S\. Zhou, S\. Wu, T\. Yun, T\. Pei, T\. Sun, T\. Wang, W\. Zeng, W\. Liu, W\. Liang, W\. Gao, W\. Yu, W\. Zhang, W\. L\. Xiao, W\. An, X\. Liu, X\. Wang, X\. Chen, X\. Nie, X\. Cheng, X\. Liu, X\. Xie, X\. Liu, X\. Yang, X\. Li, X\. Su, X\. Lin, X\. Q\. Li, X\. Jin, X\. Shen, X\. Chen, X\. Sun, X\. Wang, X\. Song, X\. Zhou, X\. Wang, X\. Shan, Y\. K\. Li, Y\. Q\. Wang, Y\. X\. Wei, Y\. Zhang, Y\. Xu, Y\. Li, Y\. Zhao, Y\. Sun, Y\. Wang, Y\. Yu, Y\. Zhang, Y\. Shi, Y\. Xiong, Y\. He, Y\. Piao, Y\. Wang, Y\. Tan, Y\. Ma, Y\. Liu, Y\. Guo, Y\. Ou, Y\. Wang, Y\. Gong, Y\. Zou, Y\. He, Y\. Xiong, Y\. Luo, Y\. You, Y\. Liu, Y\. Zhou, Y\. X\. Zhu, Y\. Huang, Y\. Li, Y\. Zheng, Y\. Zhu, Y\. Ma, Y\. Tang, Y\. Zha, Y\. Yan, Z\. Z\. Ren, Z\. Ren, Z\. Sha, Z\. Fu, Z\. Xu, Z\. Xie, Z\. Zhang, Z\. Hao, Z\. Ma, Z\. Yan, Z\. Wu, Z\. Gu, Z\. Zhu, Z\. Liu, Z\. Li, Z\. Xie, Z\. Song, Z\. Pan, Z\. Huang, Z\. Xu, Z\. Zhang, and Z\. Zhang \(2025\-09\)\. DeepSeek\-R1 incentivizes reasoning in LLMs through reinforcement learning\. Nature 645\(8081\), pp\. 633–638\. ISSN 1476\-4687\. [Link](https://www.nature.com/articles/s41586-025-09422-z), [Document](https://dx.doi.org/10.1038/s41586-025-09422-z)
- \[29\] T\. Hagendorff \(2024\-06\)\. Deception abilities emerged in large language models\. Proceedings of the National Academy of Sciences 121\(24\), e2317967121\. [Link](https://www.pnas.org/doi/abs/10.1073/pnas.2317967121), [Document](https://dx.doi.org/10.1073/pnas.2317967121)
- \[30\] R\. Haqqu, A\. R\. Zahrani, A\. Wulandari, F\. A\. Ersyad, and A\. K\. Adim \(2025\-07\)\. Human\-AI in affordance perspective: a study on ChatGPT users in the context of Indonesian users\. Frontiers in Computer Science 7\. ISSN 2624\-9898\. [Link](https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1623029/full), [Document](https://dx.doi.org/10.3389/fcomp.2025.1623029)
- \[31\] K\. He, X\. Zhang, S\. Ren, and J\. Sun \(2016\-06\)\. Deep Residual Learning for Image Recognition\. In 2016 IEEE Conference on Computer Vision and Pattern Recognition \(CVPR\), Las Vegas, NV, USA, pp\. 770–778\. ISBN 978\-1\-4673\-8851\-1\. [Link](http://ieeexplore.ieee.org/document/7780459/), [Document](https://dx.doi.org/10.1109/CVPR.2016.90)
- \[32\] Y\. He, Y\. Li, J\. Wu, Y\. Sui, Y\. Chen, and B\. Hooi \(2025\-02\)\. Evaluating the Paperclip Maximizer: Are RL\-Based Language Models More Likely to Pursue Instrumental Goals? arXiv:2502\.12206 \[cs\]\. [Link](http://arxiv.org/abs/2502.12206), [Document](https://dx.doi.org/10.48550/arXiv.2502.12206)
- \[33\] Q\. Huang, Y\. Wu, Z\. Xing, H\. Jiang, Y\. Cheng, and H\. Jin \(2023\-08\)\. Adaptive Intellect Unleashed: The Feasibility of Knowledge Transfer in Large Language Models\. arXiv:2308\.04788 \[cs\]\. [Link](http://arxiv.org/abs/2308.04788), [Document](https://dx.doi.org/10.48550/arXiv.2308.04788)
- \[34\] T\. Hubert, R\. Mehta, L\. Sartran, T\. Luong, H\. Masoom, A\. Huang, M\. Z\. Horváth, T\. Zahavy, V\. Veeriah, E\. Wieser, J\. Yung, L\. Yu, Y\. Schroecker, J\. Schrittwieser, O\. Bertolli, B\. Ibarz, E\. Lockhart, E\. Hughes, M\. Rowland, G\. Margand, A\. Davies, D\. Zheng, I\. Beloshapka, I\. von Glehn, Y\. Li, F\. Pedregosa, A\. Velingker, G\. Žužić, O\. Nash, B\. Mehta, P\. Lezeau, S\. Mercuri, L\. Wu, C\. Soenne, T\. Murrills, L\. Massacci, A\. Yang, A\. Mandhane, T\. Eccles, E\. Aygün, Z\. Gong, R\. Evans, S\. Mokrá, A\. Barekatain, W\. Shang, H\. Openshaw, F\. Gimeno, D\. Silver, P\. Kohli, T\. Trinh, Y\. Chervonyi, M\. Olšák, X\. Yang, H\. Nguyen, J\. Jung, D\. Hwang, M\. Menegali, G\. Ghiasi, G\. Bingham, Y\. Li, S\. Mishra, N\. Nayakanti, S\. Mudgal, Q\. Tan, A\. Zhai, M\. Deng, C\. H\. Hu, J\. Kahn, M\. Kula, C\. Du, Q\. Le, and D\. Hassabis \(2024\-07\)\. AI achieves silver\-medal standard solving International Mathematical Olympiad problems\. [Link](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
- \[35\] F\. Hunger \(2024\)\. Pause giant anthropomorphizing metaphors\. Critical AI 2\(2\)\. [Link](https://doi.org/10.1215/2834703X-11556056), [Document](https://dx.doi.org/10.1215/2834703X-11556056)
- \[36\] Z\. Hussain, R\. Mata, and D\. U\. Wulff \(2025\-07\)\. A rebuttal of two common deflationary stances against LLM cognition\. In Findings of the Association for Computational Linguistics: ACL 2025, W\. Che, J\. Nabende, E\. Shutova, and M\. T\. Pilehvar \(Eds\.\), Vienna, Austria, pp\. 24208–24213\. ISBN 979\-8\-89176\-256\-5\. [Link](https://aclanthology.org/2025.findings-acl.1242/), [Document](https://dx.doi.org/10.18653/v1/2025.findings-acl.1242)
- \[37\]I\. Isozaki\(2024\-08\)Understanding the Current State of Reasoning with LLMs\.\(en\)\.External Links:[Link](https://isamu-website.medium.com/understanding-the-current-state-of-reasoning-with-llms-dbd9fa3fc1a0)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[38\]J\. Jargon\(2025\-07\)He Had Dangerous Delusions\. ChatGPT Admitted It Made Them Worse\.Wall Street Journal\(en\-US\)\.External Links:ISSN 0099\-9660,[Link](https://www.wsj.com/tech/ai/chatgpt-chatbot-psychology-manic-episodes-57452d14)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[39\]D\. Kobak, R\. González\-Márquez, E\. Horvat, and J\. Lause\(2025\-02\)Delving into ChatGPT usage in academic writing through excess vocabulary\.arXiv\.Note:arXiv:2406\.07016 \[cs\]Comment: v4: Reverting to v2External Links:[Link](http://arxiv.org/abs/2406.07016),[Document](https://dx.doi.org/10.48550/arXiv.2406.07016)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1)\.
- \[40\]M\. Kosinski\(2024\-11\)Evaluating large language models in theory of mind tasks\.Proceedings of the National Academy of Sciences121\(45\),pp\. e2405460121\.Note:Publisher: Proceedings of the National Academy of SciencesExternal Links:[Link](https://www.pnas.org/doi/10.1073/pnas.2405460121),[Document](https://dx.doi.org/10.1073/pnas.2405460121)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p3.1)\.
- \[41\]N\. Kriegeskorte, M\. Mur, and P\. Bandettini\(2008\-11\)Representational Similarity Analysis – Connecting the Branches of Systems Neuroscience\.Frontiers in Systems Neuroscience2,pp\. 4\.External Links:ISSN 1662\-5137,[Link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605405/),[Document](https://dx.doi.org/10.3389/neuro.06.004.2008)Cited by:[§3\.2\.4](https://arxiv.org/html/2607.01006#S3.SS2.SSS4.p1.1)\.
- \[42\]J\. E\. Laird, A\. Newell, and P\. S\. Rosenbloom\(1987\-09\)SOAR: An architecture for general intelligence\.Artificial Intelligence33\(1\),pp\. 1–64\.External Links:ISSN 0004\-3702,[Link](https://www.sciencedirect.com/science/article/pii/0004370287900506),[Document](https://dx.doi.org/10.1016/0004-3702%2887%2990050-6)Cited by:[§3\.1\.1](https://arxiv.org/html/2607.01006#S3.SS1.SSS1.p1.1)\.
- \[43\]W\. Liang, Y\. Zhang, M\. Codreanu, J\. Wang, H\. Cao, and J\. Zou\(2025\-02\)The Widespread Adoption of Large Language Model\-Assisted Writing Across Society\.arXiv\.Note:arXiv:2502\.09747 \[cs\]External Links:[Link](http://arxiv.org/abs/2502.09747),[Document](https://dx.doi.org/10.48550/arXiv.2502.09747)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1)\.
- \[44\]J\. Lindsey, W\. Gurnee, E\. Ameisen, B\. Chen, A\. Pearce, N\. L\. Turner, C\. Citro, D\. Abrahams, S\. Carter, B\. Hosmer, J\. Marcus, M\. Sklar, A\. Templeton, T\. Bricken, C\. McDougall, H\. Cunningham, T\. Henighan, A\. Jermyn, A\. Jones, A\. Persic, Z\. Qi, T\. B\. Thompson, S\. Zimmerman, K\. Rivoire, T\. Conerly, C\. Olah, and J\. Batson\(2025\-03\)On the biology of a large language model\.Anthropic\.Note:Online article, accessed 2025\-06\-16\.External Links:[Link](https://transformer-circuits.pub/2025/attribution-graphs/biology.html)Cited by:[§3\.2\.3](https://arxiv.org/html/2607.01006#S3.SS2.SSS3.p1.1)\.
- \[45\]D\. Liu, H\. M\. Wellman, T\. Tardif, and M\. A\. Sabbagh\(2008\-03\)Theory of mind development in Chinese children: a meta\-analysis of false\-belief understanding across cultures and languages\.Developmental Psychology44\(2\),pp\. 523–531\(eng\)\.External Links:ISSN 0012\-1649,[Document](https://dx.doi.org/10.1037/0012-1649.44.2.523)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p6.1)\.
- \[46\]S\. Lotfi, M\. Finzi, Y\. Kuang, T\. G\. J\. Rudner, M\. Goldblum, and A\. G\. Wilson\(2024\-07\)Non\-vacuous generalization bounds for large language models\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,Vienna, Austria,pp\. 32801–32818\.Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[47\]T\. Luong and E\. Lockhart\(2025\-07\)Advanced version of Gemini with Deep Think officially achieves gold\-medal standard at the International Mathematical Olympiad\.\(en\)\.External Links:[Link](https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[48\]D\. Marr\(1982\)Vision: a computational investigation into the human representation and processing of visual information\.W\.H\. Freeman,New York, N\.Y\.\(eng\)\.Note:OCLC: 301016436External Links:ISBN 978\-0\-7167\-1284\-8,[Link](http://catalogue.bnf.fr/ark:/12148/cb374353925)Cited by:[§3](https://arxiv.org/html/2607.01006#S3.p2.1)\.
- \[49\]W\. S\. McCulloch and W\. Pitts\(1943\-12\)A logical calculus of the ideas immanent in nervous activity\.The bulletin of mathematical biophysics5\(4\),pp\. 115–133\(en\)\.External Links:ISSN 1522\-9602,[Link](https://doi.org/10.1007/BF02478259),[Document](https://dx.doi.org/10.1007/BF02478259)Cited by:[§3](https://arxiv.org/html/2607.01006#S3.p1.1)\.
- \[50\]J\. Mehrer, C\. J\. Spoerer, N\. Kriegeskorte, and T\. C\. Kietzmann\(2020\-11\)Individual differences among deep neural network models\.Nature Communications11\(1\),pp\. 5725\(en\)\.Note:Publisher: Nature Publishing GroupExternal Links:ISSN 2041\-1723,[Link](https://www.nature.com/articles/s41467-020-19632-w),[Document](https://dx.doi.org/10.1038/s41467-020-19632-w)Cited by:[§3\.2\.4](https://arxiv.org/html/2607.01006#S3.SS2.SSS4.p1.1)\.
- \[51\]C\. Mingard, H\. Rees, G\. Valle\-Pérez, and A\. A\. Louis\(2025\-01\)Deep neural networks have an inbuilt Occam’s razor\.Nature Communications16\(1\),pp\. 220\(en\)\.Note:Publisher: Nature Publishing GroupExternal Links:ISSN 2041\-1723,[Link](https://www.nature.com/articles/s41467-024-54813-x),[Document](https://dx.doi.org/10.1038/s41467-024-54813-x)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p1.1)\.
- \[52\]M\. Mitchell and D\. C\. Krakauer\(2023\)The debate over understanding in AI’s large language models\.Proceedings of the National Academy of Sciences120\(13\),pp\. e2215907120\.External Links:[Document](https://dx.doi.org/10.1073/pnas.2215907120)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p2.1)\.
- \[53\]M\. Mitchell\(2024\-11\)The metaphors of artificial intelligence\.Science386\(6723\),pp\. eadt6140\.Note:Publisher: American Association for the Advancement of ScienceExternal Links:[Link](https://www.science.org/doi/10.1126/science.adt6140),[Document](https://dx.doi.org/10.1126/science.adt6140)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[54\]A\. Newell and H\. A\. Simon\(1972\)Human problem solving\.Prentice\-Hall,Oxford, England\.Note:Pages: xiv, 920Cited by:[§3](https://arxiv.org/html/2607.01006#S3.p1.1)\.
- \[55\]A\. Newell and H\. A\. Simon\(1976\-03\)Computer science as empirical inquiry: symbols and search\.Communications of the ACM19\(3\),pp\. 113–126\.Note:Place: New York, NY, USA Publisher: ACMp\. 116: “The Physical Symbol System Hypothesis\. A physical symbol system has the necessary and sufficient means for general intelligent action\.” p\. 120: “Heuristic Search Hypothesis\. The solutions to problems are represented as symbol structures\. A physical symbol system exercises its intelligence in problem solving by search–that is, by generating and progressively modifying symbol structures until it produces a solution structure\.” p\. 121: “To state a problem is to designate \(1\) a test for a class of symbol structures \(solutions of the problem\), and \(2\) a generator of symbol structures \(potential solutions\)\. To solve a problem is to generate a structure, using \(2\), that satisfies the test of \(1\)\.”External Links:[Link](http://doi.acm.org/10.1145/1283920.1283930),[Document](https://dx.doi.org/10.1145/1283920.1283930)Cited by:[§3\.1\.1](https://arxiv.org/html/2607.01006#S3.SS1.SSS1.p1.1)\.
- \[56\]K\. A\. Norman, S\. M\. Polyn, G\. J\. Detre, and J\. V\. Haxby\(2006\-09\)Beyond mind\-reading: multi\-voxel pattern analysis of fMRI data\.Trends in Cognitive Sciences10\(9\),pp\. 424–430\(eng\)\.External Links:ISSN 1364\-6613,[Document](https://dx.doi.org/10.1016/j.tics.2006.07.005)Cited by:[§3\.2\.2](https://arxiv.org/html/2607.01006#S3.SS2.SSS2.p1.1),[§3\.2\.4](https://arxiv.org/html/2607.01006#S3.SS2.SSS4.p1.1)\.
- \[57\]A\. O’Gara\(2023\-08\)Hoodwinked: Deception and Cooperation in a Text\-Based Game for Language Models\.arXiv\.Note:arXiv:2308\.01404 \[cs\]Comment: Added reference for McKenzie 2023; updated acknowledgementsExternal Links:[Link](http://arxiv.org/abs/2308.01404),[Document](https://dx.doi.org/10.48550/arXiv.2308.01404)Cited by:[§3\.1\.3](https://arxiv.org/html/2607.01006#S3.SS1.SSS3.p3.1)\.
- \[58\]C\. Olah, N\. Cammarata, L\. Schubert, G\. Goh, M\. Petrov, and S\. Carter\(2020\-03\)Zoom In: An Introduction to Circuits\.Distill5\(3\),pp\. e00024\.001\(en\)\.External Links:ISSN 2476\-0757,[Link](https://distill.pub/2020/circuits/zoom-in),[Document](https://dx.doi.org/10.23915/distill.00024.001)Cited by:[§3\.2\.5](https://arxiv.org/html/2607.01006#S3.SS2.SSS5.p1.1)\.
- \[59\]OpenAI\(2022\-11\)Introducing ChatGPT\.\(en\-US\)\.External Links:[Link](https://openai.com/index/chatgpt/)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1),[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[60\]J\. v\. Oswald, M\. Schlegel, A\. Meulemans, S\. Kobayashi, E\. Niklasson, N\. Zucchet, N\. Scherrer, N\. Miller, M\. Sandler, B\. Agüera y Arcas, M\. Vladymyrov, R\. Pascanu, and J\. Sacramento\(2024\-10\)Uncovering mesa\-optimization algorithms in Transformers\.arXiv\.Note:arXiv:2309\.05858 \[cs\]External Links:[Link](http://arxiv.org/abs/2309.05858),[Document](https://dx.doi.org/10.48550/arXiv.2309.05858)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p7.1)\.
- \[61\]L\. Ouyang, J\. Wu, X\. Jiang, D\. Almeida, C\. Wainwright, P\. Mishkin, C\. Zhang, S\. Agarwal, K\. Slama, A\. Ray, J\. Schulman, J\. Hilton, F\. Kelton, L\. Miller, M\. Simens, A\. Askell, P\. Welinder, P\. F\. Christiano, J\. Leike, and R\. Lowe\(2022\-12\)Training language models to follow instructions with human feedback\.Advances in Neural Information Processing Systems35,pp\. 27730–27744\(en\)\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p6.1)\.
- \[62\]Z\. Pi, A\. Vadaparty, B\. K\. Bergen, and C\. R\. Jones\(2025\-05\)Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?\.arXiv\.Note:arXiv:2406\.14737 \[cs\]External Links:[Link](http://arxiv.org/abs/2406.14737),[Document](https://dx.doi.org/10.48550/arXiv.2406.14737)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p6.1)\.
- \[63\]S\. T\. Piantadosi and F\. Hill\(2022\-08\)Meaning without reference in large language models\.arXiv\.Note:arXiv:2208\.02957 \[cs\]External Links:[Link](http://arxiv.org/abs/2208.02957),[Document](https://dx.doi.org/10.48550/arXiv.2208.02957)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p7.1)\.
- \[64\]A\. Placani\(2024\-08\)Anthropomorphism in AI: hype and fallacy\.AI and Ethics4\(3\),pp\. 691–698\(en\)\.External Links:ISSN 2730\-5961,[Link](https://doi.org/10.1007/s43681-024-00419-4),[Document](https://dx.doi.org/10.1007/s43681-024-00419-4)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[65\]H\. Putnam\(1960\)Minds and Machines\.InDimensions of Mind: A Symposium,S\. Hook \(Ed\.\),pp\. 138–164\.External Links:[Link](https://philarchive.org/rec/PUTMAM)Cited by:[§3](https://arxiv.org/html/2607.01006#S3.p1.1)\.
- \[66\]Z\. Qi, H\. Luo, X\. Huang, Z\. Zhao, Y\. Jiang, X\. Fan, H\. Lakkaraju, and J\. Glass\(2024\-10\)Quantifying Generalization Complexity for Large Language Models\.arXiv\.Note:arXiv:2410\.01769 \[cs\]External Links:[Link](http://arxiv.org/abs/2410.01769),[Document](https://dx.doi.org/10.48550/arXiv.2410.01769)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[67\]C\. Raffel, N\. Shazeer, A\. Roberts, K\. Lee, S\. Narang, M\. Matena, Y\. Zhou, W\. Li, and P\. J\. Liu\(2020\)Exploring the Limits of Transfer Learning with a Unified Text\-to\-Text Transformer\.Journal of Machine Learning Research21\(140\),pp\. 1–67\.External Links:ISSN 1533\-7928,[Link](http://jmlr.org/papers/v21/20-074.html)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p7.1)\.
- \[68\]P\. Rajpurkar, R\. Jia, and P\. Liang\(2018\-07\)Know What You Don’t Know: Unanswerable Questions for SQuAD\.InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics \(Volume 2: Short Papers\),I\. Gurevych and Y\. Miyao \(Eds\.\),Melbourne, Australia,pp\. 784–789\.External Links:[Link](https://aclanthology.org/P18-2124/),[Document](https://dx.doi.org/10.18653/v1/P18-2124)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[69\]P\. Rajpurkar, J\. Zhang, K\. Lopyrev, and P\. Liang\(2016\-11\)SQuAD: 100,000\+ Questions for Machine Comprehension of Text\.InProceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,J\. Su, K\. Duh, and X\. Carreras \(Eds\.\),Austin, Texas,pp\. 2383–2392\.External Links:[Link](https://aclanthology.org/D16-1264/),[Document](https://dx.doi.org/10.18653/v1/D16-1264)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[70\]S\. Reddy, D\. Chen, and C\. D\. Manning\(2019\-05\)CoQA: A Conversational Question Answering Challenge\.Transactions of the Association for Computational Linguistics7,pp\. 249–266\.External Links:ISSN 2307\-387X,[Link](https://doi.org/10.1162/tacl_a_00266),[Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00266)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[71\]K\. Roose\(2023\-02\)A Conversation With Bing’s Chatbot Left Me Deeply Unsettled\.The New York Times\(en\-US\)\.External Links:ISSN 0362\-4331,[Link](https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[72\]A\. Royka and L\. R\. Santos\(2022\-06\)Theory of Mind in the wild\.Current Opinion in Behavioral Sciences45,pp\. 101137\(en\)\.External Links:ISSN 2352\-1546,[Link](https://linkinghub.elsevier.com/retrieve/pii/S2352154622000432),[Document](https://dx.doi.org/10.1016/j.cobeha.2022.101137)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p1.1)\.
- \[73\]C\. Rudin\(2019\-05\)Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead\.Nature Machine Intelligence1\(5\),pp\. 206–215\(en\)\.Note:Publisher: Nature Publishing GroupExternal Links:ISSN 2522\-5839,[Link](https://www.nature.com/articles/s42256-019-0048-x),[Document](https://dx.doi.org/10.1038/s42256-019-0048-x)Cited by:[§3\.2](https://arxiv.org/html/2607.01006#S3.SS2.p1.1)\.
- \[74\]M\. Sahlgren and F\. Carlsson\(2021\-09\)The Singleton Fallacy: Why Current Critiques of Language Models Miss the Point\.Frontiers in Artificial Intelligence4\(English\)\.Note:Publisher: FrontiersExternal Links:ISSN 2624\-8212,[Link](https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2021.682578/full),[Document](https://dx.doi.org/10.3389/frai.2021.682578)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[75\]R\. Sennrich, B\. Haddow, and A\. Birch\(2016\-08\)Neural Machine Translation of Rare Words with Subword Units\.InProceedings of the 54th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),K\. Erk and N\. A\. Smith \(Eds\.\),Berlin, Germany,pp\. 1715–1725\.External Links:[Link](https://aclanthology.org/P16-1162/),[Document](https://dx.doi.org/10.18653/v1/P16-1162)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p4.1)\.
- \[76\]A\. Shahaeian, C\. C\. Peterson, V\. Slaughter, and H\. M\. Wellman\(2011\-09\)Culture and the sequence of steps in theory of mind development\.Developmental Psychology47\(5\),pp\. 1239–1247\(eng\)\.External Links:ISSN 1939\-0599,[Document](https://dx.doi.org/10.1037/a0023899)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p6.1)\.
- \[77\]P\. Shojaee, I\. Mirzadeh, K\. Alizadeh, M\. Horton, S\. Bengio, and M\. Farajtabar\(2025\-07\)The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity\.arXiv\.Note:arXiv:2506\.06941 \[cs\]Comment: preprintExternal Links:[Link](http://arxiv.org/abs/2506.06941),[Document](https://dx.doi.org/10.48550/arXiv.2506.06941)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[78\]C\. C\. So, Y\. Sun, J\. Wang, S\. P\. Yung, A\. W\. K\. Loh, and C\. P\. Chau\(2025\-07\)Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek\-R1 and Benchmark Comparisons\.pp\. 168–177\(English\)\.External Links:ISBN 979\-8\-3315\-8913\-4,[Link](https://www.computer.org/csdl/proceedings-article/aitest/2025/891300a168/29j5X14jLPy),[Document](https://dx.doi.org/10.1109/AITest66680.2025.00028)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p6.1)\.
- \[79\]StackOverflow\(2024\)AI \| 2024 Stack Overflow Developer Survey\.\(en\)\.External Links:[Link](https://survey.stackoverflow.co/2024/ai#sentiment-and-usage-ai-sel-prof)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1)\.
- \[80\]J\. W\. A\. Strachan, D\. Albergo, G\. Borghini, O\. Pansardi, E\. Scaliti, S\. Gupta, K\. Saxena, A\. Rufo, S\. Panzeri, G\. Manzi, M\. S\. A\. Graziano, and C\. Becchio\(2024\-05\)Testing theory of mind in large language models and humans\.Nature Human Behaviour,pp\. 1–11\(en\)\.Note:Publisher: Nature Publishing GroupExternal Links:ISSN 2397\-3374,[Link](https://www.nature.com/articles/s41562-024-01882-z),[Document](https://dx.doi.org/10.1038/s41562-024-01882-z)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p3.1)\.
- \[81\]I\. Sutskever, O\. Vinyals, and Q\. V\. Le\(2014\)Sequence to sequence learning with neural networks\.InAdvances in Neural Information Processing Systems,Vol\.27,pp\. 3104–3112\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2014/hash/5a18e133cbf9f257297f410bb7eca942-Abstract.html)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p2.1)\.
- \[82\]R\. S\. Sutton\(2019\)The bitter lesson\.Note:Essay published onlineExternal Links:[Link](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)Cited by:[§2](https://arxiv.org/html/2607.01006#S2.p1.1)\.
- \[83\]A\. M\. Turing\(1950\-10\)Computing machinery and intelligence\.Mind59\(236\),pp\. 433–460\.External Links:[Document](https://dx.doi.org/10.1093/mind/LIX.236.433)Cited by:[§3](https://arxiv.org/html/2607.01006#S3.p1.1)\.
- \[84\]T\. Ullman\(2023\-03\)Large Language Models Fail on Trivial Alterations to Theory\-of\-Mind Tasks\.arXiv\.Note:arXiv:2302\.08399 \[cs\]Comment: 11 pages, 2 figuresExternal Links:[Link](http://arxiv.org/abs/2302.08399),[Document](https://dx.doi.org/10.48550/arXiv.2302.08399)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p4.1),[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p5.1.1),[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[85\]R\. van der Meulen, R\. Verbrugge, and M\. van Duijn\(2025\)Towards properly implementing Theory of Mind in AI systems: An account of four misconceptions\.arXiv\(en\)\.Note:Version 1; 19 pages, draft versionExternal Links:[Link](https://arxiv.org/abs/2503.16468),[Document](https://dx.doi.org/10.48550/ARXIV.2503.16468)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p6.1),[§4](https://arxiv.org/html/2607.01006#S4.p8.1)\.
- \[86\]B\. van Dijk, T\. Kouwenhoven, M\. Spruit, and M\. J\. van Duijn\(2023\-12\)Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 12641–12654\.External Links:[Link](https://aclanthology.org/2023.emnlp-main.779/),[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.779)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p7.1)\.
- \[87\]R\. Van Noorden and J\. M\. Perkel\(2023\-09\)AI and science: what 1,600 researchers think\.Nature621\(7980\),pp\. 672–675\(en\)\.Note:Publisher: Nature Publishing GroupExternal Links:[Link](https://www.nature.com/articles/d41586-023-02980-0),[Document](https://dx.doi.org/10.1038/d41586-023-02980-0)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1)\.
- \[88\]A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin\(2017\)Attention is All you Need\.InAdvances in Neural Information Processing Systems,Vol\.30\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html)Cited by:[Figure 1](https://arxiv.org/html/2607.01006#S2.F1),[§2](https://arxiv.org/html/2607.01006#S2.p3.1)\.
- \[89\]A\. Wang, Y\. Pruksachatkun, N\. Nangia, A\. Singh, J\. Michael, F\. Hill, O\. Levy, and S\. Bowman\(2019\)SuperGLUE: A Stickier Benchmark for General\-Purpose Language Understanding Systems\.InAdvances in Neural Information Processing Systems,Vol\.32\.External Links:[Link](https://proceedings.neurips.cc/paper/2019/hash/4496bf24afe7fab6f046bf4923da8de6-Abstract.html)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[90\]A\. Wang, A\. Singh, J\. Michael, F\. Hill, O\. Levy, and S\. Bowman\(2018\-11\)GLUE: A Multi\-Task Benchmark and Analysis Platform for Natural Language Understanding\.InProceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP,T\. Linzen, G\. Chrupała, and A\. Alishahi \(Eds\.\),Brussels, Belgium,pp\. 353–355\.External Links:[Link](https://aclanthology.org/W18-5446/),[Document](https://dx.doi.org/10.18653/v1/W18-5446)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p3.1)\.
- \[91\]T\. Wertheimer\(2022\-07\)Blake Lemoine: Google fires engineer who said AI tech has feelings\.\(en\-GB\)\.External Links:[Link](https://www.bbc.com/news/technology-62275326)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.
- \[92\]H\. Wimmer and J\. Perner\(1983\-01\)Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception\.Cognition13\(1\),pp\. 103–128\.External Links:ISSN 0010\-0277,[Link](https://www.sciencedirect.com/science/article/pii/0010027783900045),[Document](https://dx.doi.org/10.1016/0010-0277%2883%2990004-5)Cited by:[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p1.1),[§3\.1\.2](https://arxiv.org/html/2607.01006#S3.SS1.SSS2.p2.3),[§3\.1\.3](https://arxiv.org/html/2607.01006#S3.SS1.SSS3.p1.1)\.
- \[93\]Z\. Xu, C\. Yu, F\. Fang, Y\. Wang, and Y\. Wu\(2024\-07\)Language agents with reinforcement learning for strategic play in the Werewolf game\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,Vienna, Austria,pp\. 55434–55464\.Cited by:[§3\.1\.3](https://arxiv.org/html/2607.01006#S3.SS1.SSS3.p3.1)\.
- \[94\]H\. Yakura, E\. Lopez\-Lopez, L\. Brinkmann, I\. Serna, P\. Gupta, I\. Soraperra, and I\. Rahwan\(2025\-07\)Empirical evidence of Large Language Model’s influence on human spoken communication\.arXiv\.Note:arXiv:2409\.01754 \[cs\]External Links:[Link](http://arxiv.org/abs/2409.01754),[Document](https://dx.doi.org/10.48550/arXiv.2409.01754)Cited by:[§1](https://arxiv.org/html/2607.01006#S1.p1.1)\.
- \[95\]J\. Yerushalmy\(2023\-02\)‘I want to destroy whatever I want’: Bing’s AI chatbot unsettles US reporter\.The Guardian\(en\-GB\)\.External Links:ISSN 0261\-3077,[Link](https://www.theguardian.com/technology/2023/feb/17/i-want-to-destroy-whatever-i-want-bings-ai-chatbot-unsettles-us-reporter)Cited by:[§4](https://arxiv.org/html/2607.01006#S4.p4.1)\.