@rohanpaul_ai: New Microsoft + York Univ paper argues that LLMs should not be treated as human-like without clear tests and narrower c…
Summary
A Microsoft and York University paper argues that ascribing human-like attributes to LLMs is problematic when the experimental design is flawed, using Age of Empires II as an analogy to highlight measurement issues.
Cached at: 06/20/26, 10:23 PM
New Microsoft + York Univ paper argues that LLMs should not be treated as human-like without clear tests and narrower claims.
Many studies ask whether LLMs have things like understanding, empathy, anxiety, or self-awareness, but they often build those ideas into the test from the start.
The paper shows that, in principle, the old strategy game Age of Empires II can implement logic gates, train a tiny perceptron, and serve as a substrate for computation.
If the same language model could be rebuilt inside a game, with goats moving around as bits, would we still say it “understands,” “feels anxiety,” or “has empathy” when it produces the same sentence?
The point is not that the game is secretly intelligent, but that the same computation can be represented in a very different form.
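To make the "same computation, different form" idea concrete, here is a minimal Python sketch. It is not code from the paper and does not touch the game; all names (`train_perceptron`, `xor_from`, `NAND_TABLE`) are hypothetical and chosen for illustration. It trains a tiny perceptron to behave as a NAND gate, writes the same gate as a plain lookup table, and builds XOR from either one. The circuit's answers are identical regardless of which representation sits underneath.

```python
from typing import Callable, Dict, List, Tuple

Gate = Callable[[int, int], int]

# Representation 1 (assumed for illustration): NAND as a fixed lookup table.
NAND_TABLE: Dict[Tuple[int, int], int] = {
    (0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0,
}


def train_perceptron(
    samples: Dict[Tuple[int, int], int], max_epochs: int = 100
) -> Tuple[List[int], int]:
    """Classic perceptron rule with a step activation and learning rate 1."""
    weights = [0, 0]
    bias = 0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in samples.items():
            pred = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            delta = target - pred
            if delta != 0:
                # Nudge weights and bias toward the correct answer.
                weights[0] += delta * x1
                weights[1] += delta * x2
                bias += delta
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias


# Representation 2: NAND as a learned perceptron.
W, B = train_perceptron(NAND_TABLE)


def learned_nand(a: int, b: int) -> int:
    return 1 if W[0] * a + W[1] * b + B > 0 else 0


def table_nand(a: int, b: int) -> int:
    return NAND_TABLE[(a, b)]


def xor_from(nand: Gate) -> Gate:
    """Build XOR out of four NAND gates, whatever implements NAND."""
    def xor(a: int, b: int) -> int:
        n = nand(a, b)
        return nand(nand(a, n), nand(b, n))
    return xor


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_from(learned_nand)(a, b), xor_from(table_nand)(a, b))
```

Because NAND is enough to build any Boolean circuit, anything that can host a NAND gate can in principle host the same computation, which is the sense in which a game could serve as a substrate.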
If an LLM-like system were rebuilt inside that game, its answers might stay similar, but people would probably find its “feelings” or “understanding” much less convincing.
The authors argue that this shows a big measurement problem: many human-like claims about LLMs may depend on the interface and the observer, not only on the system itself.
The paper is not saying LLMs definitely lack human-like attributes, or that all talk of AI cognition is nonsense.
It is saying that many experiments smuggle the conclusion into the setup: they assume the model has, or cannot have, a human-like property, then interpret behavior through that assumption.
Link – arxiv.org/abs/2605.31514
Title: “If LLMs Have Human-Like Attributes, Then So Does Age of Empires II”
Similar Articles
If LLMs Have Human-Like Attributes, Then So Does Age of Empires II
This paper argues that ascribing human-like attributes to large language models is problematic because similar claims could be made about simpler systems, such as a computation implemented inside Age of Empires II, and proposes a null assumption of non-uniqueness to avoid circular reasoning.
@MilesCranmer: This is an insane paper and I love it https://arxiv.org/abs/2605.31514
This paper argues that anthropomorphic attributes often ascribed to LLMs are not unique, demonstrating that simpler systems like Age of Empires II can exhibit similar perceived traits, and calls for explicit measurement criteria in AI behavior analysis.
Evaluating LLMs as Human Surrogates in Controlled Experiments
This paper evaluates whether off-the-shelf LLMs can reliably simulate human responses in controlled behavioral experiments by comparing LLM-generated data with human survey responses on accuracy perception. The findings show that while LLMs capture directional effects and aggregate belief-updating patterns, they do not consistently match human-scale effect magnitudes, clarifying when synthetic LLM data can serve as behavioral proxies.
Human Psychometric Questionnaires Mischaracterize LLM Behavior
This paper finds that human psychometric questionnaires fail to reliably predict LLM behavior in real-world interactions, and proposes generation-based profiling as a more accurate alternative.
@rohanpaul_ai: Yann LeCun says LLMs aren’t a bubble in value or investment—they’ll drive many real-world applications and justify curr…
Yann LeCun argues that LLMs are not a bubble in value or investment, as they will drive many real-world applications and justify current infrastructure spending; the actual bubble is in assuming LLMs can achieve human-level thinking.