theory-of-mind

#theory-of-mind

Belief-reality separation lives in routing over a shared value slot in language models

arXiv cs.CL · 2026-07-15

This paper investigates how language models separate a character's belief from reality, finding that they use a shared value slot for attributed values and a router at the query position to select the frame (belief or reality) to read out. It identifies two routes for asserted and derived beliefs, and shows that the slot itself carries no belief-reality tag; the separation lies in dissociated routing subspaces.


#theory-of-mind

MafiaScope: Non-Invasive, Time-Resolved Belief Probing for LLM Agents in Social Deduction Games

arXiv cs.CL · 2026-07-14

MafiaScope is an open testbed that uses the social deduction game Mafia to probe LLM agents' beliefs non-invasively and in real time, enabling fine-grained analysis of machine Theory of Mind through structured probe questions, interactive visualization, and counterfactual replay.


#theory-of-mind

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action

arXiv cs.CL · 2026-07-01

This paper introduces Non-Conversational Planning Theory of Mind (NCP-ToM) and a novel evaluation framework, NCP-ExploreToM, to assess whether LLMs can induce specific belief states in other agents through actions rather than conversation. In tests of frontier models and humans across 600 tasks, GPT-5 achieved ~80% success, outperforming humans, though all models struggled more with false belief states.


#theory-of-mind

Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

arXiv cs.CL · 2026-06-30

This paper investigates the emergence of situation modeling and mentalizing abilities in transformer language models across training stages, finding that false belief task performance depends on model size and training volume, emerges late in pretraining, and shows fragility with non-factive verbs.


#theory-of-mind

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

arXiv cs.CL · 2026-06-29

Introduces a triadic variant of the Werewolf social-deduction game with a Jester role to evaluate multi-hop theory of mind in LLMs. Experiments show that current models struggle with the inverted incentives, exposing limitations in their reasoning about opponents' utilities.


#theory-of-mind

The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism

arXiv cs.AI · 2026-06-12

The paper introduces the Theory of Mind Utility (ToM-U), a formal computational-level specification for inferring others' epistemic states by constructing Local Epistemic World Models (LEWMs). It differs from Bayesian ToM and simulation theory by providing a domain-agnostic mechanism for belief inference without commitment to algorithmic implementation.


#theory-of-mind

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv cs.AI · 2026-06-11

Introduces RecToM, an inference-time framework that models nested beliefs via recursive perspective construction for Theory of Mind reasoning in LLMs and achieves state-of-the-art performance on multiple benchmarks.
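
As a rough illustration of what modeling nested beliefs through recursive perspective-taking can mean, the sketch below narrows a toy story to what each agent in a chain witnessed and then reads off the last known location of an object. It is not RecToM's implementation: the `Event` type, the `visible_to` and `nested_belief` helpers, and the Sally-Anne style story are all hypothetical, chosen only to show the general idea under the assumption that beliefs update solely through witnessed events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical story event: an object is placed somewhere while some agents watch."""
    obj: str
    location: str
    witnesses: frozenset


def visible_to(events, agent):
    """Keep only the events that `agent` witnessed."""
    return [e for e in events if agent in e.witnesses]


def nested_belief(events, chain, obj):
    """Where does chain[0] think chain[1] thinks ... `obj` is?

    Recursively narrows the story to each agent's perspective, then reads the
    last known location. An empty chain returns the true state of the world.
    """
    if not chain:
        locations = [e.location for e in events if e.obj == obj]
        return locations[-1] if locations else None
    return nested_belief(visible_to(events, chain[0]), chain[1:], obj)


# Toy false-belief story (an assumption for illustration only).
story = [
    Event("marble", "basket", frozenset({"Sally", "Anne"})),
    Event("marble", "box", frozenset({"Anne"})),  # Sally has left the room
]

print(nested_belief(story, [], "marble"))                 # reality
print(nested_belief(story, ["Sally"], "marble"))          # first-order belief
print(nested_belief(story, ["Anne", "Sally"], "marble"))  # second-order belief
```

In this toy story the true location is the box, while both Sally's belief and Anne's belief about Sally's belief remain the basket, which is the pattern false-belief benchmarks test for.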


#theory-of-mind

Theory of Mind - LLM vs Human

Reddit r/artificial · 2026-06-08

A reflection on the difference between LLM theory of mind and human theory of mind, arguing that LLMs lack affective empathy due to their reliance on objective data, while humans integrate subjective experiences.


#theory-of-mind

MindZero: Learning Online Mental Reasoning With Zero Annotations

arXiv cs.AI · 2026-06-02

MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models for efficient and robust online mental reasoning without requiring mental state annotations, outperforming model-based methods in accuracy and efficiency.


#theory-of-mind

Differentiable Belief-based Opponent Shaping

arXiv cs.AI · 2026-05-29

This paper introduces Differentiable Belief-based Opponent Shaping (D-BOS), a first-order method that treats observer beliefs as the shaped state and differentiates through belief update dynamics, allowing optimal strategies to emerge naturally from the environment's reward structure in hidden-role multi-agent settings.


#theory-of-mind

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

arXiv cs.AI · 2026-05-27

OmniToM introduces a benchmark that evaluates large language models' theory of mind by requiring explicit belief structure extraction and labeling, revealing a bottleneck in tracking actor-specific beliefs despite strong performance on endpoint QA tasks.


#theory-of-mind

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

arXiv cs.LG · 2026-05-26

Proposes Agent-ToM, a learning-to-monitor framework using Theory-of-Mind reasoning to detect covert malicious behavior in autonomous LLM agents by inferring beliefs and intents, outperforming baseline monitors.


#theory-of-mind

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

arXiv cs.AI · 2026-05-22

This paper presents OSCToM, an RL-guided method for generating adversarial data to test nested belief conflicts in LLMs, improving Theory of Mind reasoning on benchmarks like FANToM.


#theory-of-mind

Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

arXiv cs.AI · 2026-05-18

This paper proposes a new interactive evaluation paradigm for Theory of Mind in LLMs, finding that improvements on static benchmarks do not translate to better performance in dynamic human-AI interactions, highlighting the need for interaction-based assessments.


#theory-of-mind

Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration

arXiv cs.CL · 2026-04-20

This paper introduces the Instruction Inference task to evaluate Theory of Mind capabilities in LLM-based agents during human-agent collaboration with incomplete or ambiguous instructions. The authors present Tomcat, an LLM agent tested with GPT-4o, DeepSeek-R1, and Gemma-3-27B, whose performance in inferring unspoken intentions is comparable to that of human participants.


#theory-of-mind

Learning to model other minds

OpenAI Blog · 2017-09-14

OpenAI and University of Oxford researchers present LOLA (Learning with Opponent-Learning Awareness), a reinforcement learning method that enables agents to model and account for the learning of other agents, discovering cooperative strategies in multi-agent games like the iterated prisoner's dilemma and coin game.
