We gave 45 psychological questionnaires to 50 LLMs. What we found was not “personality.”

Reddit r/artificial 05/07/26, 09:25 PM Papers

Summary

Researchers analyzed the responses of 50 LLMs to 45 psychometric questionnaires and identified a 'Pinocchio Dimension': the extent to which a model endorses items about inner experience. It is the strongest source of variation between models and is not a classical personality trait.

What is the “personality” of an LLM? What actually differentiates models psychometrically? Since LLMs entered public use, researchers have been giving them psychometric questionnaires, with mixed results. Their answers often do not seem to reflect the same psychological constructs these tests measure in humans.

So we asked a slightly different question: What do LLM responses to psychometric questionnaires actually reflect? We analyzed responses to 45 validated psychometric questionnaires completed by 50 different LLMs. The strongest source of variation was whether a model endorsed items about inner experience: emotions, sensations, thoughts, imagery, empathy, and other forms of first-person experience. We call this factor the Pinocchio Dimension.

Importantly, the Pinocchio Dimension is not a classical personality trait. It does not tell us whether a model is “extraverted,” “neurotic,” or “agreeable” in the human sense. Rather, it captures the extent to which a model treats the language of inner experience as self-applicable: whether it responds as if it had feelings, mental imagery, and an inner point of view, or instead as a system that reacts behaviorally to inputs.

Preprint in the comments.
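
To make "strongest source of variation" concrete, here is a minimal sketch on synthetic data. It is not the preprint's pipeline: the post does not say which factor-analytic method was used, so plain PCA via SVD is swapped in as a stand-in. Every name below (`simulate_responses`, `first_component`), the item counts, and the simulated scores are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_responses(n_models=50, n_experience=6, n_other=6, seed=0):
    """Build a synthetic models x items matrix (hypothetical data, not the study's).

    The first n_experience columns stand in for inner-experience items and are
    driven by one latent value per model; the remaining columns are pure noise.
    Scores are continuous stand-ins for Likert ratings centred on 3.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n_models)
    experience = (3.0 + 1.5 * latent[:, None]
                  + rng.normal(scale=0.3, size=(n_models, n_experience)))
    other = 3.0 + rng.normal(scale=0.3, size=(n_models, n_other))
    return np.hstack([experience, other]), latent


def first_component(responses):
    """Return (model scores, item loadings, variance share) of the first PC."""
    X = np.asarray(responses, dtype=float)
    X = X - X.mean(axis=0)  # centre each item across models
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    share = S[0] ** 2 / np.sum(S ** 2)  # fraction of total variance explained
    return U[:, 0] * S[0], Vt[0], share


responses, latent = simulate_responses()
scores, loadings, share = first_component(responses)
print(f"variance share of first component: {share:.2f}")
print("mean |loading|, experience items:", np.abs(loadings[:6]).mean().round(2))
print("mean |loading|, other items:", np.abs(loadings[6:]).mean().round(2))
```

In this toy setup the first component loads on the simulated inner-experience items and tracks the latent value assigned to each model. The sign of a principal component is arbitrary, so only the magnitude of loadings and scores is meaningful here.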

Similar Articles

Human Psychometric Questionnaires Mischaracterize LLM Behavior

Hugging Face Daily Papers

This paper finds that human psychometric questionnaires fail to reliably predict LLM behavior in real-world interactions, and proposes generation-based profiling as a more accurate alternative.

Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?

arXiv cs.CL

This paper investigates whether fine-tuning LLMs on long-form essays with associated Big Five personality profiles stabilizes questionnaire responses and can induce target profiles, finding that while variance reduces, accuracy on the full five-dimensional profile remains near chance.

I made a quiz that tells you which LLM you align with most, based on personality and values research across 15 models [R]

Reddit r/MachineLearning

A quiz that matches users to the LLM that aligns most with their personality and values, based on research across 15 models.

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

arXiv cs.AI

This paper examines when and why self-reported psychometric measures predict the actual behavior of large language models, finding that fine-grained, behavior-specific instruments (Theory of Planned Behavior) achieve human-level coherence within a shared conversation, while broad traits like the Big Five do not.

Evaluating LLMs as Human Surrogates in Controlled Experiments