Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Hugging Face Daily Papers 05/21/26, 12:00 AM Papers

multimodal-language-models personality-reasoning benchmark dataset social-cognition mllm grounded-reasoning

Summary

Researchers introduce the MM-OCEAN dataset and a three-tier evaluation framework for grounded personality reasoning in multimodal LLMs, revealing a 'Prejudice Gap' where models often make correct predictions without proper grounding.

Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.
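To make the sample-level metrics concrete, here is a minimal sketch of how two of them might be computed from per-sample pass/fail flags. The abstract does not give formal definitions, so everything below is an assumption for illustration: the `SampleOutcome` schema is hypothetical, Prejudice Rate is read as the share of correctly rated samples whose grounding failed (in the spirit of "correct ratings [that] are not grounded in retrieved cues"), and Holistic-grounding Rate is read as the share of samples that pass all three tiers. Confabulation Rate and Integration-failure Rate are left out because the abstract gives no basis for defining them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleOutcome:
    """Per-sample pass/fail flags for the three tiers (hypothetical schema)."""
    rating_ok: bool
    reasoning_ok: bool
    grounding_ok: bool


def prejudice_rate(outcomes):
    """ASSUMED definition: among correctly rated samples, the share whose grounding failed."""
    rated = [o for o in outcomes if o.rating_ok]
    if not rated:
        return 0.0
    return sum(not o.grounding_ok for o in rated) / len(rated)


def holistic_grounding_rate(outcomes):
    """ASSUMED definition: the share of all samples that pass rating, reasoning, and grounding."""
    if not outcomes:
        return 0.0
    passed = sum(o.rating_ok and o.reasoning_ok and o.grounding_ok for o in outcomes)
    return passed / len(outcomes)


# Toy outcomes, invented for illustration only (not data from the paper).
outcomes = [
    SampleOutcome(rating_ok=True, reasoning_ok=True, grounding_ok=True),
    SampleOutcome(rating_ok=True, reasoning_ok=True, grounding_ok=False),
    SampleOutcome(rating_ok=True, reasoning_ok=False, grounding_ok=False),
    SampleOutcome(rating_ok=False, reasoning_ok=False, grounding_ok=True),
]

print(f"PR={prejudice_rate(outcomes):.2f} HR={holistic_grounding_rate(outcomes):.2f}")
# prints "PR=0.67 HR=0.25"
```

Under these assumed definitions, a model can score well on rating accuracy while still showing a high PR and a low HR, which is the kind of disconnect the abstract calls the Prejudice Gap.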

Original Article

Cached at: 05/22/26, 06:38 AM

Source: https://huggingface.co/papers/2605.22109

Abstract

Researchers introduce a new task and dataset for evaluating personality reasoning in multimodal language models, revealing significant gaps between accurate predictions and grounded reasoning processes.

Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.

Get this paper in your agent:

hf papers read 2605.22109

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

#### anonymous-mm-ocean/MM-OCEAN Updated about 4 hours ago • 338
