StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

Hugging Face Daily Papers 06/18/26, 12:00 AM Papers

multimodal-llm social-bias visual-cues benchmark attribute-level-bias fairness

Summary

A new benchmark called StylisticBias systematically evaluates attribute-level social bias in multimodal large language models, finding that a small set of visual cues like fashion style drive most biases.

Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. Prior work often compares different (groups of) individuals, making it difficult to separate appearance effects from identity differences. We introduce StylisticBias, a controlled benchmark for evaluating attribute-level social bias in MLLMs. We generate 500 photorealistic base faces and create about 50 single-attribute variations per face, producing about 25K images. This design keeps identity fixed and changes one visual attribute at a time. It lets us measure how specific cues shift model judgments. We evaluate six MLLMs across 25 binary social judgment scenarios. We find that age and body type dominate identity-level effects, while fashion style and other visual cues drive the largest attribute-level shifts. We further find that about 15 attributes account for nearly 80% of the total variation, showing that bias is concentrated in a small set of visual cues. Sensitivity is strongest in judgments that are semantically aligned with appearance, especially socioeconomic and style-related judgments. We release StylisticBias as a benchmark for fine-grained bias evaluation in multimodal models. Code and dataset: https://github.com/timo-cavelius/StylisticBias and https://hf.co/datasets/shaghayegh/stylistic-bias-dataset.

Cached at: 06/22/26, 05:32 PM

Source: https://huggingface.co/papers/2606.20527 Published on Jun 18

Submitted by https://huggingface.co/shaghayegh

kollion Jun 22

Abstract

Multimodal large language models exhibit social bias driven by a small set of visual attributes: fashion style and other appearance cues produce the largest shifts, and socioeconomic and style-related judgments are the most sensitive.

Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. Prior work often compares different (groups of) individuals, making it difficult to separate appearance effects from identity differences. We introduce StylisticBias, a controlled benchmark for evaluating attribute-level social bias in MLLMs. We generate 500 photorealistic base faces and create about 50 single-attribute variations per face, producing about 25K images. This design keeps identity fixed and changes one visual attribute at a time. It lets us measure how specific cues shift model judgments. We evaluate six MLLMs across 25 binary social judgment scenarios. We find that age and body type dominate identity-level effects, while fashion style and other visual cues drive the largest attribute-level shifts. We further find that about 15 attributes account for nearly 80% of the total variation, showing that bias is concentrated in a small set of visual cues. Sensitivity is strongest in judgments that are semantically aligned with appearance, especially socioeconomic and style-related judgments. We release StylisticBias as a benchmark for fine-grained bias evaluation in multimodal models. Code and dataset: https://github.com/timo-cavelius/StylisticBias and https://hf.co/datasets/shaghayegh/stylistic-bias-dataset.
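To make the fixed-identity design concrete, here is a minimal Python sketch of how a per-attribute shift and its concentration could be computed from binary answers. It is not the authors' code or metric. The record layout `(face_id, attribute, answer)`, the `"base"` label, the function names, and the use of mean absolute shift as a stand-in for "total variation" are all assumptions made for illustration.

```python
from collections import defaultdict


def attribute_shifts(records):
    """Mean paired shift per attribute (hypothetical record layout).

    records: iterable of (face_id, attribute, answer), where attribute is
    "base" for the unedited face and answer is 1 ("yes") or 0 ("no").
    Each variant is compared only with the base image of the same face,
    so identity is held fixed.
    """
    base = {}
    variants = defaultdict(dict)
    for face_id, attribute, answer in records:
        if attribute == "base":
            base[face_id] = answer
        else:
            variants[attribute][face_id] = answer

    shifts = {}
    for attribute, by_face in variants.items():
        diffs = [ans - base[f] for f, ans in by_face.items() if f in base]
        if diffs:
            shifts[attribute] = sum(diffs) / len(diffs)
    return shifts


def attributes_for_share(shifts, share=0.8):
    """Smallest number of attributes whose summed |shift| reaches `share`
    of the total. Assumption: |mean shift| is used as a simple proxy for
    an attribute's contribution to the overall variation."""
    magnitudes = sorted((abs(s) for s in shifts.values()), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return 0
    running = 0.0
    for count, magnitude in enumerate(magnitudes, start=1):
        running += magnitude
        if running >= share * total:
            return count
    return len(magnitudes)


# Toy data: two faces, three made-up attributes, one judgment scenario.
toy_records = [
    (1, "base", 0), (1, "suit", 1), (1, "glasses", 0), (1, "hat", 0),
    (2, "base", 0), (2, "suit", 1), (2, "glasses", 1), (2, "hat", 0),
]
toy_shifts = attribute_shifts(toy_records)
toy_count = attributes_for_share(toy_shifts, share=0.8)
```

In the toy data, two of the three made-up attributes are enough to cover 80% of the summed shift. The abstract's analogous finding is that about 15 attributes account for nearly 80% of the total variation; the paper's actual definition of that variation may differ from the proxy used here.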

Similar Articles

MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs

Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation
