multimodal-evaluation


MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge

Hugging Face Daily Papers · 2026-04-20

Researchers introduce MM-JudgeBias, a benchmark that exposes systematic compositional biases in multimodal large language models (MLLMs) used as automatic judges. It evaluates 26 state-of-the-art MLLMs on 1,800 samples.
