MCBench is a new benchmark for assessing the safety of omnimodal large language models across vision, audio, and text modalities. It includes 1,196 scenarios and finds that current models struggle with cross-modal safety reasoning.
BloomBench is a cognitively grounded bilingual (English-Arabic) multimodal benchmark for Vision-Language Models, systematically evaluating six cognitive levels based on Bloom's Taxonomy. Experiments reveal significant cognitive asymmetries and cross-lingual performance gaps in current models.
KMMMU is a native Korean benchmark for evaluating multimodal understanding, with 3,466 questions across nine disciplines and visual modality categories. It addresses the gap left by English-centric benchmarks by testing performance on Korean-specific cultural and institutional contexts.