multimodal-benchmark

MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models

arXiv cs.CL · 2026-06-05

MCBench is a new benchmark for assessing the safety of omnimodal large language models across vision, audio, and text modalities. It comprises 1,196 scenarios, and evaluations on it show that current models struggle with cross-modal safety reasoning.

Almieyar-Oryx-BloomBench: A Bilingual Multimodal Benchmark for Cognitively Informed Evaluation of Vision-Language Models

Hugging Face Daily Papers · 2026-06-04

BloomBench is a cognitively grounded bilingual (English-Arabic) multimodal benchmark for Vision-Language Models, systematically evaluating six cognitive levels based on Bloom's Taxonomy. Experiments reveal significant cognitive asymmetries and cross-lingual performance gaps in current models.

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

arXiv cs.CL · 2026-04-20

KMMMU is a native Korean benchmark for evaluating multimodal understanding, with 3,466 questions across nine disciplines and visual modality categories. It addresses the gap left by English-centric benchmarks by testing performance on Korean-specific cultural and institutional contexts.
