mllm-evaluation

#mllm-evaluation

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

arXiv cs.CL · 2026-04-20

KMMMU is a native Korean benchmark for evaluating multimodal understanding, with 3,466 questions across nine disciplines and visual modality categories. It addresses the gap left by English-centric benchmarks by testing performance on Korean-specific cultural and institutional contexts.


#mllm-evaluation

MEDSYN: Benchmarking Multi-Evidence Synthesis in Complex Clinical Cases for Multimodal Large Language Models

arXiv cs.CL · 2026-04-20

MEDSYN is a multilingual multimodal benchmark for evaluating MLLMs on complex clinical cases with up to 7 distinct visual evidence types per case. The study reveals that while frontier models match human experts on differential diagnosis generation, all MLLMs show significant gaps in final diagnosis selection due to poor synthesis of heterogeneous clinical evidence.


