clinical-benchmark

MEDSYN: Benchmarking Multi-Evidence Synthesis in Complex Clinical Cases for Multimodal Large Language Models

arXiv cs.CL · 2026-04-20

MEDSYN is a multilingual, multimodal benchmark for evaluating multimodal large language models (MLLMs) on complex clinical cases, with up to seven distinct visual evidence types per case. The study finds that frontier models match human experts at generating differential diagnoses, but all MLLMs show significant gaps in selecting the final diagnosis because they synthesize heterogeneous clinical evidence poorly.
