medical-software

Tag


MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents

arXiv cs.AI · 20h ago

MedCUA-Bench is a new benchmark for evaluating computer-use agents on clinical software tasks. It covers 18 scenarios across 10 medical domains and includes safety dimensions. Results show that current agents perform poorly, especially on the real OpenEMR system, which points to a significant gap in reliability.
