multi-role

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv cs.CL · 2026-05-19

This paper introduces CHI-Bench, a benchmark for evaluating AI agents on the end-to-end automation of complex healthcare workflows that require policy-grounded decisions, multi-role composition, and multilateral interactions. Experimental results show that the best-performing agent resolves only 28% of tasks, highlighting significant gaps in current agent capabilities for policy-dense enterprise domains.
