professional-domains


SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?

arXiv cs.AI · 2026-05-18

SaaS-Bench is a new benchmark for evaluating computer-use agents, built on 23 deployable SaaS systems across six professional domains and containing 106 long-horizon tasks. Experiments show that even the strongest models complete fewer than 4% of tasks end-to-end, highlighting significant limitations in current agent capabilities.
