multimodal-guardrails

OSGuard: A Benchmark for Safety in Computer-Use Agents

arXiv cs.AI · yesterday

OSGuard is a dual-granularity benchmark for evaluating the safety of computer-use agents given benign user instructions. It combines action-level judgments with risk-augmented execution suites to detect unsafe shortcuts.
