OSGuard is a dual-granularity benchmark for evaluating the safety of computer-use agents when they are given benign user instructions. It combines action-level safety judgments with risk-augmented execution suites designed to detect unsafe shortcuts.