phone-use

Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents

Hugging Face Daily Papers · 2026-05-08

The paper introduces PhoneSafety, a benchmark of 700 safety-critical moments across 130+ apps for evaluating phone-use agents. The results show that avoiding a harmful outcome does not necessarily indicate safety: a model may avoid harm only because it failed to act, or it may make an unsafe choice. Evaluation therefore needs to distinguish capability signals from safety signals.
