Tag
This paper proposes a Pre-Reasoning Perception Framework (PRPF) for proactive mobile agents, decoupling intervention timing from assistance generation to improve efficiency and reduce false triggers.
This paper introduces Claw-Anything, a benchmark that evaluates always-on personal AI assistants on comprehensive user activity contexts spanning extended timeframes, multiple services, and diverse device interactions. Experiments show that even GPT-5.5 achieves only 34.5% pass@1, highlighting a significant gap between current agent capabilities and the demands of always-on assistance.
π-Bench is a new benchmark comprising 100 multi-turn tasks with hidden user intents across 5 domain-specific user personas, designed to evaluate proactive assistance in long-horizon workflows for personal assistant agents.