personal-assistants

MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

Hugging Face Daily Papers · 5d ago

MyPCBench evaluates computer-use agents as personal assistants in a simulated Linux desktop environment with real-world web applications. Claude Opus 4.6 achieves the highest task completion rate, 55.4%, but still struggles with multi-application tasks and long trajectories.

@ryanzhuuuu: iMessage is one of the most used messaging channels in America. Yet support for it in personal assistants has always be…

X AI KOLs Following · 2026-06-10

Ryan Zhu announces a partnership with NousResearch to enable iMessage connectivity on any OS, allowing personal assistants to access iMessage and unlock new experiences.

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

Hugging Face Daily Papers · 2026-05-25

Introduces Claw-Anything, a benchmark that evaluates always-on personal AI assistants on comprehensive user activity contexts spanning extended timeframes, multiple services, and diverse device interactions. Experiments show that even GPT-5.5 achieves only 34.5% pass@1, highlighting a significant gap between current agent capabilities and the demands of always-on assistance.
