Tag
A discussion of how AI agents should acquire user context, whether through upfront disclosure or gradual learning. Existing approaches such as project memory and chat summaries are found lacking.
Introduces Claw-Anything, a benchmark that evaluates always-on personal AI assistants on comprehensive user activity contexts spanning extended timeframes, multiple services, and diverse device interactions. Experiments show that even GPT-5.5 achieves only 34.5% pass@1, highlighting a significant gap between current agent capabilities and the demands of always-on assistance.
The article questions whether AI products over-rely on chat history for personalization, noting that chat history is noisy and that summaries, tags, and preference fields have shortcomings of their own. It asks what alternative sources of truth for context could be used without becoming intrusive.