Tag
This paper formalizes communication policy for LLM agents and proposes Communication Policy Evolution (CPE), a self-evolution framework that refines communication policies through rollouts and prompt-level evolution, achieving the best task success across multiple settings.
The article distinguishes between reusable knowledge (durable context) and operational memory (task state) as essential components for building proactive AI agents that can follow through on complex tasks.
Ψ-Bench is a benchmark for evaluating LLMs' ability to influence users through persuasive dialogues, incorporating user profiles for personalized persuasion. Experiments show that even state-of-the-art models have room for improvement, and access to client profiles significantly boosts performance.
Asuka Zheng argues that the 'running out of training data' panic is misplaced; the real scarcity is a lack of imagination in collecting diverse, long-horizon data, illustrated by her SRE replacement project and broader research trends.
This paper introduces Context, a new architecture for proactive goal-directed agents that replaces reactive chatbots. It presents formal theorems proving efficiency gains through composable sandboxed programs, declarative wiring, and proactive state machines, with an open-source implementation.
ProAct is a proactive agent architecture that leverages idle-time computation to anticipate user needs, improving task completion efficiency and accuracy. It introduces ProActEval, a benchmark spanning 200 scenarios across 40 domains, and achieves significant gains over reactive baselines: a 14.8% reduction in required turns, an 11.7% decrease in user effort, and a 28.1% cut in hallucination rates.
A new tool called Agency in Browser Use Box enables AI agents to propose goals and tasks; humans accept or reject the proposals, and agents report their progress.