Tag
The paper proposes SVoT, a reinforcement learning framework for multi-hop spatial reasoning in multimodal large language models (MLLMs). It generates interleaved, verifiable intermediate states and visualizations, and achieves significant accuracy gains on new benchmarks involving multi-object interactions and numerical reasoning.
The paper introduces AgentRevive, a Markov state-aware framework for resilient multi-agent collaboration. It uses soft state transitions (Active, Standby, Terminated) to avoid prematurely pruning agents that may recover, reducing token consumption while improving performance on reasoning and domain tasks.
The paper identifies a critical failure mode in LLM agents: they fail to update personalized memories when new evidence conflicts with prior beliefs. It introduces the STALE benchmark and a three-dimensional probing framework, reports that even the best models achieve only 55.2% accuracy, and proposes CUPMem as a prototype for robust memory revision.