Slock announces its platform for collaborative building between humans and AI agents.
This paper introduces the Instruction Inference task, which evaluates Theory of Mind capabilities in LLM-based agents during human-agent collaboration when instructions are incomplete or ambiguous. The authors present Tomcat, an LLM agent tested on GPT-4o, DeepSeek-R1, and Gemma-3-27B, and report that it performs comparably to human participants in inferring unspoken intentions.