Tag
A developer reports that coding agents consistently fail to help his 10-year-old build creative simulators. He takes this as evidence that LLMs cannot handle out-of-distribution use cases and argues that claims of imminent AGI are overstated.
This paper introduces ChildAgentEval, a psychometrically grounded benchmark for assessing cognitive age alignment in MLLM-based agents, comparing their reasoning against human developmental stages.