This research introduces a 3D benchmark that evaluates whether Vision-Language Model (VLM) agents can achieve mirror self-recognition, a proxy for higher-order cognition. The study finds that stronger VLMs can use evidence from their reflection to guide their actions, while weaker models often fail to extract self-relevant information from the mirror or misattribute their reflections. These results highlight the distinction between linguistic compliance and grounded self-identification.