Tag
This paper presents SpaceNum, a unified framework for evaluating how vision-language models (VLMs) understand numerical values in spatial contexts. It finds that current models largely fail to ground numbers spatially and often perform close to random guessing.