Tag
This paper investigates the use of large vision-language models for built-environment reasoning tasks, such as design suggestions and risk identification, based on remote sensing imagery. It evaluates models such as InternVL and Qwen and highlights their potential to support smart city decision-making and quantitative reasoning.
This study reveals that, during quantitative reasoning, answer tokens in thinking LLMs follow a structured self-reading pattern (forward drift plus focus on key anchors), and it proposes a training-free SRQ steering method that exploits this pattern for accuracy gains.