Tag
This paper introduces SR-REAL, a unified framework for spatial vision-language models that combines linguistic deduction and 3D geometric reasoning via reinforcement learning, enabling robust multi-step spatial reasoning across diverse tasks.