DepthVLM enhances Vision-Language Models with a lightweight depth head and unified vision-text supervision, achieving dense metric depth estimation and improved 3D spatial reasoning while maintaining multimodal capabilities.