xi-an-jiaotong-university

Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning

arXiv cs.CL · 2026-05-11

This paper introduces RIS (Retrieve, Integrate, and Synthesize), a framework for spatial-semantic grounded latent visual reasoning in multimodal large language models (MLLMs), designed to overcome information bottlenecks. It proposes anchoring latent tokens to spatial and semantic evidence and reports improvements on benchmarks such as V* and HRBench.