visual-elements

From Scenes to Elements: Multi-Granularity Evidence Retrieval for Verifiable Multimodal RAG

arXiv cs.CL · 3d ago

This paper introduces GranuVistaVQA, a multimodal benchmark with element-level annotations, and GranuRAG, a framework that treats visual elements as first-class retrieval units for verifiable multimodal RAG. The authors report improvements of up to 29.2% over baselines.
