multi-modal-rag

Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification

arXiv cs.CL · 19h ago

This paper proposes a training-free 'identify-before-answer' (IBA) framework for Knowledge-Based Visual Question Answering (KB-VQA) that decouples entity identification from evidence ranking, outperforming fine-tuned multi-modal retrieval-augmented generation baselines while reducing complexity.
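To make the "identify first, then rank evidence" split concrete, here is a minimal sketch of a two-stage pipeline. It is not the paper's implementation: the `Entity` type, the `identify_entity` and `rank_evidence` helpers, the cosine-similarity matching, the token-overlap scoring, and the toy knowledge base are all assumptions chosen for illustration. A real system would presumably use a vision-language model for identification and a stronger ranker for evidence.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Entity:
    name: str
    embedding: list[float]  # hypothetical stand-in for a learned visual embedding
    passages: list[str]     # knowledge-base text attached to this entity


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_entity(image_embedding, kb):
    """Stage 1 (assumed): pick the KB entity closest to the image, with no training."""
    return max(kb, key=lambda e: cosine(image_embedding, e.embedding))


def rank_evidence(question, entity, top_k=1):
    """Stage 2 (assumed): rank only the identified entity's passages by token overlap."""
    q_tokens = set(question.lower().split())
    ranked = sorted(
        entity.passages,
        key=lambda p: len(q_tokens & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Toy knowledge base with made-up 2-D embeddings.
kb = [
    Entity("Eiffel Tower", [1.0, 0.0],
           ["The Eiffel Tower is 330 metres tall.", "It was completed in 1889."]),
    Entity("Big Ben", [0.0, 1.0],
           ["Big Ben is the nickname of the Great Bell."]),
]

entity = identify_entity([0.9, 0.1], kb)
evidence = rank_evidence("how tall is the tower", entity)
```

Because ranking only sees passages from the identified entity, the two stages can be swapped out or evaluated independently, which is the decoupling the summary describes.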

