multi-modal-rag

Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification

arXiv cs.CL · 19h ago

This paper proposes a training-free 'identify-before-answer' (IBA) framework for Knowledge-Based Visual Question Answering (KB-VQA) that decouples entity identification from evidence ranking, outperforming fine-tuned multi-modal retrieval-augmented generation baselines while reducing complexity.
