Tag
This paper audits knowledge-based VQA benchmarks, revealing systematic violations of assumptions that make accuracy a misleading metric. It introduces a repair protocol and multi-entity augmentation to restore answer derivability and question clarity, showing that corrected settings yield markedly different model rankings.