Tag
The paper introduces Hard Negative Captions (HNC), a dataset and method for training vision-language models to achieve fine-grained comprehension by addressing weak associations in web-collected image-text pairs.