Tag
This paper proposes a large-scale multi-modal dataset (MMIO) for zero-shot industrial defect detection and introduces the Refined Text-Visual Prompt (RTVP) method, achieving state-of-the-art results on the benchmark.
This paper introduces a zero-shot strategy for chart summarization using Program-of-Thoughts prompting, where lightweight visual language models (VLMs) generate Python programs to compute statistics, improving factual accuracy over existing methods.
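As a rough illustration of the Program-of-Thoughts idea in the summary above, the sketch below shows the kind of short Python program a VLM might emit so that statistics are computed by code rather than stated from memory. The data values, the `chart_data` dictionary, and the `compute_chart_statistics` helper are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical values, as a VLM might transcribe them from a bar chart.
chart_data = {"2019": 12.0, "2020": 15.5, "2021": 9.5, "2022": 18.0}


def compute_chart_statistics(data):
    """Compute simple statistics that a chart summary could cite."""
    values = list(data.values())
    max_label = max(data, key=data.get)
    min_label = min(data, key=data.get)
    return {
        "max": (max_label, data[max_label]),
        "min": (min_label, data[min_label]),
        "mean": mean(values),
        # Difference between the last and first plotted values.
        "change": values[-1] - values[0],
    }


stats = compute_chart_statistics(chart_data)
print(stats)
```

The summary text would then be written from `stats`, so any figure it quotes comes from executed code rather than from the model's own arithmetic.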