This paper introduces EngVQA, a multimodal benchmark for evaluating engineering reasoning in vision-language models (VLMs), along with an 8-stage automatic evaluation framework that enables fine-grained analysis of reasoning failures. The results reveal substantial limitations in the engineering reasoning capabilities of current VLMs.