Tag
BloomBench is a cognitively grounded bilingual (English-Arabic) multimodal benchmark for Vision-Language Models, systematically evaluating six cognitive levels based on Bloom's Taxonomy. Experiments reveal significant cognitive asymmetries and cross-lingual performance gaps in current models.
This paper introduces a psychometric framework and the AIQ Benchmark to evaluate the cognitive profiles of generative AI models, revealing uneven evolution with strong verbal skills but stagnant perceptual reasoning.