Tag
This paper evaluates cross-dataset generalization of supervised ML/DL models and prompted LLMs for automatic Bloom's taxonomy classification of assessment questions, finding that LLMs are more robust across diverse educational contexts.
BloomBench is a cognitively grounded bilingual (English-Arabic) multimodal benchmark for Vision-Language Models, systematically evaluating six cognitive levels based on Bloom's Taxonomy. Experiments reveal significant cognitive asymmetries and cross-lingual performance gaps in current models.