educational-assessment


LLMs Struggle to Measure What Distinguishes Students of Different Proficiency Levels: A Study of Item Discrimination in Reading Comprehension Assessment

arXiv cs.CL · 5d ago

This paper evaluates 42 large language models on their ability to measure item discrimination in reading comprehension assessments. It finds that their results align only weakly with human-calibrated measures and highlights item discrimination as an open challenge for psychometric evaluation.
