Tag
This paper proposes learning assessment skills for LLMs to automate rubric construction for scoring tasks, achieving performance comparable to expert-written rubrics without requiring human-written examples.
This study analyzes how modifications to evaluation rubrics, such as shifting from holistic to analytic criteria, affect the agreement between human raters and AI autoraters. The findings suggest that providing examples and reducing bias improve agreement, while higher complexity tends to decrease it.
This paper presents a hybrid framework for detecting alarming or distressed student verbal responses by combining a text classifier (content-based) and an audio classifier (prosodic features), aimed at expediting human review in Automated Verbal Response Scoring systems. The approach addresses a safety gap in automated scoring pipelines where at-risk student responses may otherwise go unnoticed.