Tag
This paper reviews the progress in AI for Systems Engineering (AI4SE) and Systems Engineering for AI (SE4AI) over the past decade, identifies five critical research gaps, and provides a human-AI agreement dataset and web explorer for relevance judgments.
This study analyzes how modifications to evaluation rubrics, such as shifting from holistic to analytic criteria, affect the agreement between human raters and AI autoraters. The findings suggest that providing examples and reducing bias improve agreement, while higher complexity tends to decrease it.