Tag
This study examines how large language model (LLM) raters score clinical AI outputs under different scoring protocols in complex type 2 diabetes pharmacotherapy, and finds that rubric-anchored scoring provides greater discriminative power than rubric-free scoring.