Tag
This paper investigates LLM-based metrics for evaluating clinical significance in radiology report generation. It identifies discrimination bias in existing LLM evaluators and proposes training lightweight, interpretable metrics that better balance error detection against tolerance of harmless variations.