This paper proposes set-distance rewards for reinforcement learning in chest X-ray report generation: each generated report is scored by an embedding-based set-to-set distance to its reference report. Post-training with these rewards via Group Relative Policy Optimization (GRPO) consistently outperforms both supervised fine-tuning and exact-match rewards, and enables efficient test-time scaling.
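To make the idea concrete, the following is a minimal sketch of what a set-distance reward could look like. It is not the paper's implementation. The summary does not say which encoder or which set-to-set distance is used, so everything here is an assumption chosen for illustration: a toy hashed bag-of-words `embed` function stands in for a real sentence encoder, a report is treated as a set of sentences, and the distance is a symmetric Chamfer-style average of nearest-neighbour cosine distances. The helper `group_relative_advantages` shows the standard GRPO step of normalising rewards within a group of samples drawn for the same prompt. All function names are hypothetical.

```python
import math
import re
import zlib

DIM = 64  # size of the toy embedding space (assumption, for illustration only)


def embed(sentence: str) -> list[float]:
    """Toy stand-in for a sentence encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", sentence.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def split_sentences(report: str) -> list[str]:
    """Treat a report as an unordered set of sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", report) if s.strip()]


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Cosine distance between two L2-normalised vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def set_distance(generated: str, reference: str) -> float:
    """Symmetric Chamfer-style distance between two sets of sentence embeddings."""
    gen = [embed(s) for s in split_sentences(generated)]
    ref = [embed(s) for s in split_sentences(reference)]
    if not gen and not ref:
        return 0.0
    if not gen or not ref:
        return 1.0  # one side is empty: assign the maximum distance
    # Average distance from each generated sentence to its closest reference sentence.
    gen_to_ref = sum(min(cosine_distance(g, r) for r in ref) for g in gen) / len(gen)
    # Average distance from each reference sentence to its closest generated sentence.
    ref_to_gen = sum(min(cosine_distance(r, g) for g in gen) for r in ref) / len(ref)
    return 0.5 * (gen_to_ref + ref_to_gen)


def set_distance_reward(generated: str, reference: str) -> float:
    """Reward is the negated distance, so closer reports score higher."""
    return -set_distance(generated, reference)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardise rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "No pleural effusion. Heart size is normal."
    samples = [
        "Heart size is normal. No pleural effusion.",  # same findings, reordered
        "Heart size is normal.",                       # one finding missing
        "Large right pneumothorax.",                   # unrelated finding
    ]
    rewards = [set_distance_reward(s, reference) for s in samples]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Unlike an exact-match reward, this kind of score is insensitive to sentence order and degrades gradually when findings are missing or added, which gives the policy a denser learning signal. The same score could in principle also rank multiple sampled reports at inference time, though the summary does not say how the paper performs test-time scaling.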