Tag
This paper proposes a Variance-Aware Reward Framework that uses Group Relative Policy Optimization (GRPO) to improve LLM performance on heart-focused medical question answering, reporting significant accuracy and F1 gains on a HealthBench subset.