This paper examines reinforcement learning from world feedback for clinical protocol-execution tasks in FHIR environments. It identifies structural barriers to learning, such as high silent-finish ceilings and zero-gradient tasks, and introduces MedAgentBench-v3, a benchmark variant with a lower silent-finish ceiling. It shows that pure RL underperforms rule-based SFT because of these barriers, and it proposes a combined SFT+RL approach.