This empirical study investigates whether post-training, namely supervised fine-tuning (SFT) and reinforcement learning (RL), can improve LLM performance on automated ICD coding. It introduces PHI, a diagnostic curriculum that extends GRPO to refine missed-code cases. Results show that prompting-only evaluation underestimates LLM potential: SFT provides the main capability jump, and RL improves performance further.