Tag
This paper introduces 'progress advantage', an implicit advantage function derived from reinforcement learning post-training that enables effective step-level scoring for LLM agents without training a dedicated reward model. Progress advantage outperforms both confidence-based baselines and trained reward models across multiple benchmarks and model families.