Tag
This paper introduces PRO-CUA, a process-reward optimization framework for training Computer Use Agents (CUAs) using iterative step-level reinforcement learning. The method decouples on-policy environment interaction from policy optimization, enabling dense credit assignment without relying on expert trajectories, and demonstrates effectiveness on live web benchmarks.