This work presents a model that learns shaped 'process rewards' for robotic reinforcement learning and evolves automatically as the policy improves, raising performance on benchmarks and in real-world settings.
StainFlow introduces an entity-stain-flow process reward model for GUI agents. It uses global entity stain tracking and local evidence linking to improve credit assignment in reinforcement learning, achieving a 3.2% relative improvement on AndroidWorld.