StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents
Summary
StainFlow introduces an entity-stain-flow process reward model (PRM) for GUI agents that combines global entity stain tracking with local evidence linking to improve credit assignment in reinforcement learning, achieving a 3.2% relative improvement in online RL success on AndroidWorld.
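The 3.2% figure is a relative gain, which is easy to confuse with a gain in percentage points. The following is a minimal illustration using hypothetical success rates (these are not numbers reported in the paper). The helper names are illustrative only.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain in percent: (improved - baseline) / baseline * 100."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (improved - baseline) / baseline * 100.0


def absolute_gain(baseline: float, improved: float) -> float:
    """Absolute gain in percentage points when both inputs are percentages."""
    return improved - baseline


# Hypothetical success rates in percent, chosen only to illustrate the arithmetic.
base, new = 50.0, 51.6
print(round(relative_improvement(base, new), 2))  # 3.2 (relative, %)
print(round(absolute_gain(base, new), 2))         # 1.6 (percentage points)
```

At a 50% baseline, a 3.2% relative improvement therefore corresponds to only 1.6 points of absolute success rate.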
Cached at: 06/08/26, 09:14 AM
# StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

Source: [https://arxiv.org/abs/2606.07027](https://arxiv.org/abs/2606.07027)

Authors: [Haojie Hao](https://arxiv.org/search/cs?searchtype=author&query=Hao,+H), [Longkun Hao](https://arxiv.org/search/cs?searchtype=author&query=Hao,+L), [Yihang Lou](https://arxiv.org/search/cs?searchtype=author&query=Lou,+Y), [Yan Bai](https://arxiv.org/search/cs?searchtype=author&query=Bai,+Y), [Zhenyang Li](https://arxiv.org/search/cs?searchtype=author&query=Li,+Z), [Zhichao Yang](https://arxiv.org/search/cs?searchtype=author&query=Yang,+Z), [Dongshuo Huang](https://arxiv.org/search/cs?searchtype=author&query=Huang,+D), [Hongyu Lin](https://arxiv.org/search/cs?searchtype=author&query=Lin,+H), [Lanqing Hong](https://arxiv.org/search/cs?searchtype=author&query=Hong,+L), [Jiakai Wang](https://arxiv.org/search/cs?searchtype=author&query=Wang,+J), [Xianglong Liu](https://arxiv.org/search/cs?searchtype=author&query=Liu,+X)

[View PDF](https://arxiv.org/pdf/2606.07027)

> Abstract: Reinforcement Learning (RL) has become a promising approach for improving GUI Agents in long-horizon, stochastic digital environments, but trajectory-level success feedback is too sparse to provide reliable credit assignment for intermediate exploration steps. To mitigate this issue, recent studies introduce Process Reward Models (PRMs), which provide finer-grained training feedback through global milestone verification or local step-level evaluation. However, these methods still suffer from two level-specific limitations: global milestone decomposition is subjective and singular, making it difficult to accommodate the multiple valid execution paths in real GUI tasks, while fixed local judging windows may miss long-range key evidence or dilute the decision signal with irrelevant frames.
>
> Inspired by stain-tracing mechanisms in network flow analysis, we propose StainFlow, an entity-stain-flow process reward model for GUI Agents. To reduce the subjectivity of global partitioning, we introduce the Global Entity Stain Tracking module, which extracts visually verifiable task entities and tracks how their stain concentrations and states evolve along the trajectory, allowing task phases to be objectively separated by changes in the entity evidence flow. To improve the accuracy of local verification, we introduce the Local Stain Evidence Linking module. Centered on the triggering entities of each candidate key node, it retrieves relevant steps based on their stain concentrations and state changes, and dynamically constructs high-density evidence windows for verifying true key nodes. Extensive experiments on AndroidWorld and OGRBench show that StainFlow relatively improves online RL success by 3.2% and trajectory completion judgment accuracy by 1.8%.

## Submission history

From: Haojie Hao [[view email](https://arxiv.org/show-email/693681c8/2606.07027)]

**[v1]** Fri, 5 Jun 2026 08:17:28 UTC (6,157 KB)
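The abstract describes the Global Entity Stain Tracking and Local Stain Evidence Linking modules only at a high level. The code below is a hypothetical toy sketch and not the authors' implementation. All of the following are assumptions made for illustration:

- the `Step` schema,
- the decay-based stain rule (`gamma`),
- the "phase ends on a state change" criterion,
- the scoring in `link_evidence` (`tau`, `max_len`).

The sketch contrasts an entity-centered evidence window with a fixed local window of the same length, which is the limitation the abstract points to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Step:
    """Hypothetical schema: entities the action touched, plus entity states seen after it."""
    touched: Set[str]
    states: Dict[str, str] = field(default_factory=dict)


# Per step: (stain concentration of each entity, entities whose state changed)
History = List[Tuple[Dict[str, float], Set[str]]]


def track_stains(traj: List[Step], entities: List[str], gamma: float = 0.5) -> History:
    """Global tracking (assumed rule): touching an entity or changing its state
    re-stains it to 1.0; otherwise its concentration decays by `gamma` per step."""
    conc = {e: 0.0 for e in entities}
    last_state: Dict[str, str] = {}
    history: History = []
    for step in traj:
        changed: Set[str] = set()
        for e in entities:
            new_state = step.states.get(e, last_state.get(e))
            if e in last_state and new_state != last_state[e]:
                changed.add(e)
            if new_state is not None:
                last_state[e] = new_state
            conc[e] = 1.0 if (e in step.touched or e in changed) else conc[e] * gamma
        history.append((dict(conc), changed))
    return history


def phase_boundaries(history: History) -> List[int]:
    """Assumed criterion: a phase ends at every step where a tracked entity changes state."""
    return [t for t, (_, changed) in enumerate(history) if changed]


def link_evidence(history: History, t: int, triggers: Set[str],
                  tau: float = 0.3, max_len: int = 4) -> List[int]:
    """Local linking (assumed rule): rank earlier steps by the triggers' stain,
    boost steps where a trigger changed state, keep the top ones plus step t."""
    scored = []
    for s in range(t):
        conc, changed = history[s]
        stain = max(conc.get(e, 0.0) for e in triggers)
        if changed & triggers:
            scored.append((1.0 + stain, s))  # a state change outranks stain alone
        elif stain >= tau:
            scored.append((stain, s))
    top = sorted(scored, reverse=True)[: max_len - 1]
    return sorted([s for _, s in top] + [t])


def fixed_window(t: int, k: int) -> List[int]:
    """Baseline for comparison: the k most recent steps ending at step t."""
    return list(range(max(0, t - k + 1), t + 1))


# Hypothetical "send 'hi' to Alice" trajectory; steps 2, 5 and 6 are irrelevant.
traj = [
    Step(set()),                                           # 0 open app
    Step({"contact:Alice"}, {"contact:Alice": "found"}),   # 1 search contact
    Step(set()),                                           # 2 scroll
    Step({"contact:Alice"}, {"contact:Alice": "opened"}),  # 3 open chat
    Step({"message"}, {"message": "typed"}),               # 4 type text
    Step(set()),                                           # 5 idle
    Step(set()),                                           # 6 idle
    Step({"message"}, {"message": "sent"}),                # 7 send (candidate key node)
]
history = track_stains(traj, ["contact:Alice", "message"])
print(phase_boundaries(history))                                # [3, 7]
print(link_evidence(history, 7, {"contact:Alice", "message"}))  # [1, 3, 4, 7]
print(fixed_window(7, 4))                                       # [4, 5, 6, 7]
```

In this toy run, the linked window for the send step keeps the distant contact search at step 1 and drops the idle steps. The fixed window of the same length misses steps 1 and 3 and spends half its budget on idle frames.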
Similar Articles
TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents
TRACE is a monitoring framework for long-horizon LLM agent trajectories that uses a Triage-Inspect-Judge loop to connect evidence across temporally distant actions, achieving high recall and F1 on evasive sabotage detection tasks.
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
This paper proposes Discriminator-Guided Reinforcement Learning (DRL) to correct alignment issues in score- and flow-matching models by using a pretrained representation space discriminator as an optimal reward signal, significantly improving visual fidelity and semantic quality without human preferences.
@HuggingPapers: Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance Naver AI eliminates unsta…
Naver AI introduces Stable-GFlowNet, a method to improve LLM red-teaming by eliminating unstable partition function estimation in Generative Flow Networks through contrastive trajectory balance.
MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization
MobileForge presents an annotation-free adaptation system for mobile GUI agents that uses real app interaction and hierarchical feedback-guided policy optimization to improve performance, achieving near state-of-the-art results on AndroidWorld with open data.
Skill-Guided Continuation Distillation for GUI Agents
The paper proposes Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework that uses skill-guided policies to generate supervision for off-trajectory states during closed-loop execution, improving GUI agent success rates on OSWorld-Verified from around 30% to over 50%.