Tag
Proposes Video2GUI, a framework that automatically extracts GUI interaction trajectories from unlabeled instructional videos, and uses it to build the WildGUI dataset of 12M trajectories across 1,500+ apps. Pre-training on this data yields 5-20% improvements on GUI grounding and action benchmarks.