Tag
VisualSkill proposes a hierarchical multimodal skill library for computer-use agents that combines text and figures, achieving a 15.3 point absolute lift on CUA benchmarks over text-only baselines by retaining visual information for GUI interaction.