multimodal-skills

VISUALSKILL: Multimodal Skills for Computer-Use Agents

arXiv cs.CL · 2d ago

VisualSkill proposes a hierarchical multimodal skill library for computer-use agents (CUAs) that combines text and figures. By retaining visual information for GUI interaction, it achieves a 15.3-point absolute lift on CUA benchmarks over text-only baselines.
