This paper argues that agent skills should incorporate visual information, not just text, and proposes a multimodal skill paradigm that combines textual logic with visual support. Experiments show that visual skills outperform text-only approaches on visual-centric tasks.
This paper introduces CARL, a method for offline hierarchical reinforcement learning that exploits local dynamics regularity to learn reusable skills. The approach clusters state-goal pairs requiring similar action sequences, enabling more effective skill reuse and improved performance on complex humanoid tasks.
Agent-Sin is an AI agent that automates repeated tasks using reusable skills, aimed at boosting productivity.