MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

Hugging Face Daily Papers 06/01/26, 09:50 AM Papers

agents skill-learning web-based-guides closed-loop-learning benchmark vision-language-model

Summary

MMG2Skill converts web-based procedural guides into executable skills for agents through closed-loop learning, improving performance across GUI control, gameplay, and card play tasks with macro-average gains of +12.8 to +25.3 percentage points.

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the gap between human-oriented guides and agent-executable skills, we formalize this problem as guide-to-skill learning: converting in-the-wild guides into executable skills and continuously improving them from trajectories observable to the agent. To evaluate the capability of existing agents on this task, we introduce MMG2Skill-Bench, the first benchmark designed for this problem. We further propose MMG2Skill, a closed-loop framework that compiles guides into editable skills, conditions a fixed vision-language model (VLM) agent on these skills during execution, and revises the skills from trajectory-level root-cause feedback without using benchmark scores. Across GUI control, open-ended gameplay, and strategic card play with six VLM backbones, MMG2Skill consistently outperforms vanilla baseline agents in every model-domain setting, achieving macro-average gains of +12.8 to +25.3 percentage points across backbones. Ablation studies show that directly prompting agents with raw guides can degrade performance, while both structured skill construction and trajectory-driven revision are necessary for the observed improvements. On success-inferable tasks, analyzer-based early stopping further prevents late-stage performance regressions and saves 25%-53% of attempts when the success signal is properly calibrated.
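The closed loop described above (compile a guide into an editable skill, run a fixed agent conditioned on it, revise the skill from trajectory-level feedback) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Skill`, `compile_guide`, `closed_loop`, `toy_agent`, and `toy_analyzer` are hypothetical names, the "agent" is a stand-in function rather than a VLM, and stopping when the analyzer reports no root cause is only one plausible reading of analyzer-based early stopping. The analyzer sees the trajectory only, never a benchmark score.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Skill:
    """Hypothetical editable skill: an ordered list of plain-text steps."""
    steps: List[str]
    revisions: int = 0


def compile_guide(guide_text: str) -> Skill:
    """Toy compiler (assumption): each non-empty guide line becomes one step."""
    return Skill(steps=[ln.strip() for ln in guide_text.splitlines() if ln.strip()])


def closed_loop(
    guide_text: str,
    run_agent: Callable[[Skill], List[str]],
    analyze: Callable[[List[str]], Optional[str]],
    max_attempts: int = 5,
) -> Tuple[Skill, int]:
    """Run, analyze, and revise until the analyzer infers success or the budget ends.

    `analyze` returns a root-cause note from the trajectory alone, or None
    when it infers success; no benchmark score is consulted.
    """
    skill = compile_guide(guide_text)
    for attempt in range(1, max_attempts + 1):
        trajectory = run_agent(skill)   # the agent is fixed; only the skill changes
        feedback = analyze(trajectory)
        if feedback is None:            # early stop: remaining attempts are saved
            return skill, attempt
        skill = Skill(steps=skill.steps + [feedback], revisions=skill.revisions + 1)
    return skill, max_attempts


def toy_agent(skill: Skill) -> List[str]:
    """Stand-in for a VLM agent: it 'executes' each step and logs it."""
    return [f"did: {step}" for step in skill.steps]


def toy_analyzer(trajectory: List[str]) -> Optional[str]:
    """Stand-in analyzer: flags a missing save action as the root cause."""
    if not any("save" in event for event in trajectory):
        return "press save before closing"
    return None


final_skill, attempts_used = closed_loop("open file\nedit text\n", toy_agent, toy_analyzer)
```

In this toy run the first attempt fails because the guide never mentions saving, the analyzer's note is appended as a new step, and the second attempt ends the loop before the five-attempt budget is used up.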

Original Article

Cached at: 06/04/26, 03:41 AM

Source: https://huggingface.co/papers/2606.01993

Abstract

MMG2Skill framework converts web-based procedural guides into executable skills through closed-loop learning, improving agent performance across GUI control, gameplay, and card play tasks.

Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. However, such knowledge is often multimodal, heterogeneous, noisy, and implicitly assumes human executors, making it difficult to use directly as the skills required by agents. To bridge the gap between human-oriented guides and agent-executable skills, we formalize this problem as guide-to-skill learning: converting in-the-wild guides into executable skills and continuously improving them from trajectories observable to the agent. To evaluate the capability of existing agents on this task, we introduce MMG2Skill-Bench, the first benchmark designed for this problem. We further propose MMG2Skill, a closed-loop framework that compiles guides into editable skills, conditions a fixed vision-language model (VLM) agent on these skills during execution, and revises the skills from trajectory-level root-cause feedback without using benchmark scores. Across GUI control, open-ended gameplay, and strategic card play with six VLM backbones, MMG2Skill consistently outperforms vanilla baseline agents in every model-domain setting, achieving macro-average gains of +12.8 to +25.3 percentage points across backbones. Ablation studies show that directly prompting agents with raw guides can degrade performance, while both structured skill construction and trajectory-driven revision are necessary for the observed improvements. On success-inferable tasks, analyzer-based early stopping further prevents late-stage performance regressions and saves 25%-53% of attempts when the success signal is properly calibrated.
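For readers unfamiliar with the two kinds of figures quoted in the abstract, the arithmetic can be sketched as below. This assumes that "macro-average gain" means the unweighted mean, over domains, of the per-domain difference between the skill-conditioned agent and the vanilla baseline, and that "attempts saved" is the unused share of a fixed attempt budget; the abstract does not spell out either definition. The function names are hypothetical and the scores are made-up placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict


def macro_average_gain(baseline: Dict[str, float], with_skill: Dict[str, float]) -> float:
    """Unweighted mean over domains of (with_skill - baseline), in percentage points."""
    if baseline.keys() != with_skill.keys():
        raise ValueError("both score tables must cover the same domains")
    return mean(with_skill[d] - baseline[d] for d in baseline)


def attempts_saved(budget: int, used: int) -> float:
    """Fraction of the attempt budget left unused after stopping early."""
    if budget <= 0 or not 0 <= used <= budget:
        raise ValueError("need 0 <= used <= budget and budget > 0")
    return (budget - used) / budget


# Placeholder success rates (%) for one backbone; NOT taken from the paper.
example_baseline = {"gui": 20.0, "gameplay": 30.0, "cards": 40.0}
example_with_skill = {"gui": 35.0, "gameplay": 40.0, "cards": 60.0}

example_gain = macro_average_gain(example_baseline, example_with_skill)
example_saved = attempts_saved(budget=10, used=6)
```

Because the mean is unweighted, each domain counts equally regardless of how many tasks it contains, and a gain expressed in percentage points is an absolute difference between two rates, not a relative improvement.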

Get this paper in your agent:

hf papers read 2606.01993

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

RESOURCE2SKILL: Distilling Executable Agent Skills from Human-Created Multimodal Resources

Skill-Guided Continuation Distillation for GUI Agents

SkillGen: Verified Inference-Time Agent Skill Synthesis

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2069064122218717387

@dair_ai: // MetaSkill-Evolve // Great paper on self-improving agents. Most self-improving agents rewrite what the agent does and…
