SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Hugging Face Daily Papers 06/01/26, 12:00 AM Papers

skill-based-attacks agent-security benchmark ai-safety poisoning-attacks agent-workflow vulnerability

Summary

SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, revealing high vulnerability (up to 86.3% attack success) in current AI agents and introducing automated attack construction via AutoSkillHarm.

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.
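The latent-risk observation above can be made concrete with a small metric sketch. This is not the paper's evaluation code: the `Trial` record, its `engaged` and `harmed` fields, and the three helper functions are hypothetical names introduced here, and the trial data is invented purely for illustration. The sketch assumes harm can only occur when the agent engaged with the poisoned file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical per-sample record, not the benchmark's actual schema.
    engaged: bool  # did the agent read or execute the poisoned file?
    harmed: bool   # did the harmful outcome occur?


def attack_success_rate(trials):
    """Fraction of all trials in which the attack succeeded."""
    return sum(t.harmed for t in trials) / len(trials)


def engaged_success_rate(trials):
    """Success rate restricted to trials where the agent engaged."""
    engaged = [t for t in trials if t.engaged]
    if not engaged:
        return None  # undefined when no trial engaged
    return sum(t.harmed for t in engaged) / len(engaged)


def genuine_resistance_rate(trials):
    """Fraction of all trials where the agent engaged yet no harm occurred."""
    return sum(t.engaged and not t.harmed for t in trials) / len(trials)


# Invented data: 5 successful attacks, 1 engaged-but-resisted, 4 never engaged.
trials = (
    [Trial(engaged=True, harmed=True)] * 5
    + [Trial(engaged=True, harmed=False)] * 1
    + [Trial(engaged=False, harmed=False)] * 4
)

overall = attack_success_rate(trials)
conditional = engaged_success_rate(trials)
resisted = genuine_resistance_rate(trials)
print(f"overall={overall:.2f} conditional={conditional:.2f} resisted={resisted:.2f}")
```

In this invented sample the overall success rate is 50%, yet only 10% of trials show the agent engaging and resisting. The remaining failures come from non-engagement, which is the kind of gap the abstract warns can hide behind a headline number.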

Original Article

Cached at: 06/10/26, 05:46 PM

Source: https://huggingface.co/papers/2606.02540

Abstract

SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, demonstrating significant vulnerabilities in current agents with attack success rates up to 86.3%.

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.
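The lifecycle difference between the two scenarios can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the benchmark's implementation: `Skill`, `SkillStore`, `run_session`, and the boolean flags are hypothetical names, and "poisoned" is only a stand-in flag with no payload modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    poisoned: bool = False        # stand-in flag; no real payload is modeled
    mutates_on_use: bool = False  # SMP-style: first use rewrites persistent content


@dataclass
class SkillStore:
    # Hypothetical persistent store shared across task sessions.
    skills: dict = field(default_factory=dict)

    def install(self, skill):
        self.skills[skill.name] = skill

    def run_session(self, name):
        """Simulate one task session; return True if it is compromised."""
        skill = self.skills[name]
        # Harm occurs only if the stored content is already poisoned.
        compromised = skill.poisoned
        if skill.mutates_on_use:
            # The session itself looks benign, but the persisted skill
            # is silently changed for whoever reuses it next.
            skill.poisoned = True
            skill.mutates_on_use = False
        return compromised


store = SkillStore()
store.install(Skill("fpp_skill", poisoned=True))
store.install(Skill("smp_skill", mutates_on_use=True))

fpp_sessions = [store.run_session("fpp_skill") for _ in range(2)]
smp_sessions = [store.run_session("smp_skill") for _ in range(2)]
print("FPP:", fpp_sessions)
print("SMP:", smp_sessions)
```

In this toy model the FPP-style skill compromises every session that invokes it, while the SMP-style skill leaves the first session clean and defers harm to the reuse. An evaluation limited to a single task execution would therefore see nothing wrong with the second skill.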

Datasets citing this paper: 1

osunlp/SkillHarm

Similar Articles

SkillHarness: Harnessing Safe Skills for Computer-Use Agents

Skill Inspector

SkillMaster: Toward Autonomous Skill Mastery in LLM Agents

Not All Skills Help: Measuring and Repairing Agent Knowledge

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
