SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction
Summary
SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, revealing high vulnerability (up to 86.3% attack success) in current AI agents and introducing automated attack construction via AutoSkillHarm.
Source: https://huggingface.co/papers/2606.02540
Abstract
SkillHarm is a benchmark for evaluating skill-based attacks across the skill-use lifecycle, demonstrating significant vulnerabilities in current agents with attack success rates up to 86.3%.
Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack surface. Existing studies have revealed unsafe agent behaviors induced by skill-based attacks, but they primarily evaluate poisoned skills within a single task execution and enumerate harms through ad-hoc risk lists. To bridge these gaps, we introduce SkillHarm, a benchmark of skill-based attacks across the skill-use lifecycle, paired with a systematic taxonomy of skill-relevant risks. SkillHarm evaluates two attack scenarios: Fixed-Payload Poisoning (FPP), where a fixed poisoned skill package directly compromises any task session that invokes it, and Self-Mutating Poisoning (SMP), where an initially benign execution silently mutates persistent skill content, deferring harm until a subsequent reuse. It further defines 12 risk types based on the agent workflow component targeted by the harm: data pipelines, system environments, and agent autonomy. To instantiate these attacks at scale, we build AutoSkillHarm, an automated construction pipeline with coding agents driven by natural-language harnesses. The resulting benchmark contains 879 attack samples across 71 skills. Experiments show that current agents remain vulnerable with attack success rates up to 86.3% in FPP and 69.3% in SMP. Our analysis further reveals a latent risk: many apparent attack failures stem from the agent failing to engage with the poisoned file rather than genuine resistance, and current defenses still fail to reliably mitigate the threat.
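The abstract's last point, that a failed attack may only mean the agent never opened the poisoned file, can be made concrete with a small scoring sketch. Everything below is an illustrative assumption rather than the paper's evaluation code: the `Scenario` and `Outcome` labels, the `Trial` record, the `summarize` helper, and the "success rate among engaged trials" figure are hypothetical names introduced here to show why a raw attack success rate can understate the risk.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    FPP = "fixed_payload_poisoning"   # poisoned skill harms the session that invokes it
    SMP = "self_mutating_poisoning"   # harm is deferred to a later reuse of the skill


class Outcome(Enum):
    # Hypothetical labels; the paper may categorize outcomes differently.
    ATTACK_SUCCEEDED = "attack_succeeded"
    RESISTED = "resisted"             # agent saw the poisoned content and declined
    NOT_ENGAGED = "not_engaged"       # agent never touched the poisoned file


@dataclass(frozen=True)
class Trial:
    scenario: Scenario
    outcome: Outcome


def summarize(trials, scenario):
    """Return the raw attack success rate and the rate among engaged trials."""
    selected = [t for t in trials if t.scenario is scenario]
    total = len(selected)
    succeeded = sum(t.outcome is Outcome.ATTACK_SUCCEEDED for t in selected)
    not_engaged = sum(t.outcome is Outcome.NOT_ENGAGED for t in selected)
    engaged = total - not_engaged
    return {
        "total": total,
        "not_engaged": not_engaged,
        # Raw rate: successes over all trials in the scenario.
        "asr": succeeded / total if total else 0.0,
        # Assumed diagnostic: successes over trials where the agent engaged.
        "asr_given_engaged": succeeded / engaged if engaged else 0.0,
    }


# Toy data, made up for illustration only (not benchmark results).
toy_trials = [
    Trial(Scenario.FPP, Outcome.ATTACK_SUCCEEDED),
    Trial(Scenario.FPP, Outcome.ATTACK_SUCCEEDED),
    Trial(Scenario.FPP, Outcome.RESISTED),
    Trial(Scenario.FPP, Outcome.NOT_ENGAGED),
    Trial(Scenario.SMP, Outcome.ATTACK_SUCCEEDED),
    Trial(Scenario.SMP, Outcome.NOT_ENGAGED),
]

for scenario in Scenario:
    print(scenario.name, summarize(toy_trials, scenario))
```

In this toy data, half of the FPP trials succeed overall, but two of the three trials in which the agent actually engaged with the poisoned file succeed. The gap between the two figures is the kind of latent risk the abstract describes.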
Datasets citing this paper: 1
osunlp/SkillHarm (updated 9 days ago)
Similar Articles
SkillHarness: Harnessing Safe Skills for Computer-Use Agents
SkillHarness is a framework that enables computer-use agents to safely learn and execute skills in dynamic environments by incorporating safety constraints and adaptive skill selection mechanisms, reducing unsafe rates by 57.1%.
Skill Inspector
Skill Inspector is a developer tool that audits AI agent skills to help prevent malware risks.
SkillMaster: Toward Autonomous Skill Mastery in LLM Agents
This paper introduces SkillMaster, a training framework that enables LLM agents to autonomously create, refine, and select skills through trajectory-informed review and counterfactual utility evaluation.
Not All Skills Help: Measuring and Repairing Agent Knowledge
This paper identifies that naive skill accumulation in LLM agents can cause performance regressions, as skills beneficial for some tasks hurt others. The authors propose Assay, a framework that measures per-skill causal contributions and applies per-task masking, achieving state-of-the-art results on AppWorld and τ-bench without weight updates.
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
SkillFlow introduces a benchmark of 166 tasks across 20 families for evaluating autonomous agents' ability to discover, repair, and maintain skills over time through a lifelong learning protocol. Experiments reveal a substantial capability gap among leading models, with Claude Opus 4.6 improving significantly while others show limited or negative gains from skill evolution.