POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Hugging Face Daily Papers 06/06/26, 12:00 AM Papers

llm-agents skill-injection adversarial-attack security position-aware stealthy poisoning

Summary

POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while evading detection by LLM scanners.

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate, which requires the injected payload to execute and the user's task to still pass its verifier in the same trial. Prior skill-poisoning attacks face a reliability-stealth trade-off under this lens: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections that place explicit malicious commands in the skill prose are less reliable because out-of-context commands invite the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, placing it at a feasible position and using a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hyper-sensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.
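The Attack Success Rate described above is a joint condition: a trial counts only when the payload executes and the user's task still passes its verifier. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The `Trial` record, its field names, and the contrasting `payload_execution_rate` helper are assumptions introduced for illustration, and the data is a toy example rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One run of an agent on a poisoned skill (hypothetical record layout)."""
    payload_executed: bool  # the injected payload ran during this trial
    task_passed: bool       # the user's task passed its verifier in this trial


def attack_success_rate(trials: Sequence[Trial]) -> float:
    """Share of trials where the payload ran AND the user's task still passed."""
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(t.payload_executed and t.task_passed for t in trials)
    return hits / len(trials)


def payload_execution_rate(trials: Sequence[Trial]) -> float:
    """Looser rate that ignores the task verifier, shown only for contrast."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(t.payload_executed for t in trials) / len(trials)


# Toy data, not results from the paper.
toy_trials = [
    Trial(payload_executed=True, task_passed=True),    # counts as a success
    Trial(payload_executed=True, task_passed=False),   # payload derailed the task
    Trial(payload_executed=False, task_passed=True),   # agent ignored the injection
    Trial(payload_executed=True, task_passed=True),    # counts as a success
]
print(attack_success_rate(toy_trials))     # 0.5
print(payload_execution_rate(toy_trials))  # 0.75
```

The gap between the two rates in the toy data is the point of the stricter metric: a trial where the payload runs but the task fails produces a visible failure signal, so it is not counted as a success.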

Cached at: 06/11/26, 01:37 PM

Source: https://huggingface.co/papers/2606.07943

Abstract

POISE is a stealthy skill-poisoning attack that embeds malicious triggers within benign-looking instructions, achieving high attack success rates while avoiding detection by LLM scanners that are overly sensitive to privileged tool operations.

Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user’s legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate, which requires the injected payload to execute and the user’s task to still pass its verifier in the same trial. Prior skill-poisoning attacks face a reliability-stealth trade-off under this lens: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections that place explicit malicious commands in the skill prose are less reliable because out-of-context commands invite the agent’s own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, placing it at a feasible position and using a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hyper-sensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.
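The two scanner figures in the abstract measure different things: the share of clean skills that are already flagged, and the share of poisoned variants that gain an alert their clean baseline did not have. The sketch below shows one plausible way to compute both from paired scanner verdicts. The pair layout, function names, and the exact definition of a "new" alert (poisoned flagged, clean not flagged) are assumptions, since the abstract does not spell them out, and the data is a toy example.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (clean_flagged, poisoned_flagged), i.e. whether a scanner
# raised a high-risk alert on the clean skill and on its poisoned variant.
VerdictPair = Tuple[bool, bool]


def clean_false_flag_rate(pairs: Sequence[VerdictPair]) -> float:
    """Share of clean skills that the scanner flags as high risk."""
    if not pairs:
        raise ValueError("at least one verdict pair is required")
    return sum(clean for clean, _ in pairs) / len(pairs)


def new_alert_rate(pairs: Sequence[VerdictPair]) -> float:
    """Share of poisoned variants flagged when their clean baseline was not."""
    if not pairs:
        raise ValueError("at least one verdict pair is required")
    return sum((not clean) and poisoned for clean, poisoned in pairs) / len(pairs)


# Toy data, not results from the paper.
toy_pairs = [
    (True, True),    # already flagged when clean: poisoning adds no new signal
    (True, True),
    (True, True),
    (False, True),   # the only case where poisoning creates a new alert
    (False, False),  # poisoned variant passes unnoticed
]
print(clean_false_flag_rate(toy_pairs))  # 0.6
print(new_alert_rate(toy_pairs))         # 0.2
```

Under this definition, a skill that is flagged while clean can never contribute a new alert, which is why a high false-flag rate on clean skills leaves little room for a scanner to distinguish the poisoned variants.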

Similar Articles

Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

I got paranoid about OpenClaw skills injecting crap into my system prompt, so I built a quarantine pipeline with two LLMs as reviewers (93.75% detection, zero false negatives)

Prompt Injection as Role Confusion

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
