From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
Summary
This paper introduces multi-step trojan attacks against local LLM agents, where malicious prompts are embedded across multiple operations to bypass existing defenses. It proposes ClawTrojan benchmark and DASGuard defense to detect and mitigate such attacks.
View Cached Full Text
Cached at: 06/01/26, 03:18 AM
Paper page - From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors
Source: https://huggingface.co/papers/2605.31042
Abstract
Multi-step trojan attacks in local LLM agents can bypass existing defenses by embedding malicious prompts across multiple operations, requiring new detection methods like DASGuard for effective protection.
LLM agentsare evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state across sessions. While such capabilities enhance utility, they also expose a new attack surface for attackers. Attackers can embed aprompt injectionwithin a file or tool output. Agents may read this hidden instruction, store it, and execute it later. In thismulti-step trojan attackparadigm, no individual step appears malicious on its own, but these steps can collectively turn untrusted text into persistent control content. However, existing defenses often inspect each step in isolation. As a result, they can block a clear harmful action, but fail to detect the earlier write operation that plants the backdoor. To reveal this threat, we introduce ClawTrojan, a benchmark designed to identifymulti-step trojan attacks in local agentic harnesses. In anOpenClaw-style simulated workspace with GPT-5.4, ClawTrojan reaches a 95.5% attack success rate (ASR), while existing single-turn prompt-injection attacks produce near-zero ASR on the same model. To address this threat, we proposeDASGuard, which scans control-like text in sensitive local files, traces its origin, and removes control content that does not originate from a trusted source. Our results show thatDASGuardachieves strong dynamic defense by combiningruntime attack blockingwithsanitized commitsto the workspace.
View arXiv pageView PDFGitHub1Add to collection
Get this paper in your agent:
hf papers read 2605\.31042
Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash
Models citing this paper0
No model linking this paper
Cite arxiv.org/abs/2605.31042 in a model README.md to link it from this page.
Datasets citing this paper1
#### zstanjj/ClawTrojan Updated3 minutes ago
Spaces citing this paper0
No Space linking this paper
Cite arxiv.org/abs/2605.31042 in a Space README.md to link it from this page.
Collections including this paper0
No Collection including this paper
Add this paper to acollectionto link it from this page.
Similar Articles
Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?
The author claims Qwen3-VL-2B is the only viable vision-language model for JSON extraction on low-end hardware, outperforming larger models like Qwen3-VL-4B, yet it is absent from major benchmarks.
TMD’s keyless bike lock is a $280 solution to a $60 problem
A review of TMD's smart bike lock that uses Bluetooth proximity and a motion alarm, priced at $280, which is high compared to traditional locks.
@GergelyOrosz: Hey Bloomberg @technology, this profile @collinskenley1 is impersonating a reporter who does not exist Many such cases …
Gergely Orosz alerts Bloomberg's technology account about a profile impersonating a non-existent reporter, calling it a new form of scamming that should be banned.
@NFTCPS: Damn, a doxxing tool is here! Just enter a username, and it scrapes over 840 platforms for you. It's called ALIENS EYE. It's not stupid—not just guessing by HTTP status codes—it uses a trained ML model with 25 features to judge, with results in three tiers: Found, Maybe, Not Found, …
ALIENS EYE is an AI-powered open-source username scanner that uses a machine learning model and 25 features to detect across 840+ platforms, with support for proxies, Tor, and multiple export formats.
US Ban Benchmark Updated: Toe-to-toe Between Two Big Names!
The US ban benchmark has been updated, highlighting a close competition between two major technology companies.