PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents
Summary
PolicyGuard is a sub-agent verifier that improves LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions. On tau^2-BENCH airline, it improves PASS^4 by 6.0 to 12.0 pp across three model vendors.
Cached at: 06/30/26, 03:33 AM
Source: https://huggingface.co/papers/2606.29225
Abstract
LLM agents handle user requests on behalf of organizations through tool calls and must follow the company policies stated in their system prompts. Prior work approaches this as a safeguarding problem: external checks that block non-compliant agent actions. We argue that policy adherence is a broader problem: real workflows unfold across many turns, require explicit user confirmation and prerequisite reads, and hinge on the content of the dialogue rather than on any single argument value. Meeting this bar requires (i) full conversation context, (ii) self-reasoning over the policy and the current dialogue, and (iii) conversation-specific remediation that guides the agent’s next turn, three capabilities that prior safeguard work has often underestimated. We introduce POLICYGUARD, a sub-agent verifier that shares the agent’s view of the dialogue, reasons over the policy in context, and provides actionable feedback for the agent’s next turn. On tau^2-BENCH airline across three vendors (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro) with four trials per setting, POLICYGUARD improves PASS^4 by +12.0 / +6.0 / +12.0 pp. Per-call analyses show POLICYGUARD achieves higher policy-violation recall while blocking roughly half as often as argument-level guards.
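The contrast between an argument-level guard and a dialogue-grounded verifier can be illustrated with a toy sketch. This is not the paper's implementation: the abstract describes an LLM sub-agent that reasons over the policy, while the sketch below substitutes two hand-written rules (a prerequisite read and an explicit user confirmation) for that reasoning. The names `Turn`, `Verdict`, `argument_level_guard`, `dialogue_grounded_verifier`, and the tool names `get_reservation_details` and `cancel_reservation` are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    role: str     # "user", "agent", or "tool"
    content: str  # message text; for a "tool" turn, the name of the tool called


@dataclass
class Verdict:
    allowed: bool
    feedback: Optional[str] = None  # remediation hint for the agent's next turn


def argument_level_guard(tool_name: str, args: dict) -> Verdict:
    """Hypothetical baseline: inspects only the proposed call, never the dialogue."""
    if tool_name == "cancel_reservation" and not args.get("reservation_id"):
        return Verdict(False, "Missing reservation_id.")
    return Verdict(True)


def dialogue_grounded_verifier(history: list, tool_name: str, args: dict) -> Verdict:
    """Hypothetical rule-based stand-in for an LLM verifier that sees the full dialogue."""
    if tool_name != "cancel_reservation":
        return Verdict(True)

    # Assumed policy rule 1: reservation details must be read before cancelling.
    read_done = any(
        t.role == "tool" and t.content == "get_reservation_details" for t in history
    )
    if not read_done:
        return Verdict(
            False,
            f"Call get_reservation_details for {args.get('reservation_id')} "
            "before cancelling.",
        )

    # Assumed policy rule 2: the most recent user turn must contain an explicit "yes".
    user_turns = [t for t in history if t.role == "user"]
    words = re.findall(r"[a-z]+", user_turns[-1].content.lower()) if user_turns else []
    if "yes" not in words:
        return Verdict(
            False,
            "Summarize the cancellation and ask the user for an explicit yes "
            "before calling cancel_reservation.",
        )
    return Verdict(True)


# Usage: the same call passes the argument-level check but fails the dialogue check.
history = [Turn("user", "Please cancel my booking ABC123.")]
call = ("cancel_reservation", {"reservation_id": "ABC123"})
baseline = argument_level_guard(*call)
grounded = dialogue_grounded_verifier(history, *call)
print(baseline.allowed, grounded.allowed, grounded.feedback)
```

In this sketch the argument values are valid, so the argument-level guard has nothing to object to; only a check that reads the conversation can notice the missing prerequisite read and confirmation, and its feedback string tells the agent what to do next rather than simply blocking.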
Similar Articles
PolicyBank: Evolving Policy Understanding for LLM Agents
PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.
PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation
PropGuard is a propagation-aware framework for safeguarding LLM-based multi-agent systems (LLM-MAS) from malicious instructions that propagate across agents and rounds. It constructs a dual-view spatio-temporal graph and uses a GE-GRPO trained inspector to detect and remediate suspicious propagation subgraphs.
LabGuard: Grounding Natural-Language Laboratory Rules into Runtime Guards for Embodied Laboratory Agents
LabGuard introduces a framework that translates natural-language laboratory safety rules into executable runtime monitors for embodied agents, achieving a reduction in unsafe events from 39.5% to 23.8% while maintaining task success.
SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning
SingGuard is a policy-adaptive multimodal LLM guardrail model for text, image, and multilingual safety moderation, featuring dynamic reasoning and a new benchmark SingGuard-Bench. It achieves state-of-the-art results across multiple datasets.
Governance by Construction for Generalist Agents
This paper presents CUGA's policy system, a modular policy-as-code layer that enforces governance at multiple checkpoints in LLM agent execution, enabling predictable and auditable behavior without model fine-tuning.