PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

Hugging Face Daily Papers Papers

Summary

PolicyGuard is a sub-agent verifier that enhances LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions, achieving significant improvements on the tau^2-BENCH benchmark.

LLM agents handle user requests on behalf of organizations through tool calls and must follow the company policies stated in their system prompts. Prior work approaches this as a safeguarding problem -- external checks that block non-compliant agent actions. We argue that policy adherence is a broader problem: real workflows unfold across many turns, require explicit user confirmation and prerequisite reads, and hinge on the content of the dialogue rather than on any single argument value. Meeting this bar requires (i) full conversation context, (ii) self-reasoning over the policy and the current dialogue, and (iii) conversation-specific remediation that guides the agent's next turn -- three capabilities that prior safeguard work has often underestimated. We introduce POLICYGUARD, a sub-agent verifier that shares the agent's view of the dialogue, reasons over the policy in context, and provides actionable feedback for the agent's next turn. On tau^2-BENCH airline across three vendors (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro) with four trials per setting, POLICYGUARD improves PASS4 by +12.0 / +6.0 / +12.0 pp. Per-call analyses show POLICYGUARD achieves higher policy-violation recall while blocking roughly half as often as argument-level guards.
Original Article
View Cached Full Text

Cached at: 06/30/26, 03:33 AM

Paper page - PolicyGuard: A Dialogue-Grounded Sub-Agent Verifier for Policy Adherence in LLM Agents

Source: https://huggingface.co/papers/2606.29225

Abstract

POLICYGUARD is a sub-agent verifier that enhances LLM agent policy adherence by providing contextual reasoning and conversation-specific feedback across multi-turn interactions.

LLM agentshandle user requests on behalf of organizations through tool calls and must follow the company policies stated in their system prompts. Prior work approaches this as asafeguardingproblem -- external checks that block non-compliant agent actions. We argue thatpolicy adherenceis a broader problem: real workflows unfold across many turns, require explicit user confirmation and prerequisite reads, and hinge on the content of the dialogue rather than on any single argument value. Meeting this bar requires (i) fullconversation context, (ii)self-reasoningover the policy and the current dialogue, and (iii) conversation-specific remediation that guides the agent’s next turn -- three capabilities that prior safeguard work has often underestimated. We introduce POLICYGUARD, asub-agent verifierthat shares the agent’s view of the dialogue, reasons over the policy in context, and provides actionable feedback for the agent’s next turn. On tau^2-BENCH airline across three vendors (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro) with four trials per setting, POLICYGUARD improves PASS4 by +12.0 / +6.0 / +12.0 pp. Per-call analyses show POLICYGUARD achieves higher policy-violation recall while blocking roughly half as often asargument-level guards.

View arXiv pageView PDFProject pageGitHub0Add to collection

Get this paper in your agent:

hf papers read 2606\.29225

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2606.29225 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.29225 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.29225 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

PolicyBank: Evolving Policy Understanding for LLM Agents

arXiv cs.CL

PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.

PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation

arXiv cs.LG

PropGuard is a propagation-aware framework for safeguarding LLM-based multi-agent systems (LLM-MAS) from malicious instructions that propagate across agents and rounds. It constructs a dual-view spatio-temporal graph and uses a GE-GRPO trained inspector to detect and remediate suspicious propagation subgraphs.

Governance by Construction for Generalist Agents

arXiv cs.AI

This paper presents CUGA's policy system, a modular policy-as-code layer that enforces governance at multiple checkpoints in LLM agent execution, enabling predictable and auditable behavior without model fine-tuning.