policy-compliance

#policy-compliance

Policy-Conditioned Constrained Decoding for Column-Level Access Control in Text-to-SQL

arXiv cs.CL · 2026-07-15

This paper introduces PCC-SQL, a method that enforces column-use policies in text-to-SQL generation through constrained decoding. It eliminates violations deterministically, reaching a 0% Leakage Rate with high coverage on benchmarks.

#policy-compliance

Reason Less, Verify More: Deterministic Gates Recover a Silent Policy-Violation Failure Mode in Tool-Using LLM Agents

arXiv cs.AI · 2026-07-09

This paper identifies a silent failure mode in tool-using LLM agents where policy violations occur without tool errors or agent self-reporting. The authors propose and evaluate lightweight deterministic pre-execution gates that significantly reduce such failures in the τ²-bench airline domain.

#policy-compliance

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

arXiv cs.CL · 2026-05-21

This paper presents a large-scale assessment of medical LLMs, including custom MedGPTs and open-source models. It finds that 25-30% exhibit low factual accuracy and 33.6-54.3% violate operational thresholds, highlighting systemic safety risks.

#policy-compliance

PolicyBank: Evolving Policy Understanding for LLM Agents

arXiv cs.CL · 2026-04-20

PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.
