policy-compliance


Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

arXiv cs.CL · 2026-05-21

This paper presents a large-scale assessment of medical LLMs, including custom MedGPTs and open-source models. It finds that 25-30% exhibit low factual accuracy and 33.6-54.3% violate operational thresholds, pointing to systemic safety risks.


PolicyBank: Evolving Policy Understanding for LLM Agents

arXiv cs.CL · 2026-04-20

PolicyBank proposes a memory mechanism that enables LLM agents to autonomously refine their understanding of organizational policies through iterative interaction and corrective feedback, closing specification gaps that cause systematic behavioral divergence from true requirements. The work introduces a systematic testbed and demonstrates PolicyBank can close up to 82% of policy-gap alignment failures, significantly outperforming existing memory mechanisms.
