privacy-benchmark

Tag

Cards List
#privacy-benchmark

POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

arXiv cs.AI · 2026-05-20 Cached

POLAR-Bench is a diagnostic benchmark that evaluates the privacy-utility trade-off in LLM agents by testing their ability to follow privacy policies while being adversarially probed by third-party models. Results show frontier models protect over 99% of protected attributes but smaller open-weight models leak over half, highlighting gaps in intent-following.

0 favorites 0 likes
← Back to home

Submit Feedback