LLM Anonymization Against Agentic Re-Identification

Hugging Face Daily Papers

Summary

AURA is an LLM-powered anonymization framework that balances privacy protection against agentic web-search re-identification while preserving contextual utility through adaptive privacy scopes and mask-reconstruct methods.

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry the text's downstream analytic value. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retention. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier in two ways: an adaptive privacy scope strengthens resistance to agentic re-identification, and a mask-reconstruct anonymization method better preserves contextual utility under a fixed privacy scope.
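The abstract describes the mask-reconstruct idea only at a high level, so the following is a minimal sketch of the general pattern rather than AURA's actual implementation. It assumes three separable steps: masking risky spans (privacy localization), filling the masks with generalized text (reconstruction), and keeping the candidate that passes a privacy check with the highest utility. Every name here (`mask`, `reconstruct`, `score`, `select`, `max_risk`) is hypothetical, and the simple substring checks stand in for the LLM-based reconstructor, the web-search attacker, and the fact-based utility evaluation that the paper refers to.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) character offsets of a risky span
MASK = "[MASK]"


@dataclass(frozen=True)
class Candidate:
    text: str
    privacy_risk: float  # share of known identifiers still present (lower is better)
    utility: float       # share of required facts still present (higher is better)


def mask(text: str, spans: Sequence[Span]) -> str:
    """Replace each (non-overlapping) span with a mask token."""
    pieces, cursor = [], 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(MASK)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def reconstruct(masked: str, fillers: Sequence[str]) -> str:
    """Fill mask tokens left to right; a real system would ask an LLM for fillers."""
    result = masked
    for filler in fillers:
        result = result.replace(MASK, filler, 1)
    return result


def share(text: str, terms: Sequence[str]) -> float:
    """Fraction of terms found in text (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(term.lower() in lowered for term in terms) / len(terms)


def score(text: str, identifiers: Sequence[str], facts: Sequence[str]) -> Candidate:
    """Toy stand-ins for the adversarial privacy check and the utility check."""
    return Candidate(text, share(text, identifiers), share(text, facts))


def select(candidates: Sequence[Candidate], max_risk: float) -> Optional[Candidate]:
    """Keep candidates within the risk budget, then pick the most useful one."""
    safe = [c for c in candidates if c.privacy_risk <= max_risk]
    return max(safe, key=lambda c: c.utility) if safe else None


# Invented example data, not taken from the paper's transcripts.
TEXT = "I am a nurse at Mercy Hospital in Tulsa and I work night shifts."
IDENTIFIERS = ["Mercy Hospital", "Tulsa"]
FACTS = ["nurse", "hospital", "night shifts"]
FILLERS = [
    ("a hospital", "a mid-sized city"),
    ("Mercy Hospital", "the Midwest"),  # leaks an identifier
    ("a workplace", "a place"),         # safe but drops a fact
]

spans = [(TEXT.index(s), TEXT.index(s) + len(s)) for s in IDENTIFIERS]
masked = mask(TEXT, spans)
candidates = [score(reconstruct(masked, f), IDENTIFIERS, FACTS) for f in FILLERS]
best = select(candidates, max_risk=0.0)
print(best.text if best else "no candidate passed the privacy check")
```

Separating `mask` from `reconstruct` mirrors the decoupling the abstract mentions: the set of masked spans (the privacy scope) can be widened or narrowed without changing how the replacement text is generated or how candidates are scored.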

Cached at: 06/05/26, 06:09 PM

Source: https://huggingface.co/papers/2605.30848
