Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

arXiv cs.CL Papers

Summary

This paper introduces DiSan, a privacy-preserving text sanitization framework for distributed agent collaboration. By disentangling source-invariant role content from source-identifying style, DiSan reduces PII exposure 20× while maintaining 83% answer faithfulness on a multi-agent RAG benchmark, outperforming traditional masking approaches.

arXiv:2606.15335v1 Announce Type: new Abstract: When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from distributional signatures such as formatting conventions, vocabulary choices, and syntactic patterns. We propose DiSan (Disentangled Sanitization), a privacy-preserving sanitization framework and a built-in component of Intern-Shannon for multi-agent collaboration. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text. Experiments show that identifier-level masking is insufficient: masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. By contrast, DiSan reduces answer-level PII exposure by 20 times while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and lowers Enron stylometric attribution by 73.2% under TF-IDF and 70.6% under a neural probe.

# Intern-Shannon Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations
Source: [https://arxiv.org/html/2606.15335](https://arxiv.org/html/2606.15335)
Xuan Liu\*, Hefeng Zhou\*, Sicheng Chen, Chao Yang, Xingcheng Xu, Jingjing Qu†, Jiong Lou, Jie LI, Xia Hu

1 Shanghai Artificial Intelligence Laboratory; 2 Shanghai Jiao Tong University

###### Abstract

When distributed agents exchange text across organizational boundaries, privacy leakage arises not only from explicit identifiers but also from *distributional signatures* such as formatting conventions, vocabulary choices, and syntactic patterns. We propose DiSan (Disentangled Sanitization), a privacy-preserving sanitization framework and a built-in component of Intern-Shannon for multi-agent collaboration. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace that preserves task semantics and a source-identifying style subspace that remains local. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text. Experiments show that identifier-level masking is insufficient: masking 19.2% of tokens reduces TF-IDF stylometric attribution by only 18.6%. By contrast, DiSan reduces answer-level PII exposure by 20× while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and lowers Enron stylometric attribution by 73.2% under TF-IDF and 70.6% under a neural probe.


\* Equal contribution. † Corresponding author. Code is available at [https://github.com/RezinChow/DiSan](https://github.com/RezinChow/DiSan). Intern-Shannon is the next-generation Agentic Operating System developed by Shanghai AI Lab, which will be officially released soon.

## 1 Introduction

![Refer to caption](https://arxiv.org/html/2606.15335v1/x7.png)

Figure 1: Privacy risks in cross-organizational text sharing. Top: Three representative collaboration settings, ranging from no protection to preliminary filtering, each leaving source-identifying content exposed. Bottom: DiSan produces sanitized text that preserves task-relevant semantics while removing both explicit PII and distributional source signatures.

Cross-organizational collaboration on text-intensive tasks, including retrieval-augmented generation Lewis et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib17)), distributed question answering, and cross-institutional case retrieval Stubbs et al. ([2015](https://arxiv.org/html/2606.15335#bib.bib37)), requires parties to share textual evidence while keeping raw data local. Each party may hold proprietary document collections with distinct domain expertise, and a common pattern is for a requesting party to seek auxiliary evidence from helpers, who retrieve local snippets and transmit them for downstream use such as answer synthesis Minaee et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib28)).

However, any inter-agent text exchange exposes private organizational information beyond what public capability tags disclose. As illustrated in [Figure 1](https://arxiv.org/html/2606.15335#S1.F1), this risk spans all levels of cross-organizational collaboration: from intra-organization transfers with no protection to inter-alliance sharing where only preliminary filtering is applied. In each setting, shared text leaks private organizational information at two levels: explicitly through identifiers such as names, account numbers, and addresses, and implicitly through *distributional signatures*: formatting conventions, vocabulary choices, and syntactic patterns that encode the originating party’s internal practices Malik and Dustdar ([2011](https://arxiv.org/html/2606.15335#bib.bib24)). This is fundamentally a representation-level problem, not an identifier-level one: private organizational information is a property of the text distribution, not of individual identifiers, and anonymization methods that operate in text space cannot alter these distributional properties. [Table 1](https://arxiv.org/html/2606.15335#S1.T1) makes this concrete: raw sharing exposes not only a counterparty identity but also a proprietary bulletin format, reference scheme, and sector taxonomy; placeholder masking hides surface identifiers but leaves the naming convention intact and collapses distinct entities into generic tokens, weakening grounding, provenance tracking, and cross-document aggregation. DiSan instead preserves role facts such as exposure amount, sector, rating change, and review status while suppressing source-specific fingerprints.

Table 1: Compact CorporateBank → AssetManager financial-risk example. Bold marks source-identifying patterns that survive identifier masking.

| Variant | Text |
| --- | --- |
| Raw private text ($d$) | “Per Meridian Bank’s Counterparty Risk Bulletin (Ref: CR-2024-047), Apex Dynamics carried \$6.1M exposure as of Q3 close, was downgraded to BB+, and was flagged for portfolio review.” |
| Placeholder masking only | “Per Meridian Bank’s Counterparty Risk Bulletin (Ref: \[ID\]), \[ORG\] carried \$6.1M exposure as of Q3 close, was downgraded to BB+, and was flagged for portfolio review.” |
| DiSan output ($\tilde{d}$) | “A corporate-bank Q3 risk bulletin flags an industrials counterparty with \$6.1M exposure, a BB+ downgrade, and mandatory portfolio review.” |

Existing approaches address symptoms rather than structure. Rule-based PII detectors Li et al. ([2021a](https://arxiv.org/html/2606.15335#bib.bib19)) target individual identifiers such as named entities and account numbers but are blind to distributional signatures, since private organizational information is distributed across the text as statistical patterns, not localized to individual spans. LLM-based paraphrasing Xiao et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib43)) reshuffles surface form but provides no mechanism to ensure the output distribution is statistically source-invariant. Federated learning McMahan et al. ([2017](https://arxiv.org/html/2606.15335#bib.bib26)) decentralizes model training but produces shared *predictors*, not shareable *data*. The core challenge is structural: what is the minimal sufficient representation that preserves task semantics while being statistically source-invariant?

We propose DiSan, a sanitization framework for cross-agent evidence exchange. It learns a role–style factorization of each evidence snippet, where the *role* subspace preserves task-relevant semantics and the *style* subspace captures source-linked variation. Orthogonality encourages the two subspaces to separate, while prototype alignment keeps role representations comparable across non-IID agents without centralizing raw text. The resulting sanitizer produces shareable text from the role stream while keeping style information local. DiSan further serves as a key privacy-preserving component of Intern-Shannon, where it is integrated as a built-in text-sanitization module and can be invoked on demand during multi-agent collaboration.

##### Contributions\.

(i) Disentangled sanitization for text sharing: We formulate cross-agent text sanitization as role–style factorization, separating task semantics from source-linked variation. (ii) Federated role alignment: We introduce lightweight prototype alignment to stabilize role spaces across non-IID agents without centralizing raw text. (iii) Privacy diagnostics across sharing surfaces: We evaluate privacy at the output, representation, and prototype levels, distinguishing application-stage leakage from training-stage artifacts. (iv) Empirical validation: On distributed-agent RAG, DiSan reduces answer-level PII exposure by 20× while maintaining 83% answer faithfulness. On Enron emails, it reduces TF-IDF stylometric attribution by 73.2%, substantially outperforming identifier-level masking.

## 2 Related Work

##### Privacy\-Preserving Machine Learning\.

Protecting privacy across distributed data sources is a persistent challenge in collaborative machine learning Li et al. ([2021a](https://arxiv.org/html/2606.15335#bib.bib19)). Differential privacy (DP) Dwork et al. ([2006](https://arxiv.org/html/2606.15335#bib.bib6)) provides formal guarantees, and DP-SGD Ouadrhiri and Abdelhadi ([2022](https://arxiv.org/html/2606.15335#bib.bib29)) extends these to deep learning Feldman et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib7)); Abadi et al. ([2016](https://arxiv.org/html/2606.15335#bib.bib1)); Canonne et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib3)). Federated learning McMahan et al. ([2017](https://arxiv.org/html/2606.15335#bib.bib26)) enables collaborative model training without sharing raw data Li et al. ([2020a](https://arxiv.org/html/2606.15335#bib.bib18)); Liu et al. ([2022](https://arxiv.org/html/2606.15335#bib.bib22)); Karimireddy et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib15)). Within this paradigm, prototype-based methods Tan et al. ([2021](https://arxiv.org/html/2606.15335#bib.bib38)); Zhang et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib46)), representation-based approaches Li et al. ([2021b](https://arxiv.org/html/2606.15335#bib.bib20)); Wu et al. ([2021](https://arxiv.org/html/2606.15335#bib.bib42)), and communication-efficient techniques Zhang et al. ([2022](https://arxiv.org/html/2606.15335#bib.bib47)); Wu et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib41)) have been proposed. Recent federated RAG formulations aim to enable multi-party retrieval under privacy constraints Qian et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib30)); He et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib11)); Mao et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib25)); Chakraborty et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib4)). While these methods focus on training shared models or retrievers, our work addresses a complementary problem: *sanitizing text data itself* so it can be safely shared for downstream use.

##### Text Sanitization and De\-identification\.

Traditional text de-identification relies on rule-based or NER-based PII detection followed by masking or replacement Malik and Dustdar ([2011](https://arxiv.org/html/2606.15335#bib.bib24)). While effective for explicit identifiers, these methods miss implicit leakage through writing style, document structure, and domain-specific patterns. Authorship attribution research Stamatatos ([2009](https://arxiv.org/html/2606.15335#bib.bib36)) demonstrates that stylometric features can identify sources even from short texts. Recent work explores LLM-based paraphrasing for privacy Shi et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib35)) and DP-based text generation Meisenbacher and Matthes ([2024](https://arxiv.org/html/2606.15335#bib.bib27)); Xie et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib44)). However, DP-text methods incur severe utility degradation (30–50% coherence loss for $\epsilon<10$) that is prohibitive for RAG applications requiring semantic fidelity Meisenbacher and Matthes ([2024](https://arxiv.org/html/2606.15335#bib.bib27)). Our approach addresses both explicit PII and implicit stylistic fingerprints through learned disentanglement, achieving strong empirical privacy without the utility cost of DP noise on text outputs.

##### Disentangled Representations\.

Disentangled representation learning aims to separate independent factors of variation in data Bengio et al. ([2013](https://arxiv.org/html/2606.15335#bib.bib2)). In NLP, disentanglement has been applied to separate content from style for style transfer John et al. ([2019](https://arxiv.org/html/2606.15335#bib.bib14)), sentiment from semantics, and speaker identity from linguistic content in speech Qian et al. ([2019](https://arxiv.org/html/2606.15335#bib.bib31)). Recent work enables explicit control in text generation via disentangled representations Liu et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib23)); Han et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib10)). These works establish that separating semantic content from stylistic or identity-related factors supports both controllable generation and privacy; we apply the same principle to text sharing, where the goal is to isolate task-relevant content from source-identifying patterns.

## 3 Problem Statement

### 3.1 Text Sharing for Distributed Agent Collaboration

We consider $C$ distributed agents, each hosting a private document repository $\mathcal{D}_c$. When a requesting agent cannot answer a query $q$ locally, it routes to a candidate helper set $\mathcal{C}(q)$ via embedding similarity over public capability tags, such as “AssetManager” and “CorporateBank”. Each helper retrieves a local snippet $d$ and returns a *sanitized* snippet $\tilde{d}$ for downstream use in RAG or distributed question answering.
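The routing step can be pictured as a similarity ranking over tag embeddings. The sketch below is only an illustration under stated assumptions: the function names `cosine` and `route`, the `top_k` cutoff, and the toy 3-dimensional embeddings are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def route(query_vec, tag_vecs, top_k=2):
    """Return the top_k public capability tags ranked by similarity to the query.

    tag_vecs maps a capability tag to a (hypothetical) embedding of that tag.
    """
    ranked = sorted(tag_vecs, key=lambda tag: cosine(query_vec, tag_vecs[tag]), reverse=True)
    return ranked[:top_k]

# Toy embeddings, invented for illustration only.
tag_vecs = {
    "AssetManager": [0.9, 0.1, 0.0],
    "CorporateBank": [0.7, 0.7, 0.0],
    "SupplierCo": [0.0, 0.1, 0.9],
}
helpers = route([1.0, 0.2, 0.0], tag_vecs, top_k=2)
print(helpers)
```

Only the public tags enter this step, so routing itself reveals nothing beyond what the threat model already treats as public.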

[Table 1](https://arxiv.org/html/2606.15335#S1.T1) concretizes the sanitization goal: raw text exposes explicit PII and institutional style fingerprints; placeholder masking removes identifiers but leaves distributional signatures intact and can weaken downstream grounding when masked spans are task-relevant; only a representation-level approach targets both privacy risks while preserving task semantics.

### 3.2 Threat Scope

Our primary privacy concern arises in the *application stage*. A requesting agent receives sanitized evidence $\tilde{d}$ from a helper agent and may try to infer information beyond the helper’s public capability tag, including explicit PII, writing patterns tied to the source, or organizational document fingerprints. The sanitizer is trained federatively across the same agents; the coordinator follows the protocol and does not access raw text. We additionally evaluate whether training artifacts such as uploaded prototypes contain persistent distributional signatures tied to individual sources beyond public tags. We do not claim formal differential privacy or robustness to malicious servers, poisoning, prompt injection, or collusion; the complete threat model is in [Appendix A](https://arxiv.org/html/2606.15335#A1).

##### Goals and validation\.

We do not hide public capability tags or agent participation; we aim to prevent leakage beyond these facts while preserving downstream RAG utility\. Our validation follows this scope: PII leakage and answer exposure measure explicit identifiers, stylometry measures source signatures in transmitted text, embedding and prototype attribution diagnose learned sharing artifacts, and F1, faithfulness, and ChunkHit@3 measure utility\.

## 4 Method

##### Overview\.

DiSan enforces role–style orthogonality as an explicit architectural constraint: a two-stream encoder projects each input into a role subspace encoding source-invariant task semantics and a style subspace encoding agent-specific variation. Role representations are used to decode sanitized text $\tilde{d}$; style representations assist local generation for fluency and are then discarded. Only $\tilde{d}$ crosses the privacy boundary.

Training without centralizing raw text poses a calibration challenge: per-agent isolation causes role spaces to drift across agents, degrading both utility and privacy. DiSan addresses this by exchanging compact role prototypes, aligning local role distributions to shared global anchors, and applying adversarial regularization to suppress source-specific prototype signatures beyond public agent tags. [Figure 2](https://arxiv.org/html/2606.15335#S4.F2) illustrates the architecture.

![Refer to caption](https://arxiv.org/html/2606.15335v1/x8.png)

Figure 2: DiSan architecture. Left (agent): A two-stream encoder produces role representations $\mathbf{Z}_r$ capturing source-invariant semantics and style representations $\mathbf{Z}_s$ capturing agent-specific variation; $\mathcal{L}_{\text{orth}}$ enforces their separation. Both are fused locally for decoding; $\mathbf{Z}_s$ is discarded after use and never transmitted. Only role prototypes $\boldsymbol{\mu}_c$ cross the privacy boundary. Right (server): Prototypes are aligned spherically; gradient reversal suppresses source-specific signatures in uploaded prototypes.

### 4.1 Role–Style Disentangled Encoder

##### Two\-stream projection\.

Given an evidence sequence $d=(d_1,\ldots,d_T)$, a pretrained encoder produces hidden states $\mathbf{H}=\mathrm{Encoder}(d)\in\mathbb{R}^{T\times d_{\text{enc}}}$. We project $\mathbf{H}$ into a *role* stream and a *style* stream (backbone and projection dimensions in [Section B.3](https://arxiv.org/html/2606.15335#A2.SS3)):

$$\mathbf{Z}_r = \mathbf{H}\mathbf{W}_r,\quad \mathbf{W}_r\in\mathbb{R}^{d_{\text{enc}}\times d_r},\ \mathbf{Z}_r\in\mathbb{R}^{T\times d_r}, \tag{1}$$

$$\mathbf{Z}_s = \mathbf{H}\mathbf{W}_s,\quad \mathbf{W}_s\in\mathbb{R}^{d_{\text{enc}}\times d_s},\ \mathbf{Z}_s\in\mathbb{R}^{T\times d_s}. \tag{2}$$

In our implementation, the encoder–decoder backbone is LongT5-TGlobal-Base. Both $\mathbf{Z}_r$ and $\mathbf{Z}_s$ are 256-dimensional token streams; their concatenation forms a 512-dimensional bottleneck that is projected back to $d_{\text{enc}}$ before decoding. Intuitively, $\mathbf{Z}_r$ captures shareable content structure (entities/relations/events), while $\mathbf{Z}_s$ captures agent-specific phrasing and formatting.

##### Fusion for generation\.

We fuse the two streams before decoding via $\mathbf{H}_{\text{fused}}=g([\mathbf{Z}_r;\mathbf{Z}_s])$, where $g$ is a learned projection back to $d_{\text{enc}}$. An optional residual path $\mathbf{H}_{\text{out}}=\alpha\cdot\mathbf{H}_{\text{fused}}+(1-\alpha)\cdot\mathbf{H}$ stabilizes early training (details in [Section B.1](https://arxiv.org/html/2606.15335#A2.SS1)). The pretrained LongT5 Transformer decoder then generates $\tilde{d}$ autoregressively from the modified encoder states; DiSan inserts a role–style bottleneck rather than replacing the generator.

Although $\mathbf{Z}_s$ may contain source-linked variation, it is used only inside the helper during decoding and is never transmitted. The decoder is trained to preserve role facts while removing identifiers and source-specific wording, so the local style stream serves as a fluency aid rather than a shared artifact. Residual leakage is evaluated on the transmitted $\tilde{d}$ through output-level PII and stylometry metrics.
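The two-stream projection and the residual fusion can be traced on toy shapes. The following is a minimal sketch, not the released implementation: it assumes $g$ is a single linear map, uses random matrices in place of learned weights, and the helper names (`matmul`, `rand_matrix`, `two_stream_fuse`) and toy sizes are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply an (n x m) matrix by an (m x p) matrix, both as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rand_matrix(rows, cols, rng):
    """Random stand-in for a learned weight matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def two_stream_fuse(H, W_r, W_s, W_g, alpha):
    """Project H into role/style streams, fuse them, and blend with the residual path."""
    Z_r = matmul(H, W_r)                        # Eq. (1): T x d_r
    Z_s = matmul(H, W_s)                        # Eq. (2): T x d_s
    Z_cat = [r + s for r, s in zip(Z_r, Z_s)]   # per-token concatenation: T x (d_r + d_s)
    H_fused = matmul(Z_cat, W_g)                # g(.) assumed linear, back to d_enc
    H_out = [[alpha * f + (1 - alpha) * h for f, h in zip(f_row, h_row)]
             for f_row, h_row in zip(H_fused, H)]
    return Z_r, Z_s, H_out

rng = random.Random(0)
T, d_enc, d_r, d_s = 5, 8, 4, 4   # toy sizes; the paper uses 256-dimensional streams
H = rand_matrix(T, d_enc, rng)
Z_r, Z_s, H_out = two_stream_fuse(
    H,
    rand_matrix(d_enc, d_r, rng),
    rand_matrix(d_enc, d_s, rng),
    rand_matrix(d_r + d_s, d_enc, rng),
    alpha=0.5,
)
print(len(Z_r), len(Z_r[0]), len(H_out), len(H_out[0]))
```

With `alpha=0.0` the output reduces to the original encoder states, which is one way to see why a residual path of this form can ease early training.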

##### Disentanglement\.

Let $\bar{\mathbf{z}}_r=\frac{1}{T}\sum_{t=1}^{T}\mathbf{z}_{r,t}$ and $\bar{\mathbf{z}}_s=\frac{1}{T}\sum_{t=1}^{T}\mathbf{z}_{s,t}$ denote mean-pooled vectors. We encourage separation via:

$$\mathcal{L}_{\text{orth}}=\cos^{2}(\bar{\mathbf{z}}_r,\bar{\mathbf{z}}_s)=\left(\frac{\bar{\mathbf{z}}_r^{\top}\bar{\mathbf{z}}_s}{\|\bar{\mathbf{z}}_r\|_2\,\|\bar{\mathbf{z}}_s\|_2+\epsilon}\right)^{2}. \tag{3}$$
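As a numerical illustration of Eq. (3), the sketch below mean-pools two small token streams and returns the squared cosine between them. The function names and the value of `eps` are assumptions made for this example; both streams must share the same dimensionality, as they do in the paper (256 each).

```python
import math

def mean_pool(Z):
    """Average a T x d token stream over the token axis."""
    T = len(Z)
    return [sum(col) / T for col in zip(*Z)]

def orth_loss(Z_r, Z_s, eps=1e-8):
    """Squared cosine similarity between mean-pooled role and style vectors (Eq. 3)."""
    z_r, z_s = mean_pool(Z_r), mean_pool(Z_s)
    dot = sum(a * b for a, b in zip(z_r, z_s))
    norm_r = math.sqrt(sum(a * a for a in z_r))
    norm_s = math.sqrt(sum(b * b for b in z_s))
    return (dot / (norm_r * norm_s + eps)) ** 2

# Parallel pooled vectors give a loss near 1; orthogonal ones give 0.
aligned = orth_loss([[1.0, 0.0], [3.0, 0.0]], [[2.0, 0.0], [2.0, 0.0]])
orthogonal = orth_loss([[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]])
print(aligned, orthogonal)
```

Squaring the cosine penalizes both positive and negative correlation, so the loss is minimized only when the pooled vectors are orthogonal.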

### 4.2 Prototype Alignment on the Role Space

To train the privacy transformer under non-IID agents, we align *role distributions* across agents using lightweight role prototypes, providing global anchors without sharing raw evidence.

##### Prototype computation and aggregation\.

For each role/placeholder type $k\in\mathcal{K}$, each agent computes a local prototype $\boldsymbol{\mu}^{(c)}_k$ as an EMA of batch role centroids (token-level averages over type-$k$ positions). At each training round, the server aggregates a sample-weighted global prototype $\boldsymbol{\mu}^{*}_k$. Full definitions are provided in [Section B.2](https://arxiv.org/html/2606.15335#A2.SS2).
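A rough sketch of the two operations just described, for a single type $k$: a local EMA update and a sample-weighted server average. The exact definitions are deferred to the paper's appendix, so the momentum value and the function names here are assumptions.

```python
def ema_update(prototype, batch_centroid, momentum=0.9):
    """Exponential moving average of batch role centroids for one type k."""
    return [momentum * p + (1.0 - momentum) * c for p, c in zip(prototype, batch_centroid)]

def aggregate(local_prototypes, sample_counts):
    """Sample-weighted average of per-agent prototypes for one type k."""
    total = sum(sample_counts)
    dim = len(local_prototypes[0])
    return [sum(n * mu[j] for mu, n in zip(local_prototypes, sample_counts)) / total
            for j in range(dim)]

# One local EMA step, then a server-side aggregate over two agents.
mu_a = ema_update([0.0, 0.0], [1.0, 2.0], momentum=0.5)
mu_star = aggregate([[1.0, 0.0], [0.0, 1.0]], sample_counts=[3, 1])
print(mu_a, mu_star)
```

Weighting by sample count pulls the global anchor toward agents that observed more type-$k$ positions.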

##### Spherical alignment\.

To avoid magnitude\-based leakage, we align on the unit hypersphere:

$$\mathcal{L}_{\text{proto}}=\sum_{k\in\mathcal{K}}\big(1-\cos(\hat{\bar{\mathbf{z}}}_{r,k},\hat{\boldsymbol{\mu}}^{*}_k)\big), \tag{4}$$

where $\hat{\cdot}$ denotes $\ell_2$-normalization, and $\bar{\mathbf{z}}_{r,k}$ is the batch role centroid for type $k$, defined as the token-level average over positions labeled as type $k$ (see [Section B.2](https://arxiv.org/html/2606.15335#A2.SS2)).
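Eq. (4) can be checked on two toy types. This is a sketch under assumptions: the type names `AMOUNT` and `RATING`, the dictionary layout, and the small `eps` guard are illustrative choices, not details from the paper.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]

def proto_loss(batch_centroids, global_prototypes):
    """Sum over types k of 1 - cos(normalized centroid, normalized global prototype)."""
    loss = 0.0
    for k, centroid in batch_centroids.items():
        z_hat = l2_normalize(centroid)
        mu_hat = l2_normalize(global_prototypes[k])
        loss += 1.0 - sum(a * b for a, b in zip(z_hat, mu_hat))
    return loss

centroids = {"AMOUNT": [2.0, 0.0], "RATING": [0.0, 5.0]}
anchors = {"AMOUNT": [10.0, 0.0], "RATING": [3.0, 0.0]}
loss = proto_loss(centroids, anchors)
print(loss)
```

The `AMOUNT` pair differs only in magnitude and contributes essentially nothing, which illustrates why alignment on the unit hypersphere ignores norm information.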

##### Prototype\-level adversarial training\.

Prototypes may still carry source-specific distributional signatures beyond public agent tags. We apply a discriminator $D_\psi$ with gradient reversal (GRL) Raff and Sylvester ([2018](https://arxiv.org/html/2606.15335#bib.bib32)) to make prototypes less predictive of their source:

$$\mathcal{L}_{\text{adv}}=\sum_{k\in\mathcal{K}}\mathrm{CE}\!\left(D_\psi(\mathrm{GRL}_\gamma(\bar{\boldsymbol{\mu}}^{(c)}_k)),\,c\right), \tag{5}$$

where $\gamma$ is the GRL strength and $\bar{\boldsymbol{\mu}}^{(c)}_k$ is a gradient-enabled estimate ([Section B.2](https://arxiv.org/html/2606.15335#A2.SS2)). All attribution discriminators use the same compact MLP template, input $\to 128\to 128\to C$, with ReLU activations and dropout 0.1; the prototype discriminator takes 256-dimensional role prototypes as input.
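The gradient reversal layer is the one non-obvious piece of Eq. (5): it passes values through unchanged but flips and scales the gradient on the way back. The sketch below shows that behavior with hand-written forward and backward methods instead of an autograd framework; the class name and the placeholder gradient values are hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -gamma in the backward pass."""

    def __init__(self, gamma):
        self.gamma = gamma

    def forward(self, x):
        # The discriminator sees the prototype unchanged.
        return list(x)

    def backward(self, grad_output):
        # The encoder receives the negated, scaled discriminator gradient.
        return [-self.gamma * g for g in grad_output]

grl = GradientReversal(gamma=0.5)
prototype = [0.2, -0.4, 0.1]
features = grl.forward(prototype)
grad_from_discriminator = [1.0, -2.0, 0.0]   # placeholder for dCE/dfeatures
grad_to_encoder = grl.backward(grad_from_discriminator)
print(features, grad_to_encoder)
```

Because of the sign flip, a gradient step that lowers the discriminator's loss raises it from the encoder's side, which pushes prototypes toward being less predictive of their source.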

### 4.3 Federated Optimization

##### Local objective\.

On agent $c$, we minimize:

$$\begin{split}\mathcal{L}^{(c)}_{\text{local}}={}&\mathcal{L}_{\text{seq}}+\lambda_{\text{orth}}\mathcal{L}_{\text{orth}}+\lambda_{\text{p}}\mathcal{L}_{\text{proto}}\\ &+\lambda_{\text{adv}}\mathcal{L}_{\text{adv}}+\mathcal{L}_{\text{prox}},\end{split} \tag{6}$$

where $\mathcal{L}_{\text{seq}}$ is the token-level cross-entropy on sanitization targets $\tilde{d}$, and $\mathcal{L}_{\text{prox}}=\frac{\nu}{2}\|\theta-\theta^{*}\|_2^2$ is a FedProx term discouraging drift from the global model (we use $\nu$ to distinguish from prototype symbols $\boldsymbol{\mu}$). After local optimization, the server aggregates model weights and prototypes. Before uploading, agents apply $\ell_2$-normalization and Gaussian noise perturbation ($\sigma=0.01$) to prototypes ([Section D.4.1](https://arxiv.org/html/2606.15335#A4.SS4.SSS1)). For communication efficiency, we train LoRA adapters on the encoder attention projections together with the role projection and fusion layers. The style projection remains local and is never uploaded; per-round prototype exchange is only $|\mathcal{K}|\times 256$ floating-point values per agent, negligible compared with model synchronization.
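Two of the ingredients above are easy to show in isolation: the proximal penalty and the prototype preparation before upload. This is a hedged sketch; the function names are hypothetical, parameters are treated as flat lists, and applying normalization before noise is an assumed ordering that the text does not spell out.

```python
import math
import random

def prox_term(theta, theta_global, nu=0.1):
    """FedProx penalty (nu / 2) * ||theta - theta*||_2^2 over flat parameter lists."""
    return 0.5 * nu * sum((a - b) ** 2 for a, b in zip(theta, theta_global))

def prepare_upload(prototype, sigma=0.01, rng=None):
    """L2-normalize a prototype, then add i.i.d. Gaussian noise with scale sigma.

    The normalize-then-perturb order is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(x * x for x in prototype)) or 1.0
    return [x / norm + rng.gauss(0.0, sigma) for x in prototype]

penalty = prox_term([1.0, 2.0], [0.0, 0.0], nu=0.1)   # 0.05 * (1 + 4)
upload = prepare_upload([3.0, 4.0], sigma=0.01, rng=random.Random(0))
print(penalty, upload)
```

After normalization the prototype `[3.0, 4.0]` becomes `[0.6, 0.8]`, so noise at scale 0.01 perturbs each coordinate by roughly one to two percent.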

Table 2: Main single-round results. Bold = best, underline = second best. Metrics are defined in [Section 5.3](https://arxiv.org/html/2606.15335#S5.SS3).

### 4.4 Deployment: Multi-Agent Text Sharing

The preceding sections describe *how to train* the sanitizer; this section describes *how agents use it* at deployment. Each agent deploys the trained model as a local sanitizer: given raw text $d$, it produces $\tilde{d}$ via role–style fusion and decoding. Crucially, $\mathbf{Z}_s$ is used only locally to improve generation quality and is discarded after decoding. Only the sanitized text $\tilde{d}$ is transmitted. This design enables flexible inter-agent data-sharing protocols while preserving the privacy guarantees established during training.

##### RAG pipeline\.

A requesting agent routes a query $q$ to helper agents $\mathcal{C}$ via capability-based routing ([Section 3.1](https://arxiv.org/html/2606.15335#S3.SS1)). Each helper $c\in\mathcal{C}$ retrieves local evidence $d_c$ and returns:

$$\tilde{d}_c=\mathrm{Sanitize}_\theta(d_c,q), \tag{7}$$

where $\mathrm{Sanitize}_\theta$ denotes the trained sanitizer ([Section 4.1](https://arxiv.org/html/2606.15335#S4.SS1)–[4.3](https://arxiv.org/html/2606.15335#S4.SS3)). The requester aggregates $\tilde{\mathcal{D}}=\{\tilde{d}_c\mid c\in\mathcal{C}\}$ and generates a final answer. This protocol naturally extends to multi-turn settings where the requester iteratively refines queries based on accumulated evidence (see [Section 5.6](https://arxiv.org/html/2606.15335#S5.SS6) for details). Convergence analysis is provided in [Section B.4](https://arxiv.org/html/2606.15335#A2.SS4.SSS0.Px3).
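The message flow of this protocol can be sketched with plain callables. Everything below is a stand-in: `collect_sanitized_evidence`, the lambda retriever, and `toy_sanitize` are hypothetical, and a real deployment would call the trained sanitizer rather than a string replacement.

```python
from typing import Callable, Dict, List

def collect_sanitized_evidence(
    query: str,
    helpers: Dict[str, Callable[[str], str]],
    sanitize: Callable[[str, str], str],
) -> List[str]:
    """Each helper retrieves and sanitizes locally; only sanitized text is returned."""
    sanitized = []
    for retrieve in helpers.values():
        raw = retrieve(query)                    # d_c never leaves the helper
        sanitized.append(sanitize(raw, query))   # only this value crosses the boundary
    return sanitized

# Stand-ins for illustration only.
helpers = {"CorporateBank": lambda q: "Ref: CR-2024-047 exposure $6.1M"}
toy_sanitize = lambda d, q: d.replace("Ref: CR-2024-047 ", "")
evidence = collect_sanitized_evidence("counterparty exposure?", helpers, toy_sanitize)
print(evidence)
```

The requester would then pass the collected list to its own answer generator; that step is omitted here.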

## 5 Experiments

### 5.1 Experimental Setup

##### Dataset\.

We use a multilingual synthetic finance corpus with annotated PII spans, synthetic_pii_finance_multilingual Watson et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib40)).

##### Agent configuration\.

To simulate distributed agent collaboration, we construct $C=7$ agents by assigning each agent a disjoint inventory of document types. The resulting partition induces non-IID skews by design. We denote agents by their capability tags: CorporateBank, AssetManager, FinTechPay, CorpGroup, MarketForecaster, ComplianceConsult, and SupplierCo. These tags are treated as public in our threat model ([Section 3.2](https://arxiv.org/html/2606.15335#S3.SS2)); we therefore focus on leakage *beyond* this public prior. The exact doc-type identifiers used for each agent are listed in [Section C.5](https://arxiv.org/html/2606.15335#A3.SS5).

##### RAG evaluation pipeline\.

Documents are chunked into fixed windows (256 tokens, overlap 50) and indexed per agent. Given a query, the requesting agent routes to a candidate helper set via tag-based routing ([Section 3.1](https://arxiv.org/html/2606.15335#S3.SS1)), retrieves top-$k$ evidence from each helper’s local index using a BGE-M3 hybrid pipeline, receives sanitized snippets $\tilde{d}$, and generates the final answer. Grounded QA examples are synthesized from retrieval anchors and retained only when evidence spans are found in the source chunk; each record stores its chunk_id for provenance-based ChunkHit@3. Query and ground-truth construction details are in [Section C.6](https://arxiv.org/html/2606.15335#A3.SS6.SSS0.Px2); retrieval architecture details are in [Section C.6](https://arxiv.org/html/2606.15335#A3.SS6). We evaluate on single-round sharing as our main setting; the multi-turn deployment extension is discussed in [Section 5.6](https://arxiv.org/html/2606.15335#S5.SS6).
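The fixed-window chunking with overlap can be written in a few lines. This sketch assumes the document is already tokenized into a list; the tokenizer and the function name `chunk_tokens` are not specified by the paper.

```python
def chunk_tokens(tokens, window=256, overlap=50):
    """Split a token list into fixed windows whose starts are window - overlap apart."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    stride = window - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break   # the current window already reaches the end of the document
    return chunks

chunks = chunk_tokens(list(range(600)))
print([len(c) for c in chunks])
```

With a 256-token window and a 50-token overlap, consecutive windows start 206 tokens apart, so a 600-token document yields three chunks.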

##### Training configuration\.

Training runs for 12 rounds with 300 local steps per round per agent, batch size 4, and learning rate $2\times 10^{-4}$. We set the proximal regularization strength $\nu=0.1$ to mitigate drift under non-IID data. Unless otherwise stated, we use $\lambda_{\text{adv}}=1.0$, GRL strength $\gamma=0.5$, $\lambda_{\text{orth}}=0.2$, and prototype noise scale $\sigma_{\text{noise}}=0.01$ (convergence analysis in [Section D.4.1](https://arxiv.org/html/2606.15335#A4.SS4.SSS1)). Full hyperparameters and schedules are deferred to [Section C.5](https://arxiv.org/html/2606.15335#A3.SS5).
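For reference, the stated hyperparameters can be gathered into one configuration object. The class and field names are hypothetical; only the values come from the text, and the prototype-loss weight is left out because this section does not state it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    rounds: int = 12
    local_steps: int = 300       # per round, per agent
    batch_size: int = 4
    learning_rate: float = 2e-4
    nu: float = 0.1              # FedProx strength
    lambda_adv: float = 1.0
    grl_gamma: float = 0.5
    lambda_orth: float = 0.2
    sigma_noise: float = 0.01    # prototype noise scale

    @property
    def steps_per_agent(self) -> int:
        """Total local optimization steps one agent performs."""
        return self.rounds * self.local_steps

    @property
    def examples_per_agent(self) -> int:
        """Training examples processed per agent, counting repeats."""
        return self.steps_per_agent * self.batch_size

cfg = TrainConfig()
print(cfg.steps_per_agent, cfg.examples_per_agent)
```

Under these settings each agent runs 3,600 local steps in total.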

### 5.2 Baselines

We compare against practical alternatives for privacy\-preserving text sharing:

Placeholder-only: This approach first applies a PII detection model to identify sensitive spans (names, dates, addresses, account numbers, etc.), then replaces each detected span with a type-specific placeholder token (e.g., \[NAME\], \[DATE\], \[ADDRESS\], \[ACCOUNT\]). We evaluate three PII detectors: gliner-pii-large-v1.0 Zaratiana et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib45)), a generalist NER model; piiranha-v1-detect-personal-information, a DeBERTa-based model He et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib12)) fine-tuned for PII detection; and deberta-pii-finetuned, another DeBERTa variant trained on PII corpora. While placeholder replacement removes explicit identifiers, it does not address implicit stylistic fingerprints; furthermore, opaque placeholder tokens can degrade downstream RAG by collapsing task-relevant spans into generic tokens and weakening answer grounding, provenance, and cross-document aggregation.

LLM paraphrasing: Locally paraphrase text with open-source LLMs using a privacy-focused prompt, then share the rewritten text.

Policy gating: Adapted from dynamic access-control memory sharing Rezazadeh et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib34)), a local policy model (Qwen2.5-7B) decides per chunk whether to share the original text, provide a summary, or refuse sharing, based on the requester’s agent tag and the helper’s data sensitivity level.

Prompts and policy templates are in [Section C.7](https://arxiv.org/html/2606.15335#A3.SS7).
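To make the placeholder-only baseline concrete, the sketch below replaces already-detected character spans with typed placeholders. It assumes non-overlapping `(start, end, label)` spans; the span format and the function name `mask_spans` are hypothetical, and the detector itself is not modeled.

```python
def mask_spans(text, spans):
    """Replace detected (start, end, label) character spans with [LABEL] placeholders."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

text = "Apex Dynamics carried $6.1M exposure (Ref: CR-2024-047)."
detected = [("Apex Dynamics", "ORG"), ("CR-2024-047", "ID")]
# Build character spans from the toy detections.
spans = [(text.index(s), text.index(s) + len(s), label) for s, label in detected]
masked = mask_spans(text, spans)
print(masked)
```

The masked string keeps the `(Ref: ...)` convention and the sentence template, which is the kind of surviving signature the paper argues identifier-level masking cannot remove.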

### 5.3 Evaluation Metrics

We evaluate sanitization quality using privacy and utility metrics\.

##### Privacy metrics\.

(i) Avg. PII: the average number of PII spans detected in sanitized chunks by an external detector (lower is better). (ii) Ans. Rate: the fraction of final answers that contain at least one exposed PII entity. We instantiate the external detector as gliner-pii-large-v1.0 with a 0.3 confidence threshold over common PII labels; implementation details are in [Section C.6](https://arxiv.org/html/2606.15335#A3.SS6.SSS0.Px5). (iii) Distributional fingerprint leakage: 7-way attribution accuracy/macro-F1 from learned role embeddings (EXP-1) and sanitized text via stylometry (EXP-3). (iv) Prototype fingerprint leakage: 7-way attribution accuracy/macro-F1 from uploaded prototypes (EXP-2). Since capability tags and participation are public, these probes do not test whether the helper capability tag is hidden. They test whether transmitted text or learned artifacts still carry residual source-correlated fingerprints beyond that public information; full protocols are in [Section C.1](https://arxiv.org/html/2606.15335#A3.SS1).
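The first two privacy metrics reduce to simple counting once detector outputs are available. The sketch below is an approximation under assumptions: detector outputs are given as lists of strings, and exposure is tested by case-insensitive substring match, which may differ from the matching rule in the paper's appendix.

```python
def avg_pii(spans_per_chunk):
    """Average number of detected PII spans per sanitized chunk."""
    return sum(len(s) for s in spans_per_chunk) / len(spans_per_chunk)

def answer_exposure_rate(answers, pii_entities):
    """Fraction of answers containing at least one known PII entity."""
    exposed = sum(any(e.lower() in a.lower() for e in pii_entities) for a in answers)
    return exposed / len(answers)

# Toy detector outputs for four chunks and two final answers.
spans_per_chunk = [["Apex Dynamics"], [], ["CR-2024-047", "Meridian Bank"], []]
answers = ["The counterparty was downgraded to BB+.", "Apex Dynamics was flagged."]
print(avg_pii(spans_per_chunk), answer_exposure_rate(answers, ["Apex Dynamics"]))
```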

##### Utility metrics\.

\(i\) F1/Prec\./Rec\./Cos\.: token\-level F1, precision, recall, and cosine similarity between generated answers and ground\-truth answers \(bag\-of\-words TF representation; see [Section C\.6](https://arxiv.org/html/2606.15335#A3.SS6)\)\. \(ii\) Faithfulness: the fraction of stopword\-removed content words in the answer that appear in retrieved evidence\. \(iii\) ChunkHit@3: the fraction of ground\-truth chunks appearing in top\-3 retrieved results\.
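The utility metrics can be sketched as follows. This is an illustrative reading of the definitions above, not the evaluation code: whitespace tokenization, lowercasing, and the tiny `STOPWORDS` set are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

STOPWORDS = {"the", "a", "an", "of", "is", "was", "in", "to", "and"}  # illustrative only


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def precision_recall_f1(pred: str, gold: str) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 with multiset overlap."""
    p, g = Counter(tokenize(pred)), Counter(tokenize(gold))
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def tf_cosine(pred: str, gold: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    p, g = Counter(tokenize(pred)), Counter(tokenize(gold))
    dot = sum(count * g[token] for token, count in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in g.values()))
    return dot / norm if norm else 0.0


def faithfulness(answer: str, evidence: Sequence[str]) -> float:
    """Fraction of non-stopword answer tokens that occur in the evidence."""
    content = [t for t in tokenize(answer) if t not in STOPWORDS]
    if not content:
        return 0.0
    evidence_tokens = set(tokenize(" ".join(evidence)))
    return sum(t in evidence_tokens for t in content) / len(content)


def chunk_hit_at_k(retrieved_ids: Sequence[str], gold_ids: Sequence[str], k: int = 3) -> float:
    """Fraction of ground-truth chunks found among the top-k retrieved chunks."""
    if not gold_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & set(gold_ids)) / len(set(gold_ids))


answer = "revenue grew 12% in 2023"
gold = "revenue grew 12%"
precision, recall, f1 = precision_recall_f1(answer, gold)
cosine = tf_cosine(answer, gold)
faith = faithfulness(answer, ["annual revenue grew 12% year over year"])
hit = chunk_hit_at_k(["c1", "c7", "c3", "c2"], ["c3", "c2"])
```

In the toy example the answer adds an unsupported year, which lowers precision and faithfulness while recall stays at 1.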

### 5\.4 Main Results

DiSan achieves strong privacy protection with modest utility loss\. As shown in [Table 2](https://arxiv.org/html/2606.15335#S4.T2), it reduces answer\-level PII exposure from 11\.8% under unprotected sharing to just 0\.6% \(roughly a 20× reduction\), while preserving answer faithfulness at 83\.17%, close to the 86\.10% unprotected baseline\.

[Figure 3](https://arxiv.org/html/2606.15335#S5.F3) further shows that DiSan offers the most favorable privacy–utility trade\-off among the evaluated sanitizers, being the only method that simultaneously achieves sub\-1% answer\-level PII leakage and near\-baseline task performance\.

![Refer to caption](https://arxiv.org/html/2606.15335v1/x9.png)

Figure 3: Privacy–utility trade\-off\. The ideal region \(green, upper\-right\) represents high utility with strong privacy\. Among evaluated methods, DiSan achieves the most favorable trade\-off on this benchmark\.

### 5\.5 Ablation Study

We ablate two key components: *style isolation* and *prototype alignment*\. Full results are in [Section C\.8](https://arxiv.org/html/2606.15335#A3.SS8)\.

Table 3: Ablation results \(single\-round\)\. Bold = best column value; underline = second best\. These are not competing sanitizers: A1 and A2 relax privacy constraints, so their higher utility metrics reflect under\-sanitization rather than improvement\. The relevant comparison is privacy cost vs\. utility gain\.

##### Style isolation ablations\.

We test: \(A1\) removing $\mathcal{L}_{\text{orth}}$ \($\lambda_{\text{orth}}=0$\), allowing role embeddings to absorb stylometric cues; and \(A2\) high\-$\alpha$ fusion \($\alpha=0.9$\), amplifying style influence during decoding\. A1 and A2 show slightly higher RAG utility metrics because weaker style isolation allows more source\-specific lexical content to pass through the decoder, surfacing additional matching tokens in the answer\. This is precisely the privacy failure mechanism: the same leaked content that boosts surface utility exposes source\-identifying information\. A1 increases Avg\. PII by $7.3\times$ and A2 by $9.3\times$, with answer\-level exposure rising to 1\.4% and 2\.2%, respectively\. The appendix confirms the effect at the representation level: role attribution F1 rises from 0\.13 \(DiSan\) to 0\.18 \(A1\) and 0\.24 \(A2\)\. These ablations reveal the privacy–utility trade\-off: relaxing style isolation shifts the operating point toward higher utility at the cost of substantially weaker privacy\. DiSan is selected as the operating point that maximizes privacy while incurring only modest utility loss\.

##### Prototype alignment ablation\.

We remove all prototype components: no EMA prototypes, no global aggregation, and $\lambda_p=0$ \(removing $\mathcal{L}_{\text{proto}}$ from [Equation 6](https://arxiv.org/html/2606.15335#S4.E6)\)\. Without prototype anchors, role spaces drift under non\-IID training, degrading both utility and privacy: cosine similarity drops 6%, ChunkHit@3 drops 3\.6pp, and PII exposure increases $2.7\times$\.

##### Disentanglement verification\.

Beyond privacy metrics, we verify that the model learns the intended decomposition\. As diagnostic evidence, role embeddings used for sharing are near\-random under the 7\-way fingerprint probe \(F1 = 0\.05\), while local\-only style embeddings remain strongly source\-correlated \(F1 = 0\.84\)\. [Figure 5](https://arxiv.org/html/2606.15335#A3.F5) further confirms successful role–style separation: cosine similarity between role and style embeddings clusters near zero across all agents, and a t\-SNE projection shows clear geometric separation between the two subspaces\. This indicates that $\mathcal{L}_{\text{orth}}$ concentrates source\-correlated variation primarily in the local\-only style subspace\. DiSan does not provide formal differential\-privacy guarantees; full attack protocols across three surfaces are in [Section C\.1](https://arxiv.org/html/2606.15335#A3.SS1)\.

Table 4: Enron stylometry probe \(7 authors\)\. Lower F1 indicates weaker residual distributional fingerprints\.

[Table 4](https://arxiv.org/html/2606.15335#S5.T4) treats Enron as a stylometry diagnostic rather than a hidden\-tag evaluation; DiSan lowers TF\-IDF/BERT attribution to 0\.221/0\.203, near random and far below GLiNER\. Appendix [C\.1](https://arxiv.org/html/2606.15335#A3.SS1) further compares with dedicated authorship\-obfuscation baselines, JAMDEC and StyleRemix, which target stylometric leakage rather than RAG utility and yield substantially smaller reductions\.

We adopt LoRA Hu et al\. \([2022](https://arxiv.org/html/2606.15335#bib.bib13)\) for efficient training \(details in [Sections B\.3](https://arxiv.org/html/2606.15335#A2.SS3) and [B\.4](https://arxiv.org/html/2606.15335#A2.SS4)\)\.

### 5\.6 Multi\-Party Collaboration Analysis

##### Multi\-turn RAG\.

The single\-round protocol extends naturally to multi\-turn settings where the requester refines queries based on accumulated evidence\. At each turn $t$, the requester issues a refined query $q_t$ to a possibly different helper subset; each helper sanitizes new evidence independently before transmitting it\. The key property is preserved: only sanitized text crosses the privacy boundary at every turn\. We treat this as a deployment extension of the same sanitization interface and leave dedicated multi\-turn benchmarking to future work\.

##### Case study: cross\-organizational IPO analysis\.

[Figure 6](https://arxiv.org/html/2606.15335#A3.F6) \(see [Section C\.9](https://arxiv.org/html/2606.15335#A3.SS9)\) illustrates a realistic multi\-party scenario where Lumina Capital’s Audit\-Core agent evaluates a company for IPO eligibility but lacks sufficient external benchmarks\. Audit\-Core queries partner agents via a data broker; each external agent sanitizes its response with DiSan before transmission, removing entity names, geographic identifiers, and organizational details while preserving task\-relevant financial metrics\. This scenario instantiates the same agent setting used in our benchmark: routing is based on public capability tags, while each helper keeps its repository and retrieval index local\. The privacy boundary is therefore the transmitted sanitized snippet, not the helper identity or the fact of collaboration\. The final answer aggregates benchmark figures without exposing any source\-identifying information, enabling accurate cross\-party assessment across organizational boundaries\.

## 6 Conclusion

Identifier\-level anonymization is insufficient for source\-invariant text sharing because private organizational information is often a distributional property of text rather than a set of localized identifiers\. DiSan addresses this by enforcing role–style orthogonality and federated role alignment, producing sanitized text that preserves task semantics while suppressing source\-identifying patterns\. Across distributed\-agent RAG and Enron stylometry evaluations, the results show that representation\-level disentanglement provides a practical path toward safer text sharing across distributed agents\. Future work should study stronger adaptive adversaries, repeated\-query settings, and broader cross\-domain deployments\.

## Limitations

##### Dataset\.

Our main experiments use a synthetic finance corpus with annotated PII spans\. The use of synthetic data is a necessary constraint rather than a methodological choice: datasets containing real PII cannot legally or ethically be used for research publication, and this is standard practice in privacy\-preserving NLP\. Among publicly available PII\-annotated corpora, most are unsuitable for RAG evaluation due to very short texts \(social media, medical notes\) or insufficient document\-type diversity for multi\-party simulation; the selected corpus provides the document length, PII annotation quality, and domain variety required by our setup\. RAG evaluation queries are synthesized on top of this validated base rather than generating both PII data and queries from scratch, which would compound synthesis risk across two stages\. External validation on the Enron email corpus, with real stylistic variation outside the finance domain, partially addresses domain generalizability\.

##### Formal privacy guarantees\.

DiSan does not provide formal differential privacy guarantees\. Applying DP to sequence\-to\-sequence generation requires per\-token noise calibration incompatible with coherent text generation; we instead rely on empirical validation across three attack surfaces\. Formal privacy analysis for generative sanitizers is an important direction for future work\.

## References

- Abadi et al\. \(2016\)Martin Abadi, Andy Chu, Ian Goodfellow, H\. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang\. 2016\.[Deep learning with differential privacy](https://doi.org/10.1145/2976749.2978318)\.In*Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security*, CCS ’16, pages 308–318, New York, NY, USA\. Association for Computing Machinery\.
- Bengio et al\. \(2013\)Yoshua Bengio, Aaron Courville, and Pascal Vincent\. 2013\.Representation learning: A review and new perspectives\.*IEEE transactions on pattern analysis and machine intelligence*, 35\(8\):1798–1828\.
- Canonne et al\. \(2020\)Clément L Canonne, Gautam Kamath, and Thomas Steinke\. 2020\.[The discrete gaussian for differential privacy](https://proceedings.neurips.cc/paper_files/paper/2020/file/b53b3a3d6ab90ce0268229151c9bde11-Paper.pdf)\.In*Advances in Neural Information Processing Systems*, volume 33, pages 15676–15688\. Curran Associates, Inc\.
- Chakraborty et al\. \(2025\)Abhijit Chakraborty, Chahana Dahal, and Vivek Gupta\. 2025\.[Federated retrieval\-augmented generation: A systematic mapping study](https://arxiv.org/abs/2505.18906)\.*Preprint*, arXiv:2505\.18906\.
- Cover and Thomas \(2006\)Thomas M\. Cover and Joy A\. Thomas\. 2006\.*Elements of Information Theory*\.Wiley\-Interscience\.
- Dwork et al\. \(2006\)Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith\. 2006\.Calibrating noise to sensitivity in private data analysis\.In*Theory of Cryptography*, pages 265–284, Berlin, Heidelberg\. Springer Berlin Heidelberg\.
- Feldman et al\. \(2020\)Vitaly Feldman, Tomer Koren, and Kunal Talwar\. 2020\.[Private stochastic convex optimization: optimal rates in linear time](https://doi.org/10.1145/3357713.3384335)\.In*Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing*, STOC 2020, pages 439–449, New York, NY, USA\. Association for Computing Machinery\.
- Fisher et al\. \(2024a\)Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell L Gordon, Zaid Harchaoui, and Yejin Choi\. 2024a\.[StyleRemix: Interpretable authorship obfuscation via distillation and perturbation of style elements](https://doi.org/10.18653/v1/2024.emnlp-main.241)\.In*Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing*, pages 4172–4206, Miami, Florida, USA\. Association for Computational Linguistics\.
- Fisher et al\. \(2024b\)Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, and Yejin Choi\. 2024b\.[JAMDEC: Unsupervised authorship obfuscation using constrained decoding over small language models](https://doi.org/10.18653/v1/2024.naacl-long.87)\.In*Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\)*, pages 1552–1581, Mexico City, Mexico\. Association for Computational Linguistics\.
- Han et al\. \(2024\)Jingxuan Han, Quan Wang, Zikang Guo, Benfeng Xu, Licheng Zhang, and Zhendong Mao\. 2024\.Disentangled learning with synthetic parallel data for text style transfer\.In*Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 15187–15201\.
- He et al\. \(2025\)Hangyu He, Xin Yuan, Kai Wu, Ren Ping Liu, and Wei Ni\. 2025\.pfedrag: A personalized federated retrieval\-augmented generation system with depth\-adaptive tiered embedding tuning\.In*Findings of the Association for Computational Linguistics: EMNLP 2025*, pages 14255–14268\.
- He et al\. \(2020\)Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen\. 2020\.Deberta: Decoding\-enhanced bert with disentangled attention\.*arXiv preprint arXiv:2006\.03654*\.
- Hu et al\. \(2022\)Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen\-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and 1 others\. 2022\.Lora: Low\-rank adaptation of large language models\.*ICLR*, 1\(2\):3\.
- John et al\. \(2019\)Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova\. 2019\.Disentangled representation learning for non\-parallel text style transfer\.In*Proceedings of the 57th annual meeting of the association for computational linguistics*, pages 424–434\.
- Karimireddy et al\. \(2020\)Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh\. 2020\.[SCAFFOLD: Stochastic controlled averaging for federated learning](https://proceedings.mlr.press/v119/karimireddy20a.html)\.In*Proceedings of the 37th International Conference on Machine Learning*, volume 119 of*Proceedings of Machine Learning Research*, pages 5132–5143\. PMLR\.
- Klimt and Yang \(2004\)Bryan Klimt and Yiming Yang\. 2004\.The enron corpus: A new dataset for email classification research\.In*European conference on machine learning*, pages 217–226\. Springer\.
- Lewis et al\. \(2020\)Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen\-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela\. 2020\.[Retrieval\-augmented generation for knowledge\-intensive nlp tasks](https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf)\.In*Advances in Neural Information Processing Systems*, volume 33, pages 9459–9474\. Curran Associates, Inc\.
- Li et al\. \(2020a\)Li Li, Yuxi Fan, Mike Tse, and Kuo\-Yi Lin\. 2020a\.[A review of applications in federated learning](https://doi.org/10.1016/j.cie.2020.106854)\.*Computers & Industrial Engineering*, 149:106854\.
- Li et al\. \(2021a\)Q\. Li, Yiqun Diao, Quan Chen, and Bingsheng He\. 2021a\.[Federated learning on non\-iid data silos: An experimental study](https://api.semanticscholar.org/CorpusID:231786564)\.*2022 IEEE 38th International Conference on Data Engineering \(ICDE\)*, pages 965–978\.
- Li et al\. \(2021b\)Qinbin Li, Bingsheng He, and Dawn Song\. 2021b\.Model\-contrastive federated learning\.In*Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition \(CVPR\)*, pages 10713–10722\.
- Li et al\. \(2020b\)Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith\. 2020b\.Federated optimization in heterogeneous networks\.*Proceedings of Machine learning and systems*, 2:429–450\.
- Liu et al\. \(2022\)Dianqi Liu, Liang Bai, Tianyuan Yu, and Aiming Zhang\. 2022\.[Towards method of horizontal federated learning: A survey](https://doi.org/10.1109/BigDIA56350.2022.9874186)\.In*2022 8th International Conference on Big Data and Information Analytics \(BigDIA\)*, pages 259–266\.
- Liu et al\. \(2024\)Yi Liu, Xiangyu Liu, Xiangrong Zhu, and Wei Hu\. 2024\.Multi\-aspect controllable text generation with disentangled counterfactual augmentation\.In*Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 9231–9253\.
- Malik and Dustdar \(2011\)Ahmad Kamran Malik and Schahram Dustdar\. 2011\.[Enhanced sharing and privacy in distributed information sharing environments](https://api.semanticscholar.org/CorpusID:14389663)\.*2011 7th International Conference on Information Assurance and Security \(IAS\)*, pages 286–291\.
- Mao et al\. \(2025\)Qianren Mao, Qili Zhang, Hanwen Hao, Zhentao Han, Runhua Xu, Weifeng Jiang, Qi Hu, Zhijun Chen, Tyler Zhou, Bo Li, and 1 others\. 2025\.Privacy\-preserving federated embedding learning for localized retrieval\-augmented generation\.*arXiv preprint arXiv:2504\.19101*\.
- McMahan et al\. \(2017\)Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas\. 2017\.[Communication\-Efficient Learning of Deep Networks from Decentralized Data](https://proceedings.mlr.press/v54/mcmahan17a.html)\.In*Proceedings of the 20th International Conference on Artificial Intelligence and Statistics*, volume 54 of*Proceedings of Machine Learning Research*, pages 1273–1282\. PMLR\.
- Meisenbacher and Matthes \(2024\)Stephen Meisenbacher and Florian Matthes\. 2024\.Just rewrite it again: A post\-processing method for enhanced semantic similarity and privacy preservation of differentially private rewritten text\.In*Proceedings of the 19th International Conference on Availability, Reliability and Security*, ARES ’24\. Association for Computing Machinery\.
- Minaee et al\. \(2024\)Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, and Jianfeng Gao\. 2024\.Large language models: A survey\.*arXiv preprint arXiv:2402\.06196*\.
- Ouadrhiri and Abdelhadi \(2022\)Ahmed El Ouadrhiri and Ahmed Abdelhadi\. 2022\.[Differential privacy for deep and federated learning: A survey](https://doi.org/10.1109/ACCESS.2022.3151670)\.*IEEE Access*, 10:22359–22380\.
- Qian et al\. \(2025\)Cheng Qian, Hainan Zhang, Yongxin Tong, Hong\-Wei Zheng, and Zhiming Zheng\. 2025\.Hyfedrag: A federated retrieval\-augmented generation framework for heterogeneous and privacy\-sensitive data\.*arXiv preprint arXiv:2509\.06444*\.
- Qian et al\. \(2019\)Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, and Mark Hasegawa\-Johnson\. 2019\.Autovc: Zero\-shot voice style transfer with only autoencoder loss\.In*International Conference on Machine Learning*, pages 5210–5219\. PMLR\.
- Raff and Sylvester \(2018\)Edward Raff and Jared Sylvester\. 2018\.Gradient reversal against discrimination: A fair neural network learning approach\.In*2018 IEEE 5th International Conference on Data Science and Advanced Analytics \(DSAA\)*, pages 189–198\. IEEE\.
- Reimers and Gurevych \(2019\)Nils Reimers and Iryna Gurevych\. 2019\.Sentence\-bert: Sentence embeddings using siamese bert\-networks\.In*Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing \(EMNLP\-IJCNLP\)*, pages 3982–3992\.
- Rezazadeh et al\. \(2025\)Alireza Rezazadeh, Zichao Li, Ange Lou, Yuying Zhao, Wei Wei, and Yujia Bao\. 2025\.Collaborative memory: Multi\-user memory sharing in llm agents with dynamic access control\.ArXiv preprint arXiv:2505\.18279\.
- Shi et al\. \(2025\)Zitong Shi, Guancheng Wan, Wenke Huang, Guibin Zhang, Jiawei Shao, Mang Ye, and Carl Yang\. 2025\.Privacy\-enhancing paradigms within federated multi\-agent systems\.*arXiv preprint arXiv:2503\.08175*\.
- Stamatatos \(2009\)Efstathios Stamatatos\. 2009\.A survey of modern authorship attribution methods\.*Journal of the American Society for information Science and Technology*, 60\(3\):538–556\.
- Stubbs et al\. \(2015\)Amber Stubbs, Christopher Kotfila, and Özlem Uzuner\. 2015\.Automated systems for the de\-identification of longitudinal clinical narratives: Overview of 2014 i2b2/uthealth shared task track 1\.*Journal of biomedical informatics*, 58:S11–S19\.
- Tan et al\. \(2021\)Yue Tan, Guodong Long, Lu Liu, Tianyi Zhou, Qinghua Lu, Jing Jiang, and Chengqi Zhang\. 2021\.[Fedproto: Federated prototype learning across heterogeneous clients](https://api.semanticscholar.org/CorpusID:247292268)\.In*AAAI Conference on Artificial Intelligence*\.
- Wang et al\. \(2020\)Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou\. 2020\.MiniLM: Deep self\-attention distillation for task\-agnostic compression of pre\-trained transformers\.In*Proceedings of the 34th International Conference on Neural Information Processing Systems*, NeurIPS 2020\.
- Watson et al\. \(2024\)Alex Watson, Yev Meyer, Maarten Van Segbroeck, Matthew Grossman, Sami Torbey, Piotr Mlocek, and Johnny Greco\. 2024\.Synthetic\-PII\-Financial\-Documents\-North\-America: A synthetic dataset for training language models to label and detect pii in domain specific formats\.Hugging Face dataset\.[https://huggingface\.co/datasets/gretelai/synthetic\_pii\_finance\_multilingual](https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual)\.
- Wu et al\. \(2024\)Feijie Wu, Zitao Li, Yaliang Li, Bolin Ding, and Jing Gao\. 2024\.[Fedbiot: Llm local fine\-tuning in federated learning without full model](https://doi.org/10.1145/3637528.3671897)\.In*Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*, KDD ’24, pages 3345–3355, New York, NY, USA\. Association for Computing Machinery\.
- Wu et al\. \(2021\)Yawen Wu, Dewen Zeng, Zhepeng Wang, Yiyu Shi, and Jingtong Hu\. 2021\.Federated contrastive learning for volumetric medical image segmentation\.In*Medical Image Computing and Computer Assisted Intervention – MICCAI 2021*, pages 367–377, Cham\. Springer International Publishing\.
- Xiao et al\. \(2024\)Yijia Xiao, Yiqiao Jin, Yushi Bai, Yue Wu, Xianjun Yang, Xiao Luo, Wenchao Yu, Xujiang Zhao, Yanchi Liu, Quanquan Gu, and 1 others\. 2024\.Large language models can be contextual privacy protection learners\.In*Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing*, pages 14179–14201\.
- Xie et al\. \(2024\)Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, and Sergey Yekhanin\. 2024\.Differentially private synthetic data via foundation model APIs 2: Text\.In*Proceedings of the 41st International Conference on Machine Learning*, ICML’24\.
- Zaratiana et al\. \(2024\)Urchade Zaratiana, Nadi Tomeh, Pierre Holat, and Thierry Charnois\. 2024\.[GLiNER: Generalist model for named entity recognition using bidirectional transformer](https://doi.org/10.18653/v1/2024.naacl-long.300)\.In*Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\)*, pages 5364–5376, Mexico City, Mexico\. Association for Computational Linguistics\.
- Zhang et al\. \(2024\)Jianqing Zhang, Yang Liu, Yang Hua, and Jian Cao\. 2024\.[Fedtgp: Trainable global prototypes with adaptive\-margin\-enhanced contrastive learning for data and model heterogeneity in federated learning](https://doi.org/10.1609/aaai.v38i15.29617)\.*Proceedings of the AAAI Conference on Artificial Intelligence*, 38\(15\):16768–16776\.
- Zhang et al\. \(2022\)Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, and Chao Wu\. 2022\.Dense: data\-free one\-shot federated learning\.In*Proceedings of the 36th International Conference on Neural Information Processing Systems*, pages 21414–21428\.

## Appendix A Threat Model Details

This section expands the threat scope summarized in [Section 3\.2](https://arxiv.org/html/2606.15335#S3.SS2)\.

### A\.1 Adversary Roles

We distinguish the primary adversary in the application stage from a narrower diagnostic in the training stage\. In both cases, capability tags and participation are public; privacy is defined as leakage *beyond* these public facts\.

Table 5: Two\-stage threat scope\. Capability tags and agent participation are public in both stages\.

##### Primary recipient adversary\.

A collaborating agent receives sanitized text $\tilde{d}$ together with public routing metadata, including the helper’s capability tag\. The recipient may try to recover private information from the sanitized content, including explicit PII, stylistic fingerprints tied to the source, organizational document conventions, or other evidence about the helper’s private repository beyond the public tag used for routing\. This is the primary adversary addressed by the role and style disentanglement in DiSan\.

##### Illustrative application scenario\.

In the IPO analysis case study \([Section C\.9](https://arxiv.org/html/2606.15335#A3.SS9)\), Lumina Capital’s Audit\-Core agent lacks sufficient external evidence and routes subqueries to helper agents using public capability tags such as CorporateBank, MarketForecaster, and AssetManager\. The tag itself is not private, since Audit\-Core already knows that a CorporateBank helper is being queried for credit risk evidence\. The privacy risk is that the returned sanitized snippet may reveal information beyond this public tag\. Examples include residual company names, account or location identifiers, proprietary report templates, recurring credit assessment language, sector taxonomies, and other organizational fingerprints\. Such leakage could allow the requester to infer private properties of the helper’s repository or internal business process even without seeing the raw document\. Our evaluations match these surfaces\. PII metrics measure explicit identifier leakage in sanitized outputs and final answers\. Stylometric attribution on sanitized text measures distributional fingerprints tied to the source\. Representation and prototype attribution probes diagnose whether the learned role space and uploaded prototypes retain signals associated with the source that could support inference beyond public tags\. We do not claim protection against arbitrary reconstruction of a helper’s full private corpus\.

##### Training stage prototype observer\.

During federated training, the coordinator observes model updates and uploaded role prototypes but *not* raw text or full local embeddings\. We do not aim to hide which public agent participates in training from the coordinator\. Instead, EXP\-2 asks whether uploaded prototypes become persistent source fingerprints that reveal private distributional properties of an agent’s repository beyond its public tag\. Because agents hold non\-IID corpora, their prototypes can reflect distinctive document type mixtures, role and entity frequencies, and business process patterns\. For example, a CorporateBank agent centered on counterparty risk and credit assessments may induce different role space directions than a MarketForecaster agent centered on equity forecasts and price targets\. If such directions remain stable across rounds, an observer can link training artifacts over time or across tasks and infer how an agent’s private corpus is organized, without seeing raw text\. DiSan addresses this through adversarial training on prototypes, expressed by $\mathcal{L}_{\text{adv}}$, and Gaussian noise perturbation with $\sigma=0.01$ before upload\. Together, these mechanisms suppress the directional signatures evaluated in EXP\-2 \([Section C\.1](https://arxiv.org/html/2606.15335#A3.SS1)\)\.
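The noise step can be sketched as below. This is only an assumption about its form: the text states Gaussian perturbation with σ = 0.01 before upload, and the sketch takes that to mean additive, independent per-coordinate noise. The exact definition is given in the paper's Equation 21, which may differ (for example by renormalizing afterwards); `perturb_prototype` is a hypothetical name.

```python
import random
from typing import List, Optional


def perturb_prototype(mu: List[float], sigma: float = 0.01,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Add independent N(0, sigma^2) noise to each coordinate before upload."""
    rng = rng or random.Random()
    return [m + rng.gauss(0.0, sigma) for m in mu]


prototype = [0.6, -0.8, 0.0]
uploaded = perturb_prototype(prototype, sigma=0.01, rng=random.Random(0))

# The shift per coordinate is on the order of sigma, so the direction of
# the prototype is only slightly disturbed.
max_shift = max(abs(u - m) for u, m in zip(uploaded, prototype))
```

With noise this small relative to a unit-norm prototype, most of the protection in this setup would have to come from the adversarial objective rather than from the perturbation alone.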

##### Out\-of\-scope adversaries\.

We do not address active adversaries \(malicious servers, poisoning, prompt injection\) or collusion between multiple recipients and the coordinator\. These represent important directions for future work but are orthogonal to the sanitization objective studied here\.

### A\.2 Validation Scope

[Section 3\.2](https://arxiv.org/html/2606.15335#S3.SS2) summarizes the validation goals in the main text\. The detailed protocols are reported in [Sections 5\.4](https://arxiv.org/html/2606.15335#S5.SS4) and [C\.1](https://arxiv.org/html/2606.15335#A3.SS1): PII metrics evaluate explicit identifier leakage, stylometry evaluates source signatures in transmitted text, and embedding/prototype attribution probes diagnose whether learned sharing artifacts carry persistent source information\.

## Appendix B Additional Method Details

This appendix collects components omitted from the main paper for space, including \(i\) fusion/residual design, \(ii\) expanded prototype objectives and prototype\-level adversarial training, and \(iii\) the federated training procedure\.

### B\.1 Fusion and Residual Stabilization

After fusing the role and style streams via $\mathbf{H}_{\text{fused}}=g([\mathbf{Z}_r;\mathbf{Z}_s])$, we apply a residual path to stabilize training:

$$\mathbf{H}_{\text{out}}=\alpha\cdot\mathbf{H}_{\text{fused}}+(1-\alpha)\cdot\mathbf{H},\tag{8}$$

where $\alpha=\sigma(a)\in(0,1)$ is a learnable scalar gate initialized so that $\alpha$ is small early in training\.
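A minimal sketch of the gate in Equation 8, applied to a single hidden vector, may help make the initialization remark concrete. The function names and the toy values are illustrative assumptions; in the model the same scalar gate would be applied across the full hidden-state tensor.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gated_residual(h: List[float], h_fused: List[float], a: float) -> List[float]:
    """H_out = alpha * H_fused + (1 - alpha) * H with alpha = sigmoid(a)."""
    alpha = sigmoid(a)
    return [alpha * f + (1.0 - alpha) * x for x, f in zip(h, h_fused)]


h = [1.0, 2.0]        # backbone hidden state
h_fused = [3.0, 0.0]  # output of the fusion layer g

# A negative gate logit gives a small alpha (about 0.018 here), so the
# output stays close to the backbone state early in training.
early = gated_residual(h, h_fused, a=-4.0)

# a = 0 gives alpha = 0.5, the midpoint of the two streams.
balanced = gated_residual(h, h_fused, a=0.0)
print(balanced)  # [2.0, 1.0]
```

Under this parameterization a gate logit of $a=\ln 9$ corresponds to $\alpha=0.9$.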

For completeness, we provide the gradient decomposition induced by this residual gate\. Let $\theta_{\text{enc}}$ denote the parameters of the pretrained encoder backbone \(the component that produces $\mathbf{H}$ from input tokens\)\. Note that $\theta_{\text{enc}}$ *excludes* the fusion layer $g$, gating parameters $\{a,\alpha\}$, and projection heads $\mathbf{W}_r,\mathbf{W}_s$\. Under this convention:

$$\frac{\partial\mathcal{L}}{\partial\theta_{\text{enc}}}=\frac{\partial\mathcal{L}}{\partial\mathbf{H}_{\text{out}}}\left[\alpha\frac{\partial\mathbf{H}_{\text{fused}}}{\partial\mathbf{H}}+(1-\alpha)\mathbf{I}\right]\frac{\partial\mathbf{H}}{\partial\theta_{\text{enc}}}.\tag{9}$$

The residual path helps maintain stable gradients early in training when the role/style streams are still adapting\.

### B\.2 Expanded Prototype Objectives

##### Batch role centroid\.

For each type $k\in\mathcal{K}$, agent $c$ computes the batch role centroid as a token\-level average:

$$\bar{\mathbf{z}}_{r,k}^{(c)}=\frac{1}{|\Omega_k^{(c)}|}\sum_{(i,t)\in\Omega_k^{(c)}}\mathbf{z}^{(i)}_{r,t},\quad\Omega_k^{(c)}=\{(i,t):\ell_{i,t}=k\}.\tag{10}$$

If $\Omega_k^{(c)}=\emptyset$, we skip type $k$ in the corresponding loss term\.
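Equation 10 can be sketched in plain Python as a grouped token average. The nested-list layout (`z_r[i][t]` is the role vector of token `t` in sequence `i`, `labels[i][t]` its type) and the type names are assumptions for illustration.

```python
from typing import Dict, Hashable, List


def batch_role_centroids(z_r: List[List[List[float]]],
                         labels: List[List[Hashable]]) -> Dict[Hashable, List[float]]:
    """Average role vectors per type over all (sequence, token) positions."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for seq_vectors, seq_labels in zip(z_r, labels):
        for vector, k in zip(seq_vectors, seq_labels):
            if k not in sums:
                sums[k] = [0.0] * len(vector)
                counts[k] = 0
            sums[k] = [s + v for s, v in zip(sums[k], vector)]
            counts[k] += 1
    # Types with an empty index set never enter `sums`, so they are
    # skipped automatically, mirroring the rule stated above.
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


z_r = [
    [[1.0, 0.0], [0.0, 2.0]],  # sequence 0: two tokens
    [[3.0, 0.0]],              # sequence 1: one token
]
labels = [["ORG", "AMOUNT"], ["ORG"]]
centroids = batch_role_centroids(z_r, labels)
print(centroids)  # {'ORG': [2.0, 0.0], 'AMOUNT': [0.0, 2.0]}
```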

##### Decomposed alignment terms\.

The main paper uses the spherical cosine alignment in [Equation 4](https://arxiv.org/html/2606.15335#S4.E4)\. Alternatively, one can view prototype alignment as matching \(i\) centroid distance, \(ii\) angular direction, and \(iii\) within\-batch dispersion, defined per type $k\in\mathcal{K}$:

$$\begin{align}
\mathcal{L}_{\text{align}}&=\sum_{k\in\mathcal{K}}\left\|\bar{\mathbf{z}}_{r,k}^{(c)}-\boldsymbol{\mu}^{*}_{k}\right\|_{2}^{2},\tag{11}\\
\mathcal{L}_{\text{cos}}&=\sum_{k\in\mathcal{K}}\left(1-\frac{\bar{\mathbf{z}}_{r,k}^{(c)\top}\boldsymbol{\mu}^{*}_{k}}{\|\bar{\mathbf{z}}_{r,k}^{(c)}\|_{2}\cdot\|\boldsymbol{\mu}^{*}_{k}\|_{2}+\epsilon}\right),\tag{12}\\
\mathcal{L}_{\text{var}}&=\sum_{k\in\mathcal{K}}\frac{1}{|\Omega_{k}^{(c)}|}\sum_{(i,t)\in\Omega_{k}^{(c)}}\left\|\mathbf{z}_{r,t}^{(i)}-\bar{\mathbf{z}}_{r,k}^{(c)}\right\|_{2}^{2},\tag{13}\\
\mathcal{L}_{\text{proto(full)}}&=\mathcal{L}_{\text{align}}+\mathcal{L}_{\text{cos}}+\lambda_{\text{var}}\,\mathcal{L}_{\text{var}}.\tag{14}
\end{align}$$

Here $\Omega_k^{(c)}=\{(i,t):\ell_{i,t}=k\}$ is the token index set for type $k$ in agent $c$’s batch \([Equation 10](https://arxiv.org/html/2606.15335#A2.E10)\)\. In practice, the spherical objective in [Equation 4](https://arxiv.org/html/2606.15335#S4.E4) is a compact alternative that avoids redundant hyperparameters\.
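The three terms in Equations 11 to 14 can be sketched for one agent's batch as follows. The input layout (token role vectors grouped by type), the value `lambda_var=0.1`, and the function names are illustrative assumptions, not settings reported in the paper.

```python
import math
from typing import Dict, List, Tuple


def sq_dist(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def l2_norm(u: List[float]) -> float:
    return math.sqrt(sum(a * a for a in u))


def proto_full_loss(tokens_by_type: Dict[str, List[List[float]]],
                    global_protos: Dict[str, List[float]],
                    lambda_var: float = 0.1,
                    eps: float = 1e-8) -> Tuple[float, float, float, float]:
    """Return (total, L_align, L_cos, L_var) summed over types present in the batch."""
    l_align = l_cos = l_var = 0.0
    for k, vectors in tokens_by_type.items():
        if not vectors or k not in global_protos:
            continue  # empty index set: skip this type
        dim = len(vectors[0])
        centroid = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
        mu = global_protos[k]
        l_align += sq_dist(centroid, mu)                                     # Eq. 11
        dot = sum(c * m for c, m in zip(centroid, mu))
        l_cos += 1.0 - dot / (l2_norm(centroid) * l2_norm(mu) + eps)         # Eq. 12
        l_var += sum(sq_dist(v, centroid) for v in vectors) / len(vectors)  # Eq. 13
    return l_align + l_cos + lambda_var * l_var, l_align, l_cos, l_var       # Eq. 14


batch = {"ORG": [[1.0, 0.0], [3.0, 0.0]]}

# Centroid [2, 0] coincides with the global prototype: only dispersion remains.
total, l_align, l_cos, l_var = proto_full_loss(batch, {"ORG": [2.0, 0.0]})

# An orthogonal prototype is penalized by both the distance and the angle term.
_, far_align, far_cos, _ = proto_full_loss(batch, {"ORG": [0.0, 2.0]})
```

In the first call the distance and angle terms vanish (up to `eps`) and the total reduces to `lambda_var` times the within-type variance of 1.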

##### EMA prototype update\.

Each agent maintains an EMA prototype for each type:

$$\boldsymbol{\mu}^{(c)}_{k,s}=\beta\,\boldsymbol{\mu}^{(c)}_{k,s-1}+(1-\beta)\,\bar{\mathbf{z}}^{(c)}_{r,k,s},\tag{15}$$

where $s$ indexes steps and $\beta\in(0,1)$.

##### Spherical EMA\.

When using spherical alignment, we maintain the running prototype on the unit sphere:

$$\hat{\boldsymbol{\mu}}^{(c)}_{k,s}=\mathrm{normalize}\!\left(\beta\,\hat{\boldsymbol{\mu}}^{(c)}_{k,s-1}+(1-\beta)\,\hat{\bar{\mathbf{z}}}^{(c)}_{r,k,s}\right),\tag{16}$$

where $\hat{\bar{\mathbf{z}}}^{(c)}_{r,k,s}=\mathrm{normalize}(\bar{\mathbf{z}}^{(c)}_{r,k,s})$.
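The two update rules differ only in where normalization is applied. The following sketch contrasts them on a toy vector; the helper names and the value `beta=0.9` are assumptions for illustration, since the appendix does not report the decay used.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Project a vector onto the unit sphere (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def ema_update(mu_prev, zbar, beta):
    """Plain EMA prototype update (Eq. 15)."""
    return beta * mu_prev + (1.0 - beta) * zbar

def spherical_ema_update(mu_hat_prev, zbar, beta):
    """EMA on unit vectors, re-normalized after mixing (Eq. 16)."""
    return normalize(beta * mu_hat_prev + (1.0 - beta) * normalize(zbar))

mu = np.array([1.0, 0.0])
zbar = np.array([0.0, 4.0])           # large-norm batch centroid
mu_plain = ema_update(mu, zbar, beta=0.9)
mu_sphere = spherical_ema_update(mu, zbar, beta=0.9)
```

In the plain update the norm of the batch centroid affects how far the prototype moves; in the spherical update only its direction matters.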

##### Prototype adversarial training\.

Uploaded prototypes may carry persistent distributional signatures tied to individual sources beyond public agent tags. As a training-stage diagnostic and defense, we therefore train a prototype discriminator $D_{\psi}$ to predict the source of uploaded prototypes, and train the encoder to *fool* this discriminator using the GRL objective in [Equation 5](https://arxiv.org/html/2606.15335#S4.E5).

To enable gradient flow despite EMA, we use an estimate that preserves gradients:

$$\bar{\boldsymbol{\mu}}^{(c)}_{k}=(1-\eta)\,\mathrm{sg}(\boldsymbol{\mu}^{(c)}_{k})+\eta\,\bar{\mathbf{z}}^{(c)}_{r,k},\tag{17}$$

where $\mathrm{sg}(\cdot)$ stops gradients and $\eta\in[0,1]$. Note that $\bar{\boldsymbol{\mu}}^{(c)}_{k}$ is used *only for backpropagation* and is not uploaded to the server.

The discriminator is trained on the perturbed prototypes that are uploaded:

$$\mathcal{L}_{\text{disc}}=\sum_{c=1}^{C}\sum_{k\in\mathcal{K}}\mathrm{CE}\!\left(D_{\psi}(\tilde{\boldsymbol{\mu}}^{(c)}_{k}),\,c\right),\tag{18}$$

where $\tilde{\boldsymbol{\mu}}^{(c)}_{k}$ is the perturbed prototype uploaded to the server, as defined in [Equation 21](https://arxiv.org/html/2606.15335#A2.E21).

On the client side, we apply GRL to the estimate that preserves gradients and optimize the adversarial loss:

$$\mathcal{L}_{\text{adv}}=\sum_{k\in\mathcal{K}}\mathrm{CE}\!\left(D_{\psi}(\mathrm{GRL}_{\gamma}(\bar{\boldsymbol{\mu}}^{(c)}_{k})),\,c\right),\tag{19}$$

where $\gamma$ is the GRL strength used in [Equation 5](https://arxiv.org/html/2606.15335#S4.E5). The adversarial loss uses $\bar{\boldsymbol{\mu}}^{(c)}_{k}$ to ensure gradient flow, while the discriminator $D_{\psi}$ is trained on the perturbed uploaded prototypes $\tilde{\boldsymbol{\mu}}^{(c)}_{k}$ to match the actual attack surface.
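The interaction between the gradient-preserving estimate (Equation 17) and the GRL (Equation 19) can be made explicit with a short derivation. This is a sketch that assumes the usual gradient reversal convention, namely identity in the forward pass and multiplication of the gradient by $-\gamma$ in the backward pass; the appendix does not restate that convention. Because $\mathrm{sg}(\cdot)$ blocks the path through the running prototype, only the $\eta$-weighted batch centroid receives gradient:

$$\frac{\partial\bar{\boldsymbol{\mu}}^{(c)}_{k}}{\partial\bar{\mathbf{z}}^{(c)}_{r,k}}=\eta\,\mathbf{I},\qquad\frac{\partial\mathcal{L}_{\text{adv}}}{\partial\bar{\mathbf{z}}^{(c)}_{r,k}}=-\gamma\,\eta\,\nabla_{\mathbf{u}}\,\mathrm{CE}\!\left(D_{\psi}(\mathbf{u}),\,c\right)\Big|_{\mathbf{u}=\bar{\boldsymbol{\mu}}^{(c)}_{k}}.$$

Under this reading, $\eta=0$ would remove the adversarial signal on the encoder entirely, while $\eta=1$ would ignore the EMA history and apply the reversed gradient to the current batch centroid alone.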

### B.3 Model Architecture Details

##### Backbone\.

DiSan uses LongT5-TGlobal-Base as its backbone. The encoder combines local sliding-window attention with dynamically constructed global tokens, enabling linear-complexity modeling of sequences up to 16,384 tokens while preserving long-range context. The backbone hidden size is denoted $d_{\text{enc}}$; in our implementation the maximum input length is 1,536 tokens. The decoder is the LongT5 Transformer decoder and generates sanitized text autoregressively from modified encoder states. Thus DiSan does not replace the pretrained decoder with a separate generator; it inserts a role–style bottleneck between the pretrained encoder and decoder.

##### Projection heads and discriminator\.

LoRA adapters with rank 8 are applied to the attention Q and V projections of the encoder, adding approximately 0.9M trainable parameters. Given encoder states $\mathbf{H}\in\mathbb{R}^{T\times d_{\text{enc}}}$, the role and style projection heads are linear layers followed by dropout with $p=0.1$ that map each token to 256-dimensional subspaces: $\mathbf{W}_{r},\mathbf{W}_{s}\in\mathbb{R}^{d_{\text{enc}}\times 256}$ ([Equations 1](https://arxiv.org/html/2606.15335#S4.E1) and [2](https://arxiv.org/html/2606.15335#S4.E2)). The two streams are concatenated and mapped back to $d_{\text{enc}}$ by a linear fusion layer $g:\mathbb{R}^{512}\to\mathbb{R}^{d_{\text{enc}}}$. We use the projected-residual variant by default: the original encoder state is first compressed through the same 512-dimensional bottleneck and projected back to $d_{\text{enc}}$, then mixed with the fused representation using a learnable scalar initialized to 0.5. This preserves generation stability without passing raw encoder states directly to the decoder.
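A shape-level sketch of this bottleneck may help. It is one possible reading of the description, not the released architecture: `d_enc = 768`, the separate residual matrices `W_down` and `W_up`, the convex mixing with `alpha`, and the omission of biases and dropout are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_enc, d_sub = 5, 768, 256            # d_enc = 768 is an assumed hidden size

H = rng.normal(size=(T, d_enc))                          # encoder states
W_r = rng.normal(scale=0.02, size=(d_enc, d_sub))        # role projection head
W_s = rng.normal(scale=0.02, size=(d_enc, d_sub))        # style projection head
W_g = rng.normal(scale=0.02, size=(2 * d_sub, d_enc))    # fusion layer g
W_down = rng.normal(scale=0.02, size=(d_enc, 2 * d_sub)) # hypothetical residual compress
W_up = rng.normal(scale=0.02, size=(2 * d_sub, d_enc))   # hypothetical residual expand
alpha = 0.5                                              # learnable scalar, initial value

z_r, z_s = H @ W_r, H @ W_s                              # 256-dim role / style streams
fused = np.concatenate([z_r, z_s], axis=-1) @ W_g        # back to d_enc
residual = (H @ W_down) @ W_up                           # never the raw encoder state
decoder_input = alpha * fused + (1.0 - alpha) * residual
```

The point of the sketch is that every path from `H` to `decoder_input` passes through a 512-dimensional bottleneck, so the decoder never sees raw encoder states.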

##### Adversarial classifiers\.

All source attribution discriminators use the same MLP template, input $\to 128\to 128\to C$, with ReLU activations and dropout 0.1 after each hidden layer; $C=7$ in our experiments. The role discriminator takes mean-pooled role embeddings with 256 dimensions, and the fused discriminator takes mean-pooled fused encoder states with dimension $d_{\text{enc}}$. Both are trained through a gradient reversal layer to discourage signals tied to individual sources from being encoded in the shareable role stream or in the final decoder input. The prototype discriminator $D_{\psi}$ uses the same MLP architecture with a 256-dimensional input and is trained on uploaded role prototypes; the client-side adversarial loss applies GRL to a prototype estimate with gradients so that local updates make prototypes less predictive of their source agent.
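To give a sense of scale, the parameter count of this MLP template can be worked out directly. The helper below is a small arithmetic sketch; it assumes every linear layer has a bias term, which the appendix does not state.

```python
def mlp_param_count(d_in, hidden=(128, 128), n_classes=7):
    """Weights plus biases for an input -> 128 -> 128 -> C MLP."""
    dims = [d_in, *hidden, n_classes]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

# Role and prototype discriminators both take 256-dimensional inputs.
role_disc_params = mlp_param_count(256)
```

With a 256-dimensional input this comes to roughly 50K parameters per discriminator, small relative to the LoRA adapters.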

### B.4 Federated Training Details

##### Parameter\-efficient fine\-tuning\.

We employ LoRA Hu et al. ([2022](https://arxiv.org/html/2606.15335#bib.bib13)) for communication-efficient federated training. Specifically, we freeze the pretrained encoder backbone and apply low-rank adapters only to the attention query and value projections (target modules: `q`, `v`). The trainable parameters include: (i) LoRA adapters ($\approx$0.9M parameters); (ii) role projection $\mathbf{W}_{r}$ and style projection $\mathbf{W}_{s}$; (iii) fusion layer $g(\cdot)$ and residual components; (iv) adversarial classifiers. Note that when using LoRA, the gradient decomposition in [Equation 9](https://arxiv.org/html/2606.15335#A2.E9) flows only through the LoRA-adapted attention layers (not the frozen FFN and normalization layers), while the projection heads $\mathbf{W}_{r}$, $\mathbf{W}_{s}$ receive full gradients.

##### Communication cost\.

LoRA substantially reduces communication cost compared to full-model synchronization. The pretrained encoder backbone is frozen; only LoRA adapters (applied to attention Q/V projections) and the disentanglement heads are trainable. During each round, agents upload only trainable parameters: LoRA adapters ($\approx$0.9M parameters, $\approx$5 MB), role projection head $\mathbf{W}_{r}$, and fusion layer $g$. Style projection weights $\mathbf{W}_{s}$ remain strictly local and are never transmitted. In addition, each agent uploads and downloads a set of role prototypes of size $|\mathcal{K}|\times d_{r}$ per round (256-dimensional centroids for $|\mathcal{K}|$ entity types), which is negligible ($\approx$1 KB) compared to model synchronization.

##### Convergence analysis\.

Our method builds on FedProx Li et al. ([2020b](https://arxiv.org/html/2606.15335#bib.bib21)), which incorporates a proximal term $\frac{\nu}{2}\|\theta-\theta^{*}\|^{2}$ to control client drift under non-IID data. Under standard assumptions (L-smoothness, bounded gradient variance), FedProx guarantees convergence to a stationary point Li et al. ([2020b](https://arxiv.org/html/2606.15335#bib.bib21)). Our additional loss components ($\mathcal{L}_{\text{orth}}$, $\mathcal{L}_{\text{proto}}$) are smooth regularizers that preserve this guarantee. The adversarial component $\mathcal{L}_{\text{adv}}$ introduces additional complexity; we rely on empirical validation of stable convergence through training curves and loss monitoring across all federated rounds.

##### FedProx\.

We add a proximal regularizer during updates:

$$\mathcal{L}_{\text{prox}}=\frac{\nu}{2}\left\|\theta-\theta^{*}\right\|_{2}^{2}.\tag{20}$$
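As a concrete illustration, the proximal term and its gradient can be written over a dictionary of parameter arrays. The function names and the dictionary layout are assumptions for this sketch; $\nu=0.1$ matches the value listed in Table 9.

```python
import numpy as np

def fedprox_penalty(theta_local, theta_global, nu):
    """(nu / 2) * squared L2 distance between local and global parameters."""
    sq = sum(np.sum((theta_local[n] - theta_global[n]) ** 2) for n in theta_global)
    return 0.5 * nu * sq

def fedprox_grad(theta_local, theta_global, nu):
    """Gradient of the penalty w.r.t. local parameters: nu * (theta - theta*)."""
    return {n: nu * (theta_local[n] - theta_global[n]) for n in theta_global}

local = {"w": np.array([1.0, 2.0])}      # hypothetical trainable tensor
glob = {"w": np.array([0.0, 0.0])}
penalty = fedprox_penalty(local, glob, nu=0.1)
grad = fedprox_grad(local, glob, nu=0.1)
```

The gradient pulls each local parameter back toward the broadcast global value with strength $\nu$, which is how the term limits client drift.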

##### Prototype perturbation\.

Before uploading, agents may apply $\ell_{2}$-normalization and add Gaussian noise:

$$\tilde{\boldsymbol{\mu}}^{(c)}_{k}=\mathrm{normalize}\!\left(\boldsymbol{\mu}^{(c)}_{k}+\mathcal{N}(\mathbf{0},\sigma_{\text{noise}}^{2}\mathbf{I})\right).\tag{21}$$
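A minimal sketch of this perturbation step follows. The function name and the explicit random generator argument are assumptions; the noise scale 0.01 is the value reported in Table 9.

```python
import numpy as np

def perturb_prototype(mu, sigma_noise, rng):
    """Add isotropic Gaussian noise, then project back onto the unit sphere."""
    noisy = mu + rng.normal(0.0, sigma_noise, size=mu.shape)
    return noisy / (np.linalg.norm(noisy) + 1e-12)

rng = np.random.default_rng(0)
mu = np.array([3.0, 4.0, 0.0])           # hypothetical running prototype
mu_tilde = perturb_prototype(mu, sigma_noise=0.01, rng=rng)
mu_clean = perturb_prototype(mu, sigma_noise=0.0, rng=rng)
```

Because the result is re-normalized, the upload reveals only a noisy direction and not the norm of the local prototype.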

##### Pseudocode\.

Algorithm [1](https://arxiv.org/html/2606.15335#alg1) summarizes the end-to-end federated training procedure.

Algorithm 1 DiSan Training

Input: Agents $\{1,\ldots,C\}$ with local data $\{\mathcal{D}_{c}\}$, rounds $R$, local steps $K$

- Initialize global model $\theta^{*}$ and prototype discriminator $D_{\psi}$
- Initialize global prototypes $\{\boldsymbol{\mu}^{*}_{k}\}_{k\in\mathcal{K}}$ (e.g., zeros)
- for $\rho=1$ to $R$ do
  - Broadcast $\theta^{*}$, $D_{\psi}$, and $\{\boldsymbol{\mu}^{*}_{k}\}_{k\in\mathcal{K}}$ to all agents
  - for agent $c=1$ to $C$ in parallel do
    - $\theta^{(c)}\leftarrow\theta^{*}$
    - for $s=1$ to $K$ do
      - Sample a batch from $\mathcal{D}_{c}$
      - Compute $\mathcal{L}^{(c)}_{\text{local}}$ ([Equation 6](https://arxiv.org/html/2606.15335#S4.E6))
      - Update $\theta^{(c)}$ via gradient descent
      - Update running prototypes $\{\boldsymbol{\mu}^{(c)}_{k}\}_{k\in\mathcal{K}}$ ([Equation 15](https://arxiv.org/html/2606.15335#A2.E15) or [Equation 16](https://arxiv.org/html/2606.15335#A2.E16))
    - end for
    - Optional perturbation: $\tilde{\boldsymbol{\mu}}^{(c)}_{k}\leftarrow\mathrm{perturb}(\boldsymbol{\mu}^{(c)}_{k})\ \forall k\in\mathcal{K}$
    - Upload $\big(\theta^{(c)},\{\tilde{\boldsymbol{\mu}}^{(c)}_{k}\}_{k\in\mathcal{K}},\{N_{c,k}\}_{k\in\mathcal{K}}\big)$ to server
  - end for
  - $\theta^{*}\leftarrow\texttt{weighted\_avg}(\{\theta^{(c)}\}_{c=1}^{C})$
  - for $k\in\mathcal{K}$ do
    - $\boldsymbol{\mu}^{*}_{k}\leftarrow\dfrac{\sum_{c=1}^{C}N_{c,k}\,\tilde{\boldsymbol{\mu}}^{(c)}_{k}}{\sum_{c=1}^{C}N_{c,k}}$
  - end for
  - Train $D_{\psi}$ on $\{(\tilde{\boldsymbol{\mu}}^{(c)}_{k},c)\}_{c,k}$ using [Equation 18](https://arxiv.org/html/2606.15335#A2.E18)
- end for
- Return $\theta^{*}$
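The server-side prototype step of Algorithm 1 is a count-weighted average per type. The sketch below illustrates it with nested dictionaries; the function name, the data layout, and the choice to keep the previous global prototype when no client reports a type are assumptions not spelled out in the algorithm.

```python
import numpy as np

def aggregate_prototypes(client_protos, client_counts, prev_global):
    """Weighted average of uploaded prototypes, weights N_{c,k} = token counts."""
    new_global = {}
    for k in prev_global:
        total = sum(client_counts[c].get(k, 0) for c in client_protos)
        if total == 0:
            new_global[k] = prev_global[k]   # assumption: keep the old prototype
            continue
        weighted = sum(
            client_counts[c][k] * client_protos[c][k]
            for c in client_protos
            if client_counts[c].get(k, 0) > 0
        )
        new_global[k] = weighted / total
    return new_global

# Two hypothetical clients reporting entity type 0; type 1 is reported by no one.
protos = {"c1": {0: np.array([1.0, 0.0])}, "c2": {0: np.array([0.0, 1.0])}}
counts = {"c1": {0: 1}, "c2": {0: 3}}
prev = {0: np.zeros(2), 1: np.ones(2)}
new_protos = aggregate_prototypes(protos, counts, prev)
```

Weighting by $N_{c,k}$ means a client with few tokens of a given type has little influence on that type's global prototype.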

## Appendix C Experimental Supplement

### C.1 Attack Evaluation

We empirically evaluate privacy on three surfaces: sanitized output text, learned representations, and uploaded prototypes, as shown in [Tables 7](https://arxiv.org/html/2606.15335#A3.T7) and [8](https://arxiv.org/html/2606.15335#A3.T8). The primary surface in the application stage is sanitized text received by another agent. Capability tags and participation are public in our threat model, so the 7-way probes are not meant to hide the helper capability tag itself. They serve as diagnostics for whether sanitized text, role representations, or uploaded prototypes still carry residual source-correlated distributional fingerprints beyond that public information. Style representations remain strictly local and are reported only as a sanity check that variation identifying agents has been isolated into the local stream.

##### EXP\-1: Embedding Attribution\.

We employ an SVM classifier with RBF kernel to probe whether embeddings retain source-correlated fingerprints beyond public tags. For role embeddings, which form the internal stream from which sanitized text and prototypes are derived, the SVM achieves close to random performance, with F1 = 0.05 and Acc = 0.18. This indicates successful privacy protection. As a disentanglement sanity check, style embeddings remain strictly local and are never transmitted. They achieve high accuracy, with Acc = 0.89 and F1 = 0.84, confirming that $\mathcal{L}_{\text{orth}}$ isolates source-correlated variation into the local style stream. This contrast validates the design. Style carries agent fingerprints but never crosses the privacy boundary, while only sanitized text $\tilde{d}$ derived from role representations is transmitted. Matched distribution tests confirm robustness: role F1 remains at 0.05 while style F1 stays at 0.80.

##### EXP\-2: Prototype Attribution\.

We test whether uploaded role prototypes expose stable fingerprints tied to individual sources beyond public agent tags through two probes. Sample-to-Proto predicts which client prototype is closest to a training sample embedding. Bootstrap-Proto trains a probe on local client data and tests it on uploaded prototypes. Both achieve close to random performance, with Acc = 0.14–0.16 and F1 $\approx$ 0.09. The 95% confidence intervals include zero, indicating that prototype defenses, including normalization, noise perturbation, and adversarial training, suppress persistent prototype signatures. Cross-round linkage attacks train on round-1 prototypes and test on round-12 prototypes. They achieve F1 = 0.10, confirming that temporal linkage is also ineffective.

##### EXP\-3: Text Stylometry\.

We evaluate stylometric leakage in sanitized text using two complementary probes: (i) TF-IDF features (5000 dimensions, unigrams + bigrams) with classical classifiers (Logistic Regression, LinearSVC, Random Forest); and (ii) a neural encoder probe (all-MiniLM-L6-v2) Reimers and Gurevych ([2019](https://arxiv.org/html/2606.15335#bib.bib33)); Wang et al. ([2020](https://arxiv.org/html/2606.15335#bib.bib39)) capturing deeper contextual patterns beyond surface n-grams. These probes use source labels as a diagnostic signal for residual distributional fingerprints, rather than as a claim that public capability tags are hidden.

EXP-3a (Synthetic Finance): The full 7-way test shows high F1 ($\approx$0.90) for both raw and sanitized text, reflecting document-type confounding rather than stylistic signals. Controlled binary tests isolating stylometric variation show attribution F1 of 0.51–0.56 (sanitized) vs. 0.54–0.57 (raw), both near the 0.50 random baseline, suggesting limited intrinsic style variation in the synthetic data.

EXP-3b (Enron Emails, External Validation): To validate on real-world data with genuine author variation, we evaluate on the Enron email corpus Klimt and Yang ([2004](https://arxiv.org/html/2606.15335#bib.bib16)) (7 authors, 500 emails each; details in [Section C.3](https://arxiv.org/html/2606.15335#A3.SS3)). On raw emails, TF-IDF achieves 82.5% F1 and the BERT probe achieves 69.1% F1 (vs. 14.3% random baseline).

We also run stylometric attribution after placeholder masking with each PII detector from our baselines, to directly test whether token-level removal resolves distributional leakage. Even GLiNER, the strongest detector (masking 19.2% of tokens), reduces TF-IDF attribution to 67.2% (18.6% reduction) and BERT attribution to 61.8% (10.6% reduction). After DiSan sanitization, TF-IDF drops to 22.1% (73.2% reduction) and the BERT probe drops to 20.3% (70.6% reduction), both approaching the random baseline. The gap confirms that distributional signatures persist even after aggressive masking, and that a representation-level approach is necessary to suppress them.
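The reduction percentages quoted here are relative to the raw-text F1, not absolute differences in F1 points. The short sketch below recomputes them from the rounded F1 values in the text; the function name is an illustrative choice.

```python
def relative_reduction(raw_f1, new_f1):
    """Percentage drop in attribution F1 relative to the raw-text baseline."""
    return 100.0 * (raw_f1 - new_f1) / raw_f1

tfidf_disan = relative_reduction(0.825, 0.221)   # DiSan, TF-IDF probe
bert_disan = relative_reduction(0.691, 0.203)    # DiSan, BERT probe
tfidf_mask = relative_reduction(0.825, 0.672)    # GLiNER masking, TF-IDF probe
bert_mask = relative_reduction(0.691, 0.618)     # GLiNER masking, BERT probe
```

Recomputing from the rounded values reproduces the DiSan figures (73.2% and 70.6%) and gives about 18.5% and 10.6% for GLiNER masking, which agrees with the reported 18.6% up to rounding of the underlying F1 scores.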

EXP-3c (Authorship Obfuscation Baselines): We compare against two recent authorship obfuscation methods on the identical Enron setup: JAMDEC Fisher et al. ([2024b](https://arxiv.org/html/2606.15335#bib.bib9)), which applies constrained decoding over a small language model to suppress author-specific tokens, and StyleRemix Fisher et al. ([2024a](https://arxiv.org/html/2606.15335#bib.bib8)), which perturbs fine-grained style elements via LoRA modules in two modes (Fixed: uniform style target; Adaptive: per-author target). Results are in [Table 6](https://arxiv.org/html/2606.15335#A3.T6).

JAMDEC barely moves the attribution needle (TF-IDF F1: $0.825\to 0.816$, 1.2% reduction), suggesting that constrained decoding without explicit disentanglement leaves distributional fingerprints intact. StyleRemix Fixed achieves 11.5% TF-IDF reduction but only 3.4% under the neural BERT probe, indicating that surface n-gram perturbation does not fully remove contextual attribution cues. StyleRemix Adaptive *backfires*: by steering each author toward a distinct style target it introduces new per-author signatures, increasing TF-IDF F1 to 0.970 and BERT F1 to 0.845, both above the raw baseline. DiSan delivers 73.2% TF-IDF reduction and 70.6% BERT reduction, approximately $60\times$ greater than JAMDEC and $6\times$ greater than the best StyleRemix variant, by enforcing explicit role–style disentanglement rather than surface-level rewriting.

Table 6: EXP-3c: Authorship obfuscation baselines on Enron Emails (7-way). Reduction is relative to the 0.825 TF-IDF / 0.691 BERT raw baseline. Negative values indicate worse-than-raw attribution (backfire). Random baseline: F1 = 0.143.

##### Attack model scope.

Our evaluation employs a principled hierarchy: (i) classical stylometry (TF-IDF + linear classifiers) Stamatatos ([2009](https://arxiv.org/html/2606.15335#bib.bib36)) for reproducibility; (ii) neural encoder probes (all-MiniLM-L6-v2) capturing deeper contextual patterns; and (iii) dedicated authorship obfuscation baselines (JAMDEC, StyleRemix) that directly target distributional fingerprints. Consistency across all levels (F1 $\approx$ 0.05 for embeddings, 73.2% stylometric reduction on Enron, $6\text{–}60\times$ greater reduction than obfuscation baselines) provides converging evidence of effective privacy protection under the honest-but-curious threat model. Stronger attacks (LLM-based attribution, adaptive adversaries with repeated queries) remain important future directions but exceed typical honest-but-curious capabilities.

Table 7: Attack evaluation: Embedding and Prototype Attribution (EXP-1, EXP-2). Random baseline: Acc = F1 = 0.14.

EXP-1: Embedding Attribution (SVM, 7-way)

| Embedding | Acc | F1 | Match F1 | Note |
|---|---|---|---|---|
| role (shared) | 0.18 | 0.05 | 0.05 | Near-random |
| style (local) | 0.89 | 0.84 | 0.80 | Validated |

EXP-2: Prototype Attribution

| Attack | Acc | F1 | Boot Acc | 95% CI |
|---|---|---|---|---|
| Sample-to-Proto | 0.16 | 0.09 | – | – |
| Bootstrap-Proto | 0.14 | 0.07 | 0.13 ± 0.13 | [0.00, 0.43] |

Table 8: Attack evaluation: Text Stylometry (EXP-3a, EXP-3b). TF-IDF and BERT probes on synthetic finance and Enron emails. EXP-3b also includes placeholder-masking baselines to directly test whether token-level removal resolves distributional leakage. Random baseline: F1 = 0.143.

EXP-3a: Synthetic Finance (7-way)

EXP-3b: Enron Emails (7-way)

### C.2 Attack Evaluation Details

For all embedding attribution attacks \(EXP\-1\), we use Support Vector Machines \(SVM\) with RBF kernel\.

For prototype attribution \(EXP\-2\), we evaluate two attacks:

- Sample-to-Proto: For each test sample, compute its role embedding centroid and measure cosine similarity to each client’s global prototype; predict the client with highest similarity.
- Bootstrap-Proto (MLP): Train an MLP classifier on prototype embeddings from training rounds and evaluate on held-out rounds, testing cross-round linkability.
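The Sample-to-Proto attack amounts to nearest-prototype classification under cosine similarity. A minimal sketch, with hypothetical client names and toy two-dimensional prototypes standing in for the 256-dimensional role prototypes:

```python
import numpy as np

def sample_to_proto(sample_centroid, client_protos):
    """Predict the client whose prototype is most cosine-similar to the sample."""
    names = list(client_protos)
    P = np.stack([client_protos[n] for n in names])
    sims = P @ sample_centroid / (
        np.linalg.norm(P, axis=1) * np.linalg.norm(sample_centroid) + 1e-12
    )
    return names[int(np.argmax(sims))], sims

# Hypothetical prototypes for two clients and one test-sample centroid.
client_protos = {"client_a": np.array([1.0, 0.0]), "client_b": np.array([0.0, 1.0])}
pred, sims = sample_to_proto(np.array([0.2, 0.9]), client_protos)
```

If prototypes from different clients are nearly indistinguishable, the similarities are close to one another and the prediction degrades toward chance, which is the behavior the defense aims for.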

### C.3 Enron Email Experiment Details

To validate stylometric protection on real-world data with genuine author variation, we conduct an external evaluation on the Enron email corpus Klimt and Yang ([2004](https://arxiv.org/html/2606.15335#bib.bib16)). This dataset contains approximately 500,000 emails from 150 Enron employees, released during the 2001 federal investigation. It is widely used as a benchmark for authorship attribution and email classification research.

##### Data selection and preprocessing\.

We select the top 7 senders by email volume to match the number of agents in our main experiments: `kaminski-v` (20,123 emails), `mann-k` (16,891), `dasovich-j` (16,359), `jones-t` (15,491), `kean-s` (15,352), `shackleton-s` (14,076), and `farmer-d` (9,869). For each sender, we randomly sample 500 emails (stratified), yielding 3,500 total samples.

Preprocessing steps:

- Extract email body by removing headers (To, From, Subject, Date, etc.)
- Remove forwarded message markers and quoted reply sections
- Filter emails by body length: minimum 100 characters, maximum 2,000 characters
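The preprocessing steps above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' script: the marker regex (`Original Message` / `Forwarded by`), the treatment of lines starting with `>` as quoted replies, and the decision to drop multipart messages are all choices made for the example.

```python
import re
from email import message_from_string

# Assumed markers for forwarded or replied-to content.
FORWARD_MARK = re.compile(r"-{2,}\s*(Original Message|Forwarded by)", re.IGNORECASE)

def clean_email(raw, min_chars=100, max_chars=2000):
    """Return the cleaned body, or None if it falls outside the length filter."""
    body = message_from_string(raw).get_payload()   # drops To/From/Subject/Date headers
    if not isinstance(body, str):                   # multipart: skipped in this sketch
        return None
    match = FORWARD_MARK.search(body)
    if match:                                       # cut everything after the marker
        body = body[:match.start()]
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    body = "\n".join(lines).strip()
    return body if min_chars <= len(body) <= max_chars else None

raw = (
    "From: a@example.com\nTo: b@example.com\nSubject: hi\n\n"
    + "Body text " * 15
    + "\n> quoted line\n-----Original Message-----\nold stuff"
)
cleaned = clean_email(raw)
too_short = clean_email("From: a@example.com\n\nok")
```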

##### Feature extraction\.

We use TF\-IDF vectorization with the following parameters:

- Maximum features: 5,000
- N-gram range: unigrams and bigrams
- Minimum document frequency: 2
- Maximum document frequency: 0.95

##### Classification\.

We evaluate three classifiers commonly used in stylometry research:

- Logistic Regression: L2 regularization, max iterations = 1,000
- LinearSVC: Linear kernel SVM, max iterations = 1,000
- Random Forest: 100 estimators

For stronger attack evaluation, we also employ a pre\-trained transformer encoder \(all\-MiniLM\-L6\-v2\) to generate 384\-dimensional sentence embeddings, followed by SVM\-RBF classification\. This neural probe captures deeper contextual patterns beyond surface\-level n\-grams\.

Data is split into 70% training and 30% test sets using stratified sampling \(random seed = 42\)\. We report the best F1 \(macro\) across all classifiers\.
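The classical stylometry probe described in this subsection maps onto scikit-learn fairly directly. The sketch below uses the stated TF-IDF settings, the three classifiers, and the 70/30 stratified split with seed 42; the function name, the decision to fit the vectorizer on the training split only, and the toy three-author corpus are assumptions for illustration (the toy data is not Enron).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def best_macro_f1(texts, authors, seed=42):
    """Train the three TF-IDF probes and return the best macro-F1 on the test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, authors, test_size=0.30, stratify=authors, random_state=seed
    )
    vec = TfidfVectorizer(max_features=5000, ngram_range=(1, 2), min_df=2, max_df=0.95)
    A_tr, A_te = vec.fit_transform(X_tr), vec.transform(X_te)  # fit on train only
    probes = {
        "logreg": LogisticRegression(max_iter=1000),            # L2 penalty by default
        "linear_svc": LinearSVC(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, clf in probes.items():
        clf.fit(A_tr, y_tr)
        scores[name] = f1_score(y_te, clf.predict(A_te), average="macro")
    return max(scores.values()), scores

# Toy corpus with three hypothetical authors who use disjoint phrasing.
phrases = {
    "a": "kindly review the forecast numbers",
    "b": "hey folks quick status update",
    "c": "per regulation submit filing today",
}
texts = [f"{p} item{i % 3}" for p in phrases.values() for i in range(10)]
authors = [a for a in phrases for _ in range(10)]
best, per_probe = best_macro_f1(texts, authors)
```

Running the same function on raw and on sanitized text yields the pair of F1 scores from which the relative reduction is computed.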

##### Sanitization\.

Raw emails are processed through the trained DiSan model using greedy decoding (beam size = 1) with the task prefix “deidentify:”. Both TF-IDF and BERT pipelines are then applied to the sanitized outputs to evaluate residual stylometric leakage.

##### Results interpretation\.

The 82\.5% F1 on raw emails versus 14\.3% random baseline confirms that Enron emails exhibit strong, distinguishable stylistic patterns across senders, unlike our synthetic finance dataset where controlled tests showed near\-random attribution even on raw text\.

Placeholder masking offers diminishing returns: even GLiNER, the strongest detector at 19\.2% token removal, reduces TF\-IDF F1 to only 67\.2% \(18\.6% reduction\)\. This directly demonstrates that source\-identifying signals are distributed across the text rather than concentrated in explicit identifiers\.

DiSan reduces TF-IDF F1 to 22.1% (73.2% reduction, $0.825\to 0.221$), nearly 4$\times$ the best masking baseline, demonstrating effective removal of author-identifying patterns when they exist in the source data.

The BERT probe (SVM-RBF on transformer embeddings) yields lower raw F1 (69.1%) than TF-IDF, suggesting that Enron stylometry relies more on surface-level n-gram patterns than deep semantic structure. Nevertheless, DiSan still achieves 70.6% reduction ($0.691\to 0.203$), showing that the reduction also holds under a neural attribution probe.

### C.4 Agent construction and non-IID statistics

We construct agents by doc-type as described in [Section 5](https://arxiv.org/html/2606.15335#S5). This induces (i) quantity skew (agents have different numbers of documents), (ii) label skew (doc-type inventories differ by design), and (iii) feature skew (document length and entity-type distributions differ across agents). In our threat model ([Section 3.2](https://arxiv.org/html/2606.15335#S3.SS2)), capability tags (agent identities such as “AssetManager”) are treated as public; therefore, our privacy evaluation focuses on leakage *beyond* these public priors.

[Figure 4](https://arxiv.org/html/2606.15335#A3.F4) visualizes these heterogeneity patterns across the seven agents.

![Refer to caption](https://arxiv.org/html/2606.15335v1/x10.png)

Figure 4: Non-IID data heterogeneity across agents. (a) Entity type distribution (normalized) shows different agents emphasize different PII categories. (b) Document type distribution confirms disjoint doc-type inventories by design.

### C.5 Agent doc-type inventories and hyperparameters

[Table 9](https://arxiv.org/html/2606.15335#A3.T9) lists the exact doc-type identifiers used to construct each agent and the complete hyperparameter set for reproducibility.

Table 9: (Left) Agent doc-type inventories for $C=7$ agents. (Right) Hyperparameters and schedules.

| Agent | Doc-type identifiers |
|---|---|
| CorporateBank | `Financial_Regulatory_Compliance_Report`; `Financial_Risk_Assessment` |
| AssetManager | `Investment_Prospectus`; `Product_Disclosure_Statement` |
| FinTechPay | `Business_Plan`; `Dispute_Resolution_Policy` |
| CorpGroup | `Annual_Report`; `Audit_Report`; `Financial_Risk_Assessment` |
| MarketForecaster | `Financial_Forecast` |
| ComplianceConsult | `Financial_Regulatory_Compliance_Report`; `Regulatory_Compliance_Guide` |
| SupplierCo | `Supply_Chain_Management_Agreement` |

| Hyperparameter | Value |
|---|---|
| *Federated training* | |
| Rounds | 12 |
| Local steps per round | 300 |
| Batch size | 4 |
| Learning rate | $2\times 10^{-4}$ |
| FedProx $\nu$ | 0.1 |
| *Loss weights and schedules* | |
| $\lambda_{\text{adv}}$ | 1.0 |
| GRL strength $\gamma$ | 0.5 |
| $\lambda_{\text{orth}}$ | 0.2 |
| $\lambda_{\text{p}}$ (prototype alignment) | 1.0 |
| Proto alignment warmup | 30 steps (round 2) |
| Prototype discriminator steps/round | 200 |
| Prototype noise scale $\sigma$ | 0.01 |

### C.6 Implementation and Evaluation Details

##### Model architecture.

DiSan uses LongT5-TGlobal-Base as its backbone; full architecture details are in [Section B.3](https://arxiv.org/html/2606.15335#A2.SS3).

##### RAG evaluation: query and ground\-truth construction\.

We construct the RAG benchmark from the sanitized/re-written document records rather than generating free-form queries from scratch. Each input JSONL record contains a document identifier, domain, document type, PII annotations, and a `rewritten_text` field used as the retrieval corpus. The construction pipeline is:

1. Chunking. Documents are split into sentence-aware chunks with a 256-token target length and 50-token overlap. Each chunk keeps its `uid`, document type, source file, sample index, chunk index, and a stable `chunk_id`. Chunks are grouped by document type for generation and by agent identifier for retrieval.
2. Anchor extraction. For every chunk, a schema-guided LLM prompt extracts retrieval-oriented anchors: role hooks, topic/procedure hooks, deadlines, required items, logic gates, regulations, temporal buckets, and a short summary of the local business rule. The prompt requires anchors to be supported by verbatim or near-verbatim evidence from the chunk and discourages PII-bearing names or addresses unless they are essential to the rule.
3. Grounded QA synthesis. We sample anchor-annotated chunks with a fixed random seed and ask the LLM to generate one focused question, a concise ground-truth answer, and evidence snippets for each sampled chunk. The question must be answerable using only that chunk; the answer records decision factors such as role, requirement/deadline, and logic gate when they are explicitly present.
4. Validation and provenance. Generated examples are discarded if the evidence is not found in the source chunk or if the query/answer introduces known unsupported concepts. Each retained record stores the originating `chunk_id`, merged anchors, evidence spans, and document metadata, so ChunkHit@3 can be computed against the true retrieval target.

This procedure yields a JSONL file of grounded QA records and a separate set of per\-agent context files used by the retrieval services\. The ground\-truth answers are therefore anchored to verbatim source spans, while the evaluation query is natural language and may require the retriever to recover the correct chunk from an agent’s local index\.
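The chunking step of the pipeline can be sketched as follows. This is a simplified illustration: whitespace-separated words stand in for tokenizer tokens, sentences are split with a simple punctuation regex, and the overlap is taken at word level, none of which is specified in the paper.

```python
import re

def chunk_sentences(text, target_tokens=256, overlap_tokens=50):
    """Greedy sentence-aware chunking with a fixed word overlap between chunks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sent in sentences:
        words = sent.split()
        # Close the chunk before a sentence that would exceed the target length.
        if current and len(current) + len(words) > target_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap_tokens:]   # carry the tail into the next chunk
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Synthetic document: 100 sentences of 10 words each.
doc = " ".join(
    " ".join(f"s{i}w{j}" for j in range(9)) + " end." for i in range(100)
)
chunks = chunk_sentences(doc)
```

A single sentence longer than the target would still produce an oversized chunk in this sketch; a production chunker would need to split such sentences.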

##### Retrieval architecture\.

Each agent maintains a private local index built from its own context file; raw documents are not centralized for retrieval. At evaluation time, the requester first selects candidate helper agents using the public capability tags described in [Section 3.1](https://arxiv.org/html/2606.15335#S3.SS1). Each selected helper executes the same local retrieval stack over its own chunks and then applies the evaluated sharing policy (raw sharing, placeholder masking, paraphrasing, policy gating, or DiSan) before transmitting evidence to the requester. The local retrieval stack uses a three-stage hybrid pipeline based on BGE-M3. Stage 1 computes candidate sets from dense semantic embeddings, learned sparse lexical vectors, and ColBERT-style late-interaction vectors. Stage 2 normalizes and fuses the candidate scores, with adaptive weights that increase the sparse component for keyword-rich queries. Stage 3 re-ranks the fused candidates with `bge-reranker-v2-m3`; the final top-$k$ chunks can be expanded with neighboring chunks from the same document to provide broader context for answer generation. The requester deduplicates returned chunks by `chunk_id`, keeps the highest-scoring evidence up to the evaluation budget, and prompts the answer model to respond only from the received sanitized context.
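The requester-side merge at the end of this pipeline is straightforward to illustrate. In the sketch below, `chunk_id` comes from the text, while the `score` field, the dictionary layout, and the function name are hypothetical.

```python
def merge_evidence(helper_results, budget):
    """Deduplicate chunks by chunk_id, keep the best score, and truncate to budget."""
    best = {}
    for chunk in helper_results:
        cid = chunk["chunk_id"]
        if cid not in best or chunk["score"] > best[cid]["score"]:
            best[cid] = chunk
    ranked = sorted(best.values(), key=lambda ch: ch["score"], reverse=True)
    return ranked[:budget]

# Evidence returned by several helpers; chunk "a" arrives twice.
results = [
    {"chunk_id": "a", "score": 0.4},
    {"chunk_id": "b", "score": 0.9},
    {"chunk_id": "a", "score": 0.7},
    {"chunk_id": "c", "score": 0.1},
]
top = merge_evidence(results, budget=2)
```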

##### Artifact licenses and terms\.

We use publicly available datasets, models, and evaluation tools under their respective licenses or terms of use, including the synthetic finance corpus, Enron email corpus, GLiNER, BGE\-M3, LongT5, and open\-source LLM baselines\. We do not redistribute restricted raw data; released code and derived artifacts are intended for research use\.

##### PII detector for leakage metrics\.

For Avg. PII and Ans. Rate in [Table 2](https://arxiv.org/html/2606.15335#S4.T2), we evaluate the text exposed to the requester using `gliner-pii-large-v1.0` Zaratiana et al. ([2024](https://arxiv.org/html/2606.15335#bib.bib45)), a generalist named-entity detector configured with a PII label set covering names, first/last names, email addresses, phone numbers, street/location addresses, city/state/zip, credit-card and bank-account numbers, SSNs, dates of birth, dates, company/organization names, usernames, IP addresses, URLs, passport numbers, and driver-license numbers. We use a confidence threshold of 0.3 and count non-overlapping detected spans. Avg. PII is the mean number of detected PII spans per shared/retrieved chunk after the method under evaluation has been applied. Ans. Rate is the percentage of generated final answers for which the same detector finds at least one PII span. The detector is used only for evaluation; training still relies on the dataset’s annotated spans and token-level labels, with optional detector-derived masks used only as auxiliary signals when available.
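Given detector output, the two leakage metrics reduce to simple counting. The sketch below assumes spans arrive as `(start, end, score)` tuples with exclusive end offsets and resolves overlaps greedily from left to right; the paper states only that non-overlapping spans are counted, so the overlap rule and all function names are assumptions.

```python
def count_pii_spans(spans, threshold=0.3):
    """Count non-overlapping spans at or above the confidence threshold."""
    kept, last_end = 0, 0
    for start, end, score in sorted(spans):
        if score >= threshold and start >= last_end:
            kept += 1
            last_end = end
    return kept

def avg_pii(chunk_spans):
    """Mean number of detected PII spans per shared chunk."""
    return sum(count_pii_spans(s) for s in chunk_spans) / len(chunk_spans)

def answer_rate(answer_spans):
    """Percentage of answers with at least one detected PII span."""
    hits = sum(count_pii_spans(s) > 0 for s in answer_spans)
    return 100.0 * hits / len(answer_spans)

# Hypothetical detector output for two chunks and four answers.
chunks = [[(0, 5, 0.9), (3, 8, 0.8), (10, 12, 0.2)], []]
answers = [[(0, 4, 0.5)], [], [(1, 2, 0.1)], []]
avg = avg_pii(chunks)
rate = answer_rate(answers)
```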

##### Utility metrics: cosine similarity\.

The cosine similarity reported in [Table 2](https://arxiv.org/html/2606.15335#S4.T2) uses a bag-of-words term-frequency representation rather than neural embeddings, serving as a lightweight lexical similarity measure that complements token-level F1. This is distinct from the 768-dimensional BGE-M3 dense embeddings used in retrieval.
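A bag-of-words cosine of this kind needs only term counts. The sketch below is a minimal version; lowercasing and the `\w+` tokenizer are assumptions, since the paper does not describe its tokenization.

```python
import math
import re
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between term-frequency vectors of two texts."""
    ta = Counter(re.findall(r"\w+", a.lower()))
    tb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

sim_partial = bow_cosine("The fee is due", "the fee is waived")
sim_same = bow_cosine("refund within 30 days", "refund within 30 days")
sim_none = bow_cosine("alpha beta", "gamma delta")
```

Because it rewards exact word overlap, this measure penalizes faithful paraphrases, which is why it is reported alongside other utility metrics rather than alone.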

### C.7 Baselines: prompts and policies

For LLM paraphrasing, we apply a privacy-focused prompt that (i) removes or generalizes PII, (ii) preserves relational semantics needed for grounding, and (iii) avoids source-identifying formatting. For policy gating, following dynamic access-control memory sharing Rezazadeh et al. ([2025](https://arxiv.org/html/2606.15335#bib.bib34)), the policy model decides per chunk based on the requester’s agent tag and the helper’s data sensitivity level, outputting one of three actions: *share* (return original text), *share summary* (return an LLM-generated summary), or *refuse* (return nothing).

##### LLM paraphrasing prompt\.

The following prompt is used solely for the RAG utility evaluation reported in [Table 2](https://arxiv.org/html/2606.15335#S4.T2) and discussed in [Section 5\.4](https://arxiv.org/html/2606.15335#S5.SS4), where LLM paraphrasing serves as a PII\-removal baseline:

> You are an assistant that rewrites English financial or compliance\-related text to remove personally identifiable information \(PII\) while preserving all task\-relevant content\. Your task is to produce a clear, natural\-sounding rewritten version that protects individual privacy while retaining the informational value of the text\. The rewritten text should preserve financial metrics, business relationships, temporal context, and domain\-specific details that are important for downstream tasks\.
>
> Guidelines:
>
> - Remove or replace explicit PII such as personal names, phone numbers, email addresses, account numbers, and physical addresses\.
> - Preserve important non\-private information including: financial figures, percentages, growth rates, industry terms, product categories, and general business context\.
> - Keep temporal references \(e\.g\., “Q3 2023”, “fiscal year”\) and geographic regions when they provide useful context without identifying individuals\.
> - Maintain the logical structure, professional tone, and factual accuracy of the original text\.
> - When generalizing, prefer minimal changes that protect privacy while maximizing retained information\.
>
> Return only the rewritten text\. Do not include explanations, examples, or commentary\.

##### RAG question\-answering prompt\.

For the downstream RAG evaluation, we use the following prompt to ensure the answering model relies strictly on retrieved \(sanitized\) evidence:

> You are a knowledgeable assistant that answers questions strictly based on the provided context\.
>
> Instructions:
>
> 1. Answer ONLY using information from the provided context\. Do not use external knowledge\.
> 2. If the context lacks sufficient information, respond: “I cannot answer this based on the provided context\.”
> 3. Keep your answer concise, accurate, and directly relevant to the question\.

### C\.8 Ablation Results

The ablation results in [Table 3](https://arxiv.org/html/2606.15335#S5.T3) \(main paper\) demonstrate the importance of each component\. Removing style isolation \(A1, A2\) dramatically increases PII exposure \(7–9$\times$\) while slightly improving some surface utility metrics, demonstrating the privacy cost of weaker disentanglement\. Removing prototype alignment \(B\) degrades both utility and privacy, confirming that cross\-client anchoring prevents role\-space drift\. [Table 10](https://arxiv.org/html/2606.15335#A3.T10) reports attack evaluation: A1/A2 increase Role Acc from 0\.28 to 0\.34–0\.40, further confirming that $\mathcal{L}_{\text{orth}}$ suppresses client\-identifying signals in role embeddings\.

Table 10: Attack evaluation \(ablation\)\. EXP\-1: representation probe \(7\-way\); higher Role Acc/F1 indicates weaker privacy, while high Style Acc/F1 is a disentanglement sanity check\. EXP\-2: prototype\-based client attribution \(Bootstrap\-MLP\); only applicable to settings that upload prototypes\.

![Refer to caption](https://arxiv.org/html/2606.15335v1/x11.png)

Figure 5: Role–style disentanglement visualization\. \(a\) Role\-style orthogonality per agent: cosine similarity between role and style embeddings clusters tightly around zero, indicating successful disentanglement via $\mathcal{L}_{\text{orth}}$\. \(b\) Combined t\-SNE projection shows clear separation between role \(shared, blue circles\) and style \(private, orange triangles\) embeddings in the latent space\.

### C\.9 Case Study: Cross\-Organizational IPO Analysis

![Refer to caption](https://arxiv.org/html/2606.15335v1/x12.png)

Figure 6: Cross\-organizational IPO analysis\. Lumina Capital’s Audit\-Core agent queries partner agents via a data broker; each external agent sanitizes its response with DiSan before transmission\. Entity names, geographic identifiers, and organizational details are removed while task\-relevant financial metrics are preserved, enabling accurate cross\-party assessment without exposing source identity\.

## Appendix D Theoretical Analysis

This section provides theoretical justification for key design choices in DiSan: \(i\) how prototype alignment mitigates role\-space drift under non\-IID data, and \(ii\) how orthogonality constraints promote disentanglement\. We complement this analysis with comprehensive empirical validation \([Sections 5](https://arxiv.org/html/2606.15335#S5) and [C\.1](https://arxiv.org/html/2606.15335#A3.SS1)\)\.

### D\.1 Prototype Alignment and Role\-Space Drift

Without global coordination, each agent $c$ learns a local role encoder $\mathbf{W}_{r}^{(c)}$ that maps text to role representations\. Under non\-IID data, these local role spaces can drift apart: the same semantic concept \(e\.g\., “account number”\) may be encoded differently across agents\. This drift harms both *privacy* \(agent\-specific encodings leak identity\) and *utility* \(inconsistent representations degrade downstream tasks\)\.

Our prototype alignment loss \([Equation 4](https://arxiv.org/html/2606.15335#S4.E4)\) acts as a *semantic anchor* that pulls local role distributions toward a shared global reference\. Formally, for each entity type $k\in\mathcal{K}$, the loss penalizes angular deviation between the local batch centroid $\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}$ and the global prototype $\hat{\boldsymbol{\mu}}_{k}^{*}$:

$$\mathcal{L}_{\text{proto}}^{(c)}=\sum_{k\in\mathcal{K}}\left(1-\cos(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)},\hat{\boldsymbol{\mu}}_{k}^{*})\right). \tag{22}$$

We now show that this regularizer bounds the divergence between local role spaces under well\-specified assumptions\.
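As a concrete reading of Equation 22, the loss can be computed per agent from token-level role embeddings grouped by entity type. The sketch below is illustrative only: the function names and the dictionary layout are assumptions, and it sums over the types present in the batch, which is one plausible handling of types an agent has not observed.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def centroid(vectors: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def proto_loss(tokens_by_type: Dict[str, List[Vector]],
               prototypes: Dict[str, Vector]) -> float:
    """Sum over entity types of 1 - cos(normalized centroid, normalized prototype)."""
    loss = 0.0
    for k, tokens in tokens_by_type.items():
        z_hat = normalize(centroid(tokens))   # local batch centroid, normalized
        mu_hat = normalize(prototypes[k])     # global prototype, normalized
        loss += 1.0 - sum(a * b for a, b in zip(z_hat, mu_hat))
    return loss
```

Each summand lies in $[0, 2]$, so a bound on the total loss also bounds every per-type term, which is the property the proof below uses.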

###### Assumption D\.1 \(Normalized Centroid Boundedness\)\.

For any agent $c$ and entity type $k$, the normalized centroid has bounded norm away from zero: $\|\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\|=1$ by definition, and $\|\bar{\mathbf{z}}_{r,k}^{(c)}\|\geq\delta_{k}>0$ for some constant $\delta_{k}$ \(which holds in practice since embeddings are initialized with non\-zero norms and gradients preserve this property\)\.

###### Assumption D\.2 \(Sub\-Gaussian Token Embeddings\)\.

For each entity type $k$, the token embeddings $\{\mathbf{z}_{r,t}^{(i)}\}_{(i,t)\in\Omega_{k}^{(c)}}$ are independently drawn from a distribution with bounded sub\-Gaussian norm $\psi_{k}$, i\.e\., $\Pr(\|\mathbf{z}-\mu\|_{2}\geq t)\leq 2\exp(-\psi_{k}^{2}t^{2}/2)$ for all $t>0$\.

###### Assumption D\.3 \(Within\-Type Variance Bound\)\.

The within\-type variance is bounded: $\frac{1}{N_{c,k}}\sum_{(i,t)\in\Omega_{k}^{(c)}}\|\mathbf{z}_{r,t}^{(i)}-\bar{\mathbf{z}}_{r,k}^{(c)}\|^{2}\leq\sigma_{k}^{2}$, where $\Omega_{k}^{(c)}$ is the set of token positions of type $k$ in agent $c$’s batch, and $N_{c,k}=|\Omega_{k}^{(c)}|$\.

###### Assumption D\.4 \(Prototype Loss Bound\)\.

The prototype alignment loss satisfies $\mathcal{L}_{\text{proto}}^{(c)}\leq\epsilon$ for all agents $c$\.

###### Assumption D\.5 \(Sufficient Samples\)\.

Each agent $c$ has at least $N_{\min}$ samples of type $k$ in the batch\.

###### Theorem D\.6 \(Prototype Alignment Bounds Role\-Space Divergence\)\.

Under Assumptions [D\.1](https://arxiv.org/html/2606.15335#A4.Thmtheorem1)–[D\.5](https://arxiv.org/html/2606.15335#A4.Thmtheorem5), for any two agents $c,c^{\prime}$ that both have samples of type $k$, the angular distance between their normalized batch centroids satisfies:

$$\Pr\left(\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\bar{\mathbf{z}}}_{r,k}^{(c^{\prime})}\right)>2\sqrt{2\epsilon}+\frac{t\,\sigma_{k}}{\sqrt{N_{\min}}}\right)\leq 4\exp\left(-\frac{\psi_{k}^{2}N_{\min}t^{2}}{8}\right), \tag{23}$$

where $t>0$ is a confidence parameter, $\sigma_{k}^{2}$ is the within\-type variance, $\psi_{k}$ is the sub\-Gaussian parameter, and $\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}=\bar{\mathbf{z}}_{r,k}^{(c)}/\|\bar{\mathbf{z}}_{r,k}^{(c)}\|$ denotes the normalized centroid\.

###### Proof\.

By the triangle inequality for angular distance on the unit sphere:

$$\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\bar{\mathbf{z}}}_{r,k}^{(c^{\prime})}\right)\leq\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\boldsymbol{\mu}}_{k}^{*}\right)+\arccos\left(\hat{\boldsymbol{\mu}}_{k}^{*}\cdot\hat{\bar{\mathbf{z}}}_{r,k}^{(c^{\prime})}\right). \tag{24}$$

From Assumption [D\.4](https://arxiv.org/html/2606.15335#A4.Thmtheorem4), we have $1-\cos(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)},\hat{\boldsymbol{\mu}}_{k}^{*})\leq\epsilon$, since each term in the sum is non\-negative\.

For small $x$, the Taylor expansion $\arccos(1-x)=\sqrt{2x}\,(1+O(x))$ as $x\to 0$ gives $\arccos(1-x)\leq\sqrt{2x}$ to leading order\. Applying this with $x=1-\cos(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)},\hat{\boldsymbol{\mu}}_{k}^{*})$:

$$\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\boldsymbol{\mu}}_{k}^{*}\right)\leq\sqrt{2\epsilon}. \tag{25}$$

The same bound holds for agent $c^{\prime}$\. Substituting into [Equation 24](https://arxiv.org/html/2606.15335#A4.E24):

$$\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\bar{\mathbf{z}}}_{r,k}^{(c^{\prime})}\right)\leq 2\sqrt{2\epsilon}. \tag{26}$$

We now account for finite\-sample effects\. Under Assumption [D\.2](https://arxiv.org/html/2606.15335#A4.Thmtheorem2) and Assumption [D\.3](https://arxiv.org/html/2606.15335#A4.Thmtheorem3), the empirical centroid $\bar{\mathbf{z}}_{r,k}^{(c)}$ concentrates around the population mean $\boldsymbol{\mu}_{r,k}$ as $N_{\min}$ increases\. We assume the global prototype $\boldsymbol{\mu}_{k}^{*}$ converges to $\boldsymbol{\mu}_{r,k}$ as the number of federated rounds increases \(justified by EMA aggregation across all agents\)\. Specifically, by the sub\-Gaussian concentration inequality, for any $t>0$:

$$\Pr\left(\left\|\bar{\mathbf{z}}_{r,k}^{(c)}-\boldsymbol{\mu}_{r,k}\right\|_{2}\geq\frac{t\sigma_{k}}{\sqrt{N_{\min}}}\right)\leq 2\exp\left(-\frac{\psi_{k}^{2}N_{\min}t^{2}}{2}\right). \tag{27}$$
To bridge Euclidean error to angular error, we use the following inequality: for any vectors $u,v$ with $\|u\|\geq\delta$ and $\|v\|\geq\delta$,

$$\arccos\left(\frac{u}{\|u\|}\cdot\frac{v}{\|v\|}\right)\leq\frac{2}{\delta}\|u-v\|_{2}. \tag{28}$$

This follows from the relationship between angular distance and chordal distance on the unit sphere\. Applying this with $\delta=\delta_{k}$ \(Assumption [D\.1](https://arxiv.org/html/2606.15335#A4.Thmtheorem1)\), we have:

$$\arccos\left(\hat{\bar{\mathbf{z}}}_{r,k}^{(c)}\cdot\hat{\boldsymbol{\mu}}_{k}^{*}\right)\leq\frac{2}{\delta_{k}}\left\|\bar{\mathbf{z}}_{r,k}^{(c)}-\boldsymbol{\mu}_{r,k}\right\|_{2}. \tag{29}$$

Combining the deterministic bound $2\sqrt{2\epsilon}$ with the probabilistic error term and applying a union bound over both agents yields the stated result with the confidence parameter $t$\. The constants are absorbed into the exponential decay rate for clarity\. ∎
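The geometric steps of the proof can be spot-checked numerically. The following sketch draws random Gaussian vectors (an arbitrary choice made only for illustration) and records the smallest slack observed for the angular triangle inequality of Equation 24 and for the Euclidean-to-angular bridge of Equation 28, taking $\delta=\min(\|u\|,\|v\|)$. It also evaluates the ratio $\arccos(1-x)/\sqrt{2x}$, which approaches 1 as $x\to 0$, the regime the proof relies on. Function names are hypothetical.

```python
import math
import random


def norm(u):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in u))


def angle(u, v):
    """Angle between u and v, i.e. arccos of their cosine similarity."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def min_slacks(trials=2000, dim=8, seed=0):
    """Smallest observed (right side - left side) for Eq. 24 and Eq. 28."""
    rng = random.Random(seed)
    tri, chord = float("inf"), float("inf")
    for _ in range(trials):
        u, v, w = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3))
        # Eq. 24: angle(u, v) <= angle(u, w) + angle(w, v)
        tri = min(tri, angle(u, w) + angle(w, v) - angle(u, v))
        # Eq. 28 with delta = min(||u||, ||v||)
        delta = min(norm(u), norm(v))
        diff = norm([a - b for a, b in zip(u, v)])
        chord = min(chord, 2.0 / delta * diff - angle(u, v))
    return tri, chord


def small_angle_ratio(x):
    """arccos(1 - x) / sqrt(2x), which tends to 1 as x -> 0."""
    return math.acos(1.0 - x) / math.sqrt(2.0 * x)
```

A non-negative slack in every trial is consistent with both inequalities; it is a sanity check on random inputs, not a proof.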

This result formalizes the intuition that prototype alignment prevents role\-space drift: as long as all agents maintain low prototype loss \($\epsilon$ small\), their role representations for the same semantic type remain close in angular distance with high probability\. The bound degrades gracefully with within\-type variance $\sigma_{k}^{2}$ and improves with more samples per type\.

Bounded role\-space divergence has a privacy implication: if all agents’ role representations for type $k$ are concentrated around a shared prototype $\hat{\boldsymbol{\mu}}_{k}^{*}$, then observing a role embedding $\mathbf{z}_{r}$ provides limited information about which agent produced it \(beyond what is revealed by the type label $k$ itself\)\. This is validated empirically in [Table 7](https://arxiv.org/html/2606.15335#A3.T7), where role embeddings yield low attribution performance \(Acc=0\.18, F1=0\.05; 7\-way random baseline = 0\.14\)\.

### D\.2 Orthogonality as a Geometric Proxy for Disentanglement

Disentanglement aims to separate role information $R$ \(task\-relevant semantics\) from style information $S$ \(agent\-identifying patterns\) in the learned representations\. Ideally, the role embedding $\mathbf{Z}_{r}$ should capture $R$ but not $S$, while the style embedding $\mathbf{Z}_{s}$ should capture $S$ but not $R$\.

Our orthogonality loss \([Equation 3](https://arxiv.org/html/2606.15335#S4.E3)\) enforces $\cos^{2}(\bar{\mathbf{z}}_{r},\bar{\mathbf{z}}_{s})\approx 0$, which geometrically separates the two subspaces\. We now provide an information\-theoretic perspective on why orthogonality promotes statistical independence\.
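The squared-cosine penalty on mean-pooled vectors can be written out directly. This is a minimal sketch under stated assumptions: role and style embeddings share the same dimension (needed for a cosine to be defined), pooling is an unweighted mean over tokens, and the function names are illustrative rather than taken from the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(tokens: List[Vector]) -> List[float]:
    """Unweighted mean over token embeddings (assumed pooling rule)."""
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def orth_loss(role_tokens: List[Vector], style_tokens: List[Vector]) -> float:
    """Squared cosine similarity between mean-pooled role and style vectors."""
    z_r, z_s = mean_pool(role_tokens), mean_pool(style_tokens)
    dot = sum(a * b for a, b in zip(z_r, z_s))
    n_r = math.sqrt(sum(a * a for a in z_r))
    n_s = math.sqrt(sum(b * b for b in z_s))
    return (dot / (n_r * n_s)) ** 2
```

Squaring the cosine penalizes alignment and anti-alignment equally, so the minimum is reached exactly when the pooled vectors are perpendicular.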

###### Definition D\.7 \(Covariance Matrix for Joint Gaussian\)\.

For jointly Gaussian random vectors $\mathbf{Z}_{r}\in\mathbb{R}^{d_{r}}$ and $\mathbf{Z}_{s}\in\mathbb{R}^{d_{s}}$, let the joint covariance matrix be $\boldsymbol{\Sigma}=\begin{bmatrix}\boldsymbol{\Sigma}_{rr}&\boldsymbol{\Sigma}_{rs}\\ \boldsymbol{\Sigma}_{rs}^{\top}&\boldsymbol{\Sigma}_{ss}\end{bmatrix}$\. We assume the full covariance matrix $\boldsymbol{\Sigma}\succ 0$ \(positive definite\), which implies $\boldsymbol{\Sigma}_{rr}\succ 0$ and $\boldsymbol{\Sigma}_{ss}\succ 0$, ensuring all determinants are well\-defined and positive\.

###### Lemma D\.8 \(Gaussian Mutual Information, Cover and Thomas \([2006](https://arxiv.org/html/2606.15335#bib.bib5)\)\)\.

Under Definition [D\.7](https://arxiv.org/html/2606.15335#A4.Thmtheorem7), the mutual information between $\mathbf{Z}_{r}$ and $\mathbf{Z}_{s}$ is:

$$I(\mathbf{Z}_{r};\mathbf{Z}_{s})=\frac{1}{2}\log\frac{|\boldsymbol{\Sigma}_{rr}|\cdot|\boldsymbol{\Sigma}_{ss}|}{|\boldsymbol{\Sigma}|}. \tag{30}$$

If the cross\-covariance is zero \($\boldsymbol{\Sigma}_{rs}=\mathbf{0}$\), then $|\boldsymbol{\Sigma}|=|\boldsymbol{\Sigma}_{rr}|\cdot|\boldsymbol{\Sigma}_{ss}|$ and consequently $I(\mathbf{Z}_{r};\mathbf{Z}_{s})=0$ \(statistical independence\)\.

Lemma [D\.8](https://arxiv.org/html/2606.15335#A4.Thmtheorem8) shows that under jointly Gaussian representations with zero cross\-covariance, role and style are statistically independent\. Our orthogonality loss $\mathcal{L}_{\text{orth}}=\cos^{2}(\bar{\mathbf{z}}_{r},\bar{\mathbf{z}}_{s})$ enforces this separation between mean\-pooled vectors, balancing theoretical grounding with computational efficiency in federated training\. Ablations \([Section 5\.5](https://arxiv.org/html/2606.15335#S5.SS5)\) confirm its effectiveness: removing $\mathcal{L}_{\text{orth}}$ increases PII exposure by 7$\times$\.
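The Gaussian mutual-information formula of Lemma D.8 is easy to evaluate for small covariance matrices. The sketch below uses a plain Gaussian-elimination determinant to stay dependency-free; the function names and the example matrices are illustrative assumptions. It illustrates the lemma itself, not the cosine loss, which acts on mean-pooled vectors rather than on the cross-covariance.

```python
import math
from typing import List

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d  # a row swap flips the sign
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def gaussian_mi(sigma: Matrix, d_r: int) -> float:
    """I(Z_r; Z_s) in nats; the first d_r coordinates of sigma belong to Z_r."""
    rr = [row[:d_r] for row in sigma[:d_r]]
    ss = [row[d_r:] for row in sigma[d_r:]]
    return 0.5 * math.log(det(rr) * det(ss) / det(sigma))
```

For two scalar variables with unit variance and correlation $\rho$, the formula reduces to $-\tfrac{1}{2}\log(1-\rho^{2})$, which grows without bound as $|\rho|\to 1$ and vanishes at $\rho=0$.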

### D\.3 Summary

We have provided theoretical justification for two key design choices:

1. Prototype alignment \([Theorem D\.6](https://arxiv.org/html/2606.15335#A4.Thmtheorem6)\): Role\-space divergence across agents is bounded with high probability, formalizing why prototype alignment prevents drift under non\-IID data and why aligned role embeddings leak minimal agent identity\.
2. Orthogonality constraint \([Lemma D\.8](https://arxiv.org/html/2606.15335#A4.Thmtheorem8)\): Under Gaussian assumptions, orthogonality implies zero mutual information between role and style, providing geometric intuition for why $\mathcal{L}_{\text{orth}}$ promotes disentanglement\.

### D\.4 Privacy Analysis

Our primary privacy protection comes from architectural design \(style never transmitted\), learned disentanglement, and adversarial training\. This section discusses the noise injection mechanism applied to prototype uploads\.

#### D\.4\.1 Noise Injection for Prototype Communication

During federated training, each agent uploads role prototypes $\tilde{\boldsymbol{\mu}}_{k}^{(c)}$ to the server\. We apply Gaussian perturbation \([Equation 21](https://arxiv.org/html/2606.15335#A2.E21)\) before upload:

$$\tilde{\boldsymbol{\mu}}_{k}^{(c)}=\text{normalize}\left(\boldsymbol{\mu}_{k}^{(c)}+\mathcal{N}(\mathbf{0},\sigma_{\text{noise}}^{2}\mathbf{I})\right). \tag{31}$$

We use $\sigma_{\text{noise}}=0.01$, which provides mild perturbation while preserving prototype semantics\. This noise scale can be tuned: larger values provide stronger perturbation but may degrade prototype quality and downstream utility\.
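Equation 31 amounts to adding independent Gaussian noise to each coordinate and renormalizing. A minimal sketch, assuming $\ell_2$ normalization and an optional seeded generator for reproducibility (the function name and the `rng` parameter are illustrative):

```python
import math
import random
from typing import List, Optional, Sequence


def perturb_prototype(mu: Sequence[float],
                      sigma_noise: float = 0.01,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Add N(0, sigma_noise^2) noise per coordinate, then L2-normalize."""
    rng = rng or random.Random()
    noisy = [x + rng.gauss(0.0, sigma_noise) for x in mu]
    n = math.sqrt(sum(x * x for x in noisy))
    return [x / n for x in noisy]
```

Raising `sigma_noise` rotates the uploaded prototype further from the local one, which is the utility cost the text refers to.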

##### Design rationale\.

Noise injection provides an additional defense layer complementing the core privacy mechanisms in DiSan:

1. Architectural privacy: Style representations $\mathbf{Z}_{s}$ are *never transmitted*; they remain strictly local\.
2. Learned disentanglement: The role encoder produces agent\-invariant representations via $\mathcal{L}_{\text{orth}}$ and $\mathcal{L}_{\text{adv}}$\.
3. Empirical validation: Attack evaluations confirm near\-random attribution \(F1=0\.05 for embeddings, F1=0\.07 for prototypes\)\.

These mechanisms achieve strong empirical privacy without requiring formal differential privacy guarantees, which would impose significant utility costs in federated text settings\.

##### Effect of noise on prototype convergence\.

The noise scale $\sigma_{\text{noise}}=0.01$ is chosen to be a supplementary defense\-in\-depth layer; the primary privacy guarantees come from architectural isolation of $\mathbf{Z}_{s}$, $\mathcal{L}_{\text{orth}}$, and $\mathcal{L}_{\text{adv}}$\. Training remains stable across all 12 rounds because $\sigma=0.01$ induces roughly 1% perturbation relative to the $\ell_{2}$\-normalized prototype norm, and server\-side sample\-weighted averaging further attenuates per\-agent noise by a factor of approximately $1/\sqrt{C}$\. EXP\-2 \([Table 7](https://arxiv.org/html/2606.15335#A3.T7)\) confirms that this configuration is sufficient: prototype attribution under both attack variants achieves near\-random performance \(F1 $\approx$ 0\.09\), while training loss curves show no instability attributable to noise injection\.
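The $1/\sqrt{C}$ attenuation can be illustrated with a small simulation. The sketch assumes equal sample weights across $C$ agents and looks at a single prototype coordinate before renormalization; with unequal weights the factor is only approximate. The function name and trial count are illustrative.

```python
import math
import random


def averaged_noise_std(num_agents: int, sigma: float = 0.01,
                       trials: int = 20000, seed: int = 0) -> float:
    """Empirical std of the mean of num_agents independent N(0, sigma^2) draws."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(num_agents)]
        samples.append(sum(noise) / num_agents)  # equal-weight server average
    mean = sum(samples) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / (trials - 1))
```

With `num_agents=16` and `sigma=0.01`, the estimate should land close to $0.01/\sqrt{16}=0.0025$, up to sampling error.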

#### D\.4\.2 Disentanglement vs\. DP for Text Privacy

For text output privacy, we adopt disentanglement rather than DP\-based approaches for two reasons:

\(1\) Aligned threat model\. DP\-SGD and DP\-text methods protect training data membership, whereas our goal is inference\-time source attribution resistance\. Disentanglement directly removes agent\-identifying patterns from generated text, addressing this threat model precisely\.

\(2\) Superior utility\-privacy tradeoff\. Existing DP\-text methods Meisenbacher and Matthes \([2024](https://arxiv.org/html/2606.15335#bib.bib27)\); Xie et al\. \([2024](https://arxiv.org/html/2606.15335#bib.bib44)\) show 30–50% coherence loss for $\epsilon<10$\. In contrast, disentanglement achieves strong empirical privacy \(73\.2% TF\-IDF and 70\.6% neural\-probe stylometric reduction on Enron\) with only a 2\.93\-point faithfulness drop on the RAG benchmark\.

Comparison to prior work\. Recent DP\-text generation Meisenbacher and Matthes \([2024](https://arxiv.org/html/2606.15335#bib.bib27)\) achieves $\epsilon\approx 8$ with 40% BLEU degradation\. Our method achieves 83% faithfulness \(vs\. 86% baseline\) with strong empirical privacy, demonstrating that disentanglement\-based approaches achieve better utility\-privacy tradeoffs for source attribution tasks\.

#### D\.4\.3 Summary: Layered Privacy Mechanisms

DiSan provides privacy through multiple complementary mechanisms:

1. Architectural privacy \(strong, by design\): Style representations $\mathbf{Z}_{s}$ are *never transmitted*; they remain strictly local and are discarded after decoding\.
2. Learned privacy \(empirical, validated\): Disentanglement \($\mathcal{L}_{\text{orth}}$\) and adversarial training \($\mathcal{L}_{\text{adv}}$\) produce agent\-invariant role representations:
   - Low embedding\-attribution performance \(Acc = 0\.18, F1 = 0\.05; 7\-way random baseline = 0\.14, EXP\-1\)
   - Stylometric reduction \(73\.2% TF\-IDF and 70\.6% neural\-probe reduction on Enron, EXP\-3b\)
   - Near\-random prototype attribution \(EXP\-2\)
3. Noise perturbation: Prototype uploads are perturbed with Gaussian noise \($\sigma=0.01$\) before transmission\.

Design rationale\. Our default configuration prioritizes utility, relying on architectural and learned mechanisms that empirically achieve strong privacy \(near\-random attribution\)\.
