PE-means: Improved Differentially Private $k$-means Clustering through Private Evolution
Summary
PE-means adapts the private evolution algorithm to differentially private k-means clustering, achieving a 20% average improvement in clustering loss over existing methods.
View Cached Full Text
Cached at: 06/02/26, 03:41 PM
# Improved Differentially Private 𝑘-means Clustering through Private Evolution
Source: [https://arxiv.org/html/2606.00342](https://arxiv.org/html/2606.00342)
###### Abstract
We study the problem of differentially private \(DP\)kk\-means clustering in Euclidean space\. Previous solutions rely on summing the private data directly, which induces a sensitivity proportional to the domain\. We introduce PE\-means, an extension of the private evolution \(PE\) algorithm \(an increasingly popular method for synthetic data generation\), to the problem ofkk\-means clustering\. The key advantage of PE is that it only computes a private histogram with constant sensitivity to guide the evolution\. Our adaptation of PE includes new evolutionary operators for clustering, as well as other algorithmic improvements of independent interest\. Overall, PE\-means achieves an average improvement of 20% in clustering loss over state\-of\-the\-art baselines\.
## IIntroduction
Thekk\-means clustering algorithm is a foundational tool in data science with many influential applications such as recommendation systems and healthcare data analysis\[[25](https://arxiv.org/html/2606.00342#bib.bib20),[16](https://arxiv.org/html/2606.00342#bib.bib32)\]\. The objective is to group the data intokkgroups with similar features, allowing practitioners to interpret complex datasets\. The challenge is that applying standardkk\-means algorithms to sensitive data risks the privacy of participants in the dataset\. In the worst case, thekk\-means algorithm can publish the exact data record of an individual who is an outlier in the set\. To address this, there has been a large body of work that considers the problem of differentially privatekk\-means\[[31](https://arxiv.org/html/2606.00342#bib.bib5),[5](https://arxiv.org/html/2606.00342#bib.bib9),[4](https://arxiv.org/html/2606.00342#bib.bib6),[3](https://arxiv.org/html/2606.00342#bib.bib19),[1](https://arxiv.org/html/2606.00342#bib.bib7),[30](https://arxiv.org/html/2606.00342#bib.bib27),[25](https://arxiv.org/html/2606.00342#bib.bib20),[12](https://arxiv.org/html/2606.00342#bib.bib24),[15](https://arxiv.org/html/2606.00342#bib.bib28)\]\. Differential privacy \(DP\) is a popular privacy definition that ensures changes in a single user’s input does not have a significant effect on the output of the mechanism\[[9](https://arxiv.org/html/2606.00342#bib.bib13)\]\. By adding calibrated randomization to the clustering process, existing approaches retain the benefits ofkk\-means clustering while providing a formal privacy guarantee\.
The challenge with DP is balancing the inherent trade\-off between privacy and utility\. Existing state\-of\-the\-art approaches for DPkk\-means rely on summing the private data directly at some point in the process\[[4](https://arxiv.org/html/2606.00342#bib.bib6),[31](https://arxiv.org/html/2606.00342#bib.bib5),[1](https://arxiv.org/html/2606.00342#bib.bib7)\]\. The sensitivity of \(or largest impact any one user can have on\) a sum query is the addition \(or removal\) of a point on the boundary of the domain\. This means that to satisfy DP, noise must be added proportionally to the entire domain, significantly degrading the quality of the clustering\. In recent work, an approach called FastLloyd managed to reduce this sensitivity from the whole domain to a smaller radius around each cluster through relative cluster updates\[[5](https://arxiv.org/html/2606.00342#bib.bib9)\]\. However, the radius is still proportional to the data dimension, and FastLloyd introduces an additional error term due to the relative updates\.
Private evolution \(PE\) is an increasingly popular technique for generating various modalities of synthetic data\[[22](https://arxiv.org/html/2606.00342#bib.bib2),[32](https://arxiv.org/html/2606.00342#bib.bib3),[21](https://arxiv.org/html/2606.00342#bib.bib4)\]\. PE evolves a population of synthetic data using only inference API access to foundation models that generate and perturb the data in an evolutionary algorithm\. A key component in PE’s success is that it only uses the private data in a nearest neighbour voting scheme to choose the best samples generated so far\. Specifically, each private data point only contributes a single vote to the algorithm in each iteration\. Thus, adding or removing any private data point affects the output by a constant sensitivity of11, regardless of the data domain\. This gives PE a significant advantage over gradient\-based techniques, which must add noise proportionally to the gradient domain\. What makes PE truly impressive is that, despite getting much less signal from the private data, it still effectively traverses large and complex data domains by leveraging the robust optimization characteristics of evolution\.
In this work, we introduce PE\-means, an extension of the PE framework to the problem ofkk\-means clustering\. In this application, the population of PE contains randomly generated centroids that are iteratively evolved towards the centroids of the private data\. Our design replaces PE’s API calls to foundation models with a lightweight clustering initialization and a Lévy flight\-based mutation operator\. In addition to adapting PE to clustering, we also make several improvements to the algorithm that are of independent interest\. Specifically, we address a critical issue with PE’s selection technique where votes can be split between the best candidates, causing no representative from certain areas to be selected\. We also reduce the impact of DP noise through better post\-processing of the vote histogram and providing a mechanism to adaptively adjust the signal\-to\-noise ratio at no cost to privacy\. Our experimental evaluation shows PE\-means outperforms state\-of\-the\-art baselines over numerous real and synthetic datasets\. Furthermore, we propose HDPE\-means, a variant of PE\-means that uses a similar dimensionality reduction to Balcan et al\.’s\[[1](https://arxiv.org/html/2606.00342#bib.bib7)\], allowing PE\-means to scale to higher dimensions\. Overall, we observe up to 91% improvement in clustering loss over the state\-of\-the\-art, with an average improvement of20%20\\%across all the datasets evaluated\.
## IIBackground
### II\-Akk\-Means Clustering
Given a datasetDDof sizeNNand dimensiondd, thekk\-means problem aims to partition the dataset intokkgroups \(clusters\)C=\{C1,…,Ck\}C=\\\{C\_\{1\},\\dots,C\_\{k\}\\\}, with eachCi⊂DC\_\{i\}\\subset D, such that the following objective is minimized,
argminC∑i=1k∑x∈Ci‖x−μi‖2\\arg\\min\_\{C\}\\sum\_\{i=1\}^\{k\}\\sum\_\{x\\in C\_\{i\}\}\\\|x\-\\mu\_\{i\}\\\|^\{2\}\(1\)whereμi\\mu\_\{i\}is the centroid of clusterii\(μi=1\|Ci\|∑x∈Cix\\mu\_\{i\}=\\frac\{1\}\{\|C\_\{i\}\|\}\\sum\_\{x\\in C\_\{i\}\}x\)\. In this work, we assume the size and dimension of the dataset are public information, but we require any other information published from the dataset to satisfy differential privacy\.
### II\-BDifferential Privacy
Differential privacy \(DP\)\[[9](https://arxiv.org/html/2606.00342#bib.bib13)\]adds randomness to the computation of aggregate statistics to protect the privacy of individuals\. DP guarantees that an algorithm’s output is approximately the same, regardless of the participation of any one user\. More formally, differential privacy can be defined as follows\.
###### Definition II\.1\(Differential Privacy\)\.
A randomized algorithmM:𝒟↦ℝM:\\mathcal\{D\}\\mapsto\\mathbb\{R\}is\(ϵ,δ\)\(\\epsilon,\\delta\)\-DP, if for any pair of neighbouring datasetsD,D′∈𝒟D,D^\{\\prime\}\\in\\mathcal\{D\}, and for anyS⊆ℝS\\subseteq\\mathbb\{R\}we have
Pr\[M\(D\)∈S\]≤eϵPr\[M\(D′\)∈S\]\+δ\.\\Pr\[M\(D\)\\in S\]\\leq e^\{\\epsilon\}\\Pr\[M\(D^\{\\prime\}\)\\in S\]\+\\delta\.\(2\)
The privacy parameterϵ\\epsilondefines how similar the outputs distributions must be, andδ\\deltaallows a small chance of failure in the definition\. We use the unbounded neighbouring definition, where datasets are neighbours if\|D\\D′∪D′\\D\|=1\|D\\backslash D^\{\\prime\}\\cup D^\{\\prime\}\\backslash D\|=1\(we allow for the addition or removal of a single data point\)\. Arbitrary computations can be carried out on the output of a DP mechanism without affecting privacy due to the post\-processing lemma\[[10](https://arxiv.org/html/2606.00342#bib.bib14)\]\. Finally, DP is composed naturally with multiple runs of a mechanism\. If we apply differentially private mechanisms sequentially, the privacy parameter composes through summation or more advanced methods\[[10](https://arxiv.org/html/2606.00342#bib.bib14)\]\.
###### Definition II\.2\(Sensitivity\)\.
Letf:𝒟↦ℝkf:\\mathcal\{D\}\\mapsto\\mathbb\{R\}^\{k\}\. If𝔻\\mathbb\{D\}is a distance metric between elements ofℝk\\ \\mathbb\{R\}^\{k\}then the𝔻\\mathbb\{D\}\-sensitivity offfis
Δ\(f\)=max\(D,D′\)𝔻\(f\(D\),f\(D′\)\),\\Delta^\{\(f\)\}=\\max\_\{\(D,D^\{\\prime\}\)\}\\mathbb\{D\}\(f\(D\),f\(D^\{\\prime\}\)\),\(3\)where\(D,D′\)\(D,D^\{\\prime\}\)are pairs of neighbouring datasets\.
We primarily use theℓ2\\ell\_\{2\}norm as the distance metric𝔻\\mathbb\{D\}\. To satisfy DP, we use Gaussian noise in this work and analyze it with Gaussian Differential Privacy \(GDP\)\[[6](https://arxiv.org/html/2606.00342#bib.bib17)\]\.
###### Definition II\.3\(GDP\[[6](https://arxiv.org/html/2606.00342#bib.bib17)\]\)\.
A mechanismMMis said to satisfyθ\\theta\-Gaussian Differential Privacy \(θ\\theta\-GDP\) if it isGθG\_\{\\theta\}\-DP\. That is,
𝒯\(M\(D\),M\(D′\)\)≥Gθ\\mathcal\{T\}\\big\(M\(D\),M\(D^\{\\prime\}\)\\big\)\\geq G\_\{\\theta\}for all neighbouring datasetsDDandD′D^\{\\prime\}, where𝒯\\mathcal\{T\}is a trade\-off function measuring the difficulty for attackers in identifying the presence of an individual data point andGθ=𝒯\(𝒩\(0,1\),𝒩\(θ,1\)\)G\_\{\\theta\}=\\mathcal\{T\}\\big\(\\mathcal\{N\}\(0,1\),\\mathcal\{N\}\(\\theta,1\)\\big\)\(see Dong et al\.\[[6](https://arxiv.org/html/2606.00342#bib.bib17)\]for specifics of the definition\)\.
Naturally, the Gaussian Mechanism satisfies GDP\.
###### Theorem II\.1\.
\(Gaussian Mechanism GDP\[[6](https://arxiv.org/html/2606.00342#bib.bib17)\]\) Define the Gaussian mechanism that operates on a statisticffasM\(D\)=f\(D\)\+ηM\(D\)=f\(D\)\+\\eta, whereη∼𝒩\(0,\(Δ\(f\)\)/θ\)\\eta\\sim\\mathcal\{N\}\(0,\(\\Delta^\{\(f\)\}\)/\\theta\)\. Then,MMisθ\\theta\-GDP\.
We consider the composition of GDP over multiple runs of a mechanism\.
###### Theorem II\.2\(GDP Composition\[[6](https://arxiv.org/html/2606.00342#bib.bib17)\]\)\.
Thenn\-fold composition ofθi\\theta\_\{i\}\-GDP mechanisms isθ12\+⋯\+θn2\\sqrt\{\\theta\_\{1\}^\{2\}\+\\cdots\+\\theta\_\{n\}^\{2\}\}\-GDP\.
We also have a way to convert between GDP and DP\.
###### Theorem II\.3\(GDP to DP\[[6](https://arxiv.org/html/2606.00342#bib.bib17),[2](https://arxiv.org/html/2606.00342#bib.bib18)\]\)\.
A mechanism isθ\\theta\-GDP if and only if it is\(ϵ,δ\(ϵ\)\)\\big\(\\epsilon,\\delta\(\\epsilon\)\\big\)\-DP for allϵ≥0\\epsilon\\geq 0, where
δ\(ϵ\)=Φ\(−εθ\+θ2\)−eϵΦ\(−εθ−θ2\)\.\\delta\(\\epsilon\)=\\Phi\\Big\(\-\\frac\{\\varepsilon\}\{\\theta\}\+\\frac\{\\theta\}\{2\}\\Big\)\-e^\{\\epsilon\}\\Phi\\Big\(\-\\frac\{\\varepsilon\}\{\\theta\}\-\\frac\{\\theta\}\{2\}\\Big\)\.
In practice, we use the algorithm derived by Balle and Wang to solve this function forθ\\theta\[[2](https://arxiv.org/html/2606.00342#bib.bib18), Algorithm 1\]\.
## IIIMethodology
Algorithm 1PE\-meansbased on Private Evolution \(PE\)\[[22](https://arxiv.org/html/2606.00342#bib.bib2)\]1:
Private samples:D=\{xi,\}i=1N⊂ℝdD=\\left\\\{x\_\{i\},\\right\\\}\_\{i=1\}^\{N\}\\subset\\mathbb\{R\}^\{d\}Number of iterations:TTNumber of generated samples:Nsyn= kN\_\{\\mathrm\{syn\}\}\\hbox\{\\pagecolor\{gray\!20\}= k\}Number of variations:LLNoise multiplier for the DP Nearest Neighbors Histogram:σ\\sigma
2:DP cluster centres
ST′=\{μ1,…,μk\}⊂ℝdS^\{\\prime\}\_\{T\}=\\\{\\mu\_\{1\},\\dots,\\mu\_\{k\}\\\}\\subset\\mathbb\{R\}^\{d\}
3:
4:
S0←S\_\{0\}\\leftarrowRANDOM\_API
\(Nsyn∗L\)\(N\_\{\\mathrm\{syn\}\}\*L\)
5:for
t←1,…,Tt\\leftarrow 1,\\ldots,Tdo
6:
histt←DP\_NN\_HISTOGRAM\(D,St−1,σ\)hist\_\{t\}\\leftarrow\\textsf\{DP\\\_NN\\\_HISTOGRAM\}\{\}\\left\(D,S\_\{t\-1\},\\sigma\\right\)⊳\\trianglerightSee[Algorithm2](https://arxiv.org/html/2606.00342#alg2)
7:
Pt←histt/sum\(histt\)P\_\{t\}\\leftarrow hist\_\{t\}/\\mathrm\{sum\}\(hist\_\{t\}\)⊳\\trianglerightPtP\_\{t\}is a distribution onSt−1S\_\{t\-1\}
8:
St′←S\_\{t\}^\{\\prime\}\\leftarrowrank samplesbyPtP\_\{t\}and drawNsynN\_\{\\mathrm\{syn\}\}samples fromSt−1S\_\{t\-1\}
9:
St′←S\_\{t\}^\{\\prime\}\\leftarrowWEIGHTED\_K\_MEANS\(
St−1S\_\{t\-1\}, k=
NsynN\_\{\\mathrm\{syn\}\}, weights=
histthist\_\{t\}\)⊳\\trianglerightReturns centroids
10:
St←St′S\_\{t\}\\leftarrow S\_\{t\}^\{\\prime\}⊳\\trianglerightSave best samples
11:ifsum\(
histt2\)/\(Nσ2\)<1\.0hist\_\{t\}^\{2\}\)/\(N\\sigma^\{2\}\)<1\.0then
12:
L←L/2L\\leftarrow L/2
13:endif
14:for
i←1,…,Li\\leftarrow 1,\\dots,Ldo
15:
St←St∪S\_\{t\}\\leftarrow S\_\{t\}\\cupVARIATION\_API
\(St′\)\(\{S\_\{t\}^\{\\prime\}\}\)
16:endfor
17:endfor
18:return
ST′S^\{\\prime\}\_\{T\}
Private Evolution \(PE\) is an emerging private synthetic data generation technique that evolves a set of synthetic samples using only inference API access to large machine learning models\[[22](https://arxiv.org/html/2606.00342#bib.bib2),[32](https://arxiv.org/html/2606.00342#bib.bib3),[21](https://arxiv.org/html/2606.00342#bib.bib4)\]\. At a high level, PE starts from randomly generated synthetic data \(using theRANDOM\_API\) and iteratively evolves toward high\-quality synthetic data by creating new variations \(using theVARIATION\_API\) and ranking the synthetic data that is closest to the private data\. We build upon the text variant of PE called Aug\-PE\[[32](https://arxiv.org/html/2606.00342#bib.bib3)\]\. In Algorithm[1](https://arxiv.org/html/2606.00342#alg1), we outline the baseline version of PE\. Algorithm[1](https://arxiv.org/html/2606.00342#alg1)also illustrates our approach, PE\-means, highlighting our modifications to both improve PE and adapt it to solvekk\-means clustering\.
### III\-ARandom API
The first step in PE is to generate an initial population, independently of the private data, using theRANDOM\_API\(Line[4](https://arxiv.org/html/2606.00342#alg1.l4)\)\. In previous variants of PE, this is done by querying a pre\-trained foundation model \(using information such as the label, which is assumed to be public\)\. To solve clustering with PE, rather than creating a population of synthetic data samples, we create a population of centroids to evolve\. We assume the only context available to generate the initial population is the shape of the data domain\. For simplicity, we assume the domain is a hyper\-sphere and use the sameℓ2\\ell\_\{2\}boundRRon its radius that other approaches need to bound sensitivity\[[31](https://arxiv.org/html/2606.00342#bib.bib5),[4](https://arxiv.org/html/2606.00342#bib.bib6),[1](https://arxiv.org/html/2606.00342#bib.bib7),[5](https://arxiv.org/html/2606.00342#bib.bib9)\]\. However, in our case, the bound being too tight or loose can at most affect convergence speed and has no effect on the amount of noise added\. A naiveRANDOM\_APIfor clustering would simply generate uniformly random data within the hyper\-sphere\. However, random points may group together, which reduces our coverage of the domain\. Thus, we instead follow the work of Su et al\.\[[31](https://arxiv.org/html/2606.00342#bib.bib5)\]and use the sphere packing method\. Su et al\.’s method starts with a given radiusaa\(typically half the radius of the domain\), and iteratively adds random samples that are at leastaaaway from the boundary and2a2afrom the samples added so far\. If after a fixed number of retries, a random sample can not be found to meet these conditions,aais halved and the algorithm continues untilkksamples are found\. We adapt Su et al\.’s algorithm to consider anℓ2\\ell\_\{2\}bounded hyper\-sphere rather than anℓ∞\\ell\_\{\\infty\}bounded hypercube\. The rest of the algorithm remains the same\.
### III\-BSelecting Candidates
Algorithm 2ModifiedDP\_NN\_HISTOGRAM1:
Private samples:DDGenerated samples:S=\{zi\}i=1nS=\\left\\\{z\_\{i\}\\right\\\}\_\{i=1\}^\{n\}Noise multiplier:σ\\sigmaDistance function:d\(⋅,⋅\)d\{\}\\left\(\\cdot,\\cdot\\right\)
2:DP nearest neighbours histogram on
SS
3:
4:
histogram←\[0,…,0\]histogram\\leftarrow\[0,\\dots,0\]
5:for
xpriv∈Dx\_\{\\mathrm\{priv\}\}\\in Ddo
6:
i=argminj∈\[n\]d\(xpriv,zj\)i=\\arg\\min\_\{j\\in\\left\[n\\right\]\}d\{\}\\left\(x\_\{\\mathrm\{priv\}\},z\_\{j\}\\right\)
7:
histogram\[i\]←histogram\[i\]\+1histogram\[i\]\\leftarrow histogram\[i\]\+1
8:endfor
9:
histogram←histogram\+𝒩\(0,σIn\)histogram\\leftarrow histogram\+\\mathcal\{N\}\\left\(0,\\sigma I\_\{n\}\\right\)
10:
histogram←sort\_descending\(histogram\)histogram\\leftarrow\\text\{sort\\\_descending\}\(histogram\)
11:
k←min\{κ∣∑j=1κhistogram\[j\]\>N\}k\\leftarrow\\min\\left\\\{\\kappa\\mid\\sum\_\{j=1\}^\{\\kappa\}histogram\[j\]\>N\\right\\\}
12:
histogram\[j\]←0histogram\[j\]\\leftarrow 0for all
j\>kj\>k⊳\\trianglerightZero out noisy values
13:return
histogramhistogram
The next step in PE is to select the best candidates from the population \(either from the random initialization or previous iteration\)\. To utilize the private data in this selection, we compute theDP\_NN\_HISTOGRAM\(Line[6](https://arxiv.org/html/2606.00342#alg1.l6)\) described in Algorithm[2](https://arxiv.org/html/2606.00342#alg2)\. The idea is that all private samples vote for their nearest neighbour in the population \(Line[6](https://arxiv.org/html/2606.00342#alg2.l6)and[7](https://arxiv.org/html/2606.00342#alg2.l7)\)\. This creates a histogram of votes that is easy to privatize using Gaussian noise in Line[9](https://arxiv.org/html/2606.00342#alg2.l9)\. However, depending on the size of the population and progression of the evolution, this histogram can be quite sparse\. Previous PE versions post\-processed the histogram by zeroing out any noisy votes below a certain threshold\. The challenge is how to choose the optimal threshold without spending privacy budget\. We propose a modification based on the constraint that, before adding noise, the histogram of votes should be non\-negative and sum to the size of the dataset \(which we assume to be public\)\. The maximum likelihood estimation \(MLE\) of the true votes given these constraints is found by projecting the noisy votes onto the set\{x∈ℝ\|xi≥0,∑ixi=N\}\\\{x\\in\\mathbb\{R\}\|x\_\{i\}\\geq 0,\\sum\_\{i\}x\_\{i\}=N\\\}\. Duchi et al\.\[[8](https://arxiv.org/html/2606.00342#bib.bib34)\]give an efficient algorithm for this projection by computing a subset of bins with the largest noisy counts such that after scaling and thresholding they sum to exactlyNN\. For clustering, we do not need the sum constraint to be exact; a more critical issue is points far from the private data getting assigned a few false votes due to noise and pulling the centroids in the wrong direction\. Thus, we give a simple approximation of the MLE algorithm that computes the minimum number of highest count buckets that exceed the sum constraint \(Line[11](https://arxiv.org/html/2606.00342#alg2.l11)\) and zero out the remaining buckets \(Line[12](https://arxiv.org/html/2606.00342#alg2.l12)\), without any additional scaling\. Intuitively, we assume histogram buckets with large counts are likely to have dominated the noise\.
After computing the private histogram, the next evolutionary task is to choose the best candidates from the population to survive to the next iteration based on this histogram\. Aug\-PE takes the approach of ranking candidates based on the number of votes they received in the histogram \(Lines[7](https://arxiv.org/html/2606.00342#alg1.l7)to[8](https://arxiv.org/html/2606.00342#alg1.l8)\)\. The drawback to this approach is that choosing the most popular points does not always yield optimal clustering coverage due to what we call vote splitting\. Consider a scenario where PE generates a point that is the nearest neighbour of many private dataset points\. In the next iteration, PE may generate numerous new samples close to this point, causing the votes to be split between these new samples\. Then, since all new points have a low count, an entire area of feature space may not be selected\. We illustrate an example of this for clustering in Section[IV\-B](https://arxiv.org/html/2606.00342#S4.SS2)\. We argue that instead of just choosing points with the highest votes, we also need to choose one or two solutions from areas with highly split votes\. Our key observation is that standardkk\-means clustering algorithms have a similar objective\. Thus, we use the population as a core set weighted by the histogram and run standardkk\-means to return the current best \(Line[9](https://arxiv.org/html/2606.00342#alg1.l9)\)\.111We could instead use akk\-medians algorithm to select candidates strictly from the population satisfying a similar objective\. However, we see no reason to enforce that constraint in this work, as the standardkk\-means algorithm can move us closer to the true centroids while making the selection\.We show that this technique is much more robust to vote splitting in Section[IV\-B](https://arxiv.org/html/2606.00342#S4.SS2)\.
### III\-CVariation API
After selecting the best candidates, the next step is to continue exploring the space for better centroids by creating variations of the selected candidates\. We note that we preserve the best candidates \(referred to as elitism in evolutionary algorithms\) in Line[10](https://arxiv.org/html/2606.00342#alg1.l10), so that if no variation improves, we maintain the current progress\. In previous versions of PE, theVARIATION\_API\(Line[15](https://arxiv.org/html/2606.00342#alg1.l15)\) would be another API call to a foundation model\. For example, Aug\-PE\[[32](https://arxiv.org/html/2606.00342#bib.bib3)\]would randomly mask words and ask a large language model to fill them in with a high temperature\. In the case of clustering, ourVARIATION\_APIperturbs the real\-valued candidate centroids directly\. We find that mutation based on Lévy flights works well, as the distribution is heavier\-tailed to encourage exploration\. We follow Mantegna’s algorithm\[[24](https://arxiv.org/html/2606.00342#bib.bib1)\]to generate a Lévy stable distribution parameterized byβ\\betawhich we denote byLévy\(β\)\\text\{Lévy\}\(\\beta\)\. To sample a valueζ∼Lévy\(β\)\\zeta\\sim\\text\{Lévy\}\(\\beta\), we first we compute
σu=\(Γ\(1\+β\)sin\(πβ2\)Γ\(1\+β2\)⋅β⋅2\(β−1\)/2\)1/β,\\sigma\_\{u\}=\\left\(\\frac\{\\Gamma\(1\+\\beta\)\\,\\sin\\\!\\left\(\\tfrac\{\\pi\\beta\}\{2\}\\right\)\}\{\\Gamma\\\!\\left\(\\tfrac\{1\+\\beta\}\{2\}\\right\)\\cdot\\beta\\cdot 2^\{\(\\beta\-1\)/2\}\}\\right\)^\{1/\\beta\},\(4\)then draw independent samplesu∼𝒩\(0,σuId\)u\\sim\\mathcal\{N\}\(0,\\sigma\_\{u\}I\_\{d\}\)andv∼𝒩\(0,Id\)v\\sim\\mathcal\{N\}\(0,I\_\{d\}\)that we combine to getζ=u\|v\|1/β\\zeta=\\frac\{u\}\{\|v\|^\{1/\\beta\}\}\. Our variation API adds i\.i\.d\. noise from this distribution to the existing centroids, scaled by a hyperparameterγ\\gammaand the radiusRR, namely
VARIATION\_API\(μi\)=μi\+γRζ\\textsf\{VARIATION\\\_API\}\(\\mu\_\{i\}\)=\\mu\_\{i\}\+\\gamma R\\zeta\(5\)whereζ∼Lévy\(β\)\\zeta\\sim\\text\{Lévy\}\(\\beta\)\. Finally, we clip any mutated sample withℓ2\\ell\_\{2\}norm greater thanRRto the boundary\. We experimentally set the hyperparameters for our variation API asβ=1\.75\\beta=1\.75andγ=0\.01\\gamma=0\.01\.
Following Aug\-PE, we apply the variation APILLtimes to each sample to increase exploration \(Line[15](https://arxiv.org/html/2606.00342#alg1.l15)\)\. Tuning theLLparameter is critical\. IfLLis too small, the algorithm converges slowly, needing more iterations and privacy budget\. Alternatively, ifLLis too large, the histogram becomes sparse, resulting in a poor signal\-to\-noise ratio and poor quality selections\. While we can tuneLLas a hyperparameter \(we show a heuristic for setting it in Section[IV\-C](https://arxiv.org/html/2606.00342#S4.SS3)\), the best setting forLLvaries greatly depending on the distribution of the private data and the current distribution of the candidates\. To help avoid the more damaging case of a largeLLpreventing meaningful selection, we add adaptive tuning ofLLduring evolution\. In Line[11](https://arxiv.org/html/2606.00342#alg1.l11), we estimate the noise to signal ratio \(without using any privacy budget\) by computing theℓ2\\ell\_\{2\}norm of the noisy histogram vs\. the expectedℓ2\\ell\_\{2\}norm of the noise\. We argue if this value is less than11, there is likely more noise than signal, and we reduceLLby half \(Line[12](https://arxiv.org/html/2606.00342#alg1.l12)\)\. We experimented with different thresholds and amounts of reduction but found these settings work well in practice\.
### III\-DHigh Dimensional Case
Algorithm 3High Dimensional PE\-means \(HDPE\-means\)1:
Private samples:D=\{xi\}i=1N⊂ℝdD=\\\{x\_\{i\}\\\}\_\{i=1\}^\{N\}\\subset\\mathbb\{R\}^\{d\}Reduced dimension:ppNumber of clusters:kkNumber of PE iterations:TTNumber of PE variations:LLNoise multiplier:σ\\sigmaDomain radius:RR
2:DP cluster centres
S=\{μ1,…,μk\}⊂ℝdS=\\\{\\mu\_\{1\},\\dots,\\mu\_\{k\}\\\}\\subset\\mathbb\{R\}^\{d\}
3:
4:Sample random projection matrix
G∼𝒩\(0,1\)p×dG\\sim\\mathcal\{N\}\(0,1\)^\{p\\times d\}
5:
D¯←\{x¯i\}i=1N\\bar\{D\}\\leftarrow\\\{\\bar\{x\}\_\{i\}\\\}\_\{i=1\}^\{N\}where
x¯i=1dGxi\\bar\{x\}\_\{i\}=\\frac\{1\}\{\\sqrt\{d\}\}Gx\_\{i\}⊳\\trianglerightProject data toℝp\\mathbb\{R\}^\{p\}
6:
\{μ¯1,…,μ¯k\}←PE\-Means\(D¯,T−2,k,L,σ\)\\\{\\bar\{\\mu\}\_\{1\},\\dots,\\bar\{\\mu\}\_\{k\}\\\}\\leftarrow\\textsf\{PE\-Means\}\(\\bar\{D\},T\-2,k,L,\\sigma\)⊳\\trianglerightGet centroids inℝp\\mathbb\{R\}^\{p\}using Algorithm[1](https://arxiv.org/html/2606.00342#alg1)
7:for
i←1,…,Ni\\leftarrow 1,\\ldots,Ndo
8:
ai←argminj∈\{1,…,k\}‖x¯i−μ¯j‖2a\_\{i\}\\leftarrow\\arg\\min\_\{j\\in\\\{1,\\dots,k\\\}\}\\\|\\bar\{x\}\_\{i\}\-\\bar\{\\mu\}\_\{j\}\\\|\_\{2\}⊳\\trianglerightAssign labels inℝd\\mathbb\{R\}^\{d\}
9:endfor
10:for
j←1,…,kj\\leftarrow 1,\\ldots,kdo
11:
zj←∑i:ai=jxi\+𝒩\(0,RσIn\)z\_\{j\}\\leftarrow\\sum\_\{i:a\_\{i\}=j\}x\_\{i\}\+\\mathcal\{N\}\(0,R\\sigma I\_\{n\}\)⊳\\trianglerightNoisy sum inℝd\\mathbb\{R\}^\{d\}
12:
nj←∑i:ai=j1\+𝒩\(0,σIn\)n\_\{j\}\\leftarrow\\sum\_\{i:a\_\{i\}=j\}1\+\\mathcal\{N\}\(0,\\sigma I\_\{n\}\)⊳\\trianglerightNoisy count
13:
μj←zjnj\\mu\_\{j\}\\leftarrow\\frac\{z\_\{j\}\}\{n\_\{j\}\}
14:endfor
15:return
S=\{μ1,…,μk\}S=\\\{\\mu\_\{1\},\\dots,\\mu\_\{k\}\\\}
In high\-dimensional domains \(especially whenkkis small\), variations become less effective due to the curse of dimensionality\. To combat this, in Algorithm[3](https://arxiv.org/html/2606.00342#alg3), we present a modified version of PE\-means that is optimized for higher dimensions\. We follow the work of Balcan et al\.\[[1](https://arxiv.org/html/2606.00342#bib.bib7)\]and first use a Johnson\-Lindenstrauss transform to project the private data to a low\-dimensional space \(Line[5](https://arxiv.org/html/2606.00342#alg3.l5)\)\. We then run PE\-means as normal in this low\-dimensional space in Line[6](https://arxiv.org/html/2606.00342#alg3.l6)\. To project the resulting cluster centres back to the original domain, we use noisy averaging\. This can be thought of as a single iteration of DP Lloyd’s algorithm\[[31](https://arxiv.org/html/2606.00342#bib.bib5)\]in the original domain, initialized with the low\-dimensional cluster assignments\. Specifically, all private data points are assigned a label using the low\-dimensional centroids \(Line[7](https://arxiv.org/html/2606.00342#alg3.l7)to[9](https://arxiv.org/html/2606.00342#alg3.l9)\)\. Then we compute the sum and count of all private points for a given label to obtain the centroids in high\-dimensional space \(Line[10](https://arxiv.org/html/2606.00342#alg3.l10)to[14](https://arxiv.org/html/2606.00342#alg3.l14)\)\. To ensure privacy, we add Gaussian noise to the sum and count of points in each cluster\. We reduce the number of iterations for PE by two in Line[6](https://arxiv.org/html/2606.00342#alg3.l6), so we can use the same noise multiplier to noise the sum and count\. We show in Section[IV](https://arxiv.org/html/2606.00342#S4)that HDPE\-means outperforms PE\-means in higher dimensions\. The disadvantage of this approach is that, similar to related work\[[4](https://arxiv.org/html/2606.00342#bib.bib6),[1](https://arxiv.org/html/2606.00342#bib.bib7)\], we add noise proportional toRR\(the radius of the data domain\) during the projection back to high dimensions\. In future work, we will investigate alternate ways to scale the variation API while reducing this dependency\.
### III\-EPrivacy Analysis
The privacy analysis of PE\-means follows from the original PE paper\[[22](https://arxiv.org/html/2606.00342#bib.bib2)\]as our changes do not affect noise addition in Algorithm[2](https://arxiv.org/html/2606.00342#alg2)or use private data in any way\. We restate the analysis here for completeness\.
###### Theorem III\.1\.
PE\-means as defined in Algorithm[1](https://arxiv.org/html/2606.00342#alg1)satisfies\(ϵ,δ\)\(\\epsilon,\\delta\)\-DP\.
###### Proof\.
We break the proof into the same steps as the original paper\[[22](https://arxiv.org/html/2606.00342#bib.bib2)\]:
- •Step 1: Bounding the sensitivity \(Definition[II\.2](https://arxiv.org/html/2606.00342#S2.Thmdefinition2)\) of the DP Nearest Neighbors Histogram in Algorithm[2](https://arxiv.org/html/2606.00342#alg2)\.Each private sample only contributes one vote\. If we add or remove one sample, the resulting histogram will change by 1 in theℓ2\\ell\_\{2\}norm\. Therefore, the sensitivity is 1\.
- •Step 2: Viewing each PE iteration as a Gaussian mechanism\.In Line[9](https://arxiv.org/html/2606.00342#alg2.l9)of Algorithm[2](https://arxiv.org/html/2606.00342#alg2), we add i\.i\.d\. Gaussian noise with standard deviationσ\\sigmato each bin\. This is the only part of the algorithm that touches the private dataset\.
- •Step 3: Viewing the entire PE algorithm asTTadaptive compositions of Gaussian mechanisms\.This holds because PE simply applies[Algorithm2](https://arxiv.org/html/2606.00342#alg2)TTtimes sequentially\.
- •Step 4: Viewing the entire PE algorithm as one Gaussian mechanism with noise multiplierσ/T\\sigma/\\sqrt\{T\}\.Using the composition theorem \(Theorem[II\.2](https://arxiv.org/html/2606.00342#S2.Thmtheorem2)\) from Dong et al\.\[[6](https://arxiv.org/html/2606.00342#bib.bib17), Corollary 2\], we get thatTTapplications of a1/σ1/\\sigma\-GDP algorithm areT/σ\\sqrt\{T\}/\\sigma\-GDP\.
- •Step 5: Computing DP parametersϵ\\epsilonandδ\\delta\.The problem is simply computing aσ\\sigmasuch that\(ϵ,δ\)\(\\epsilon,\\delta\)\-DP iffT/σ\\sqrt\{T\}/\\sigma\-GDP, for which we apply the algorithm of Balle and Wang\[[2](https://arxiv.org/html/2606.00342#bib.bib18)\]\.
∎
Our high\-dimensional extension of PE includes additional use of the private data in the projection, and thus we state and prove its privacy\.
###### Theorem III\.2\.
HDPE\-means as defined in Algorithm[3](https://arxiv.org/html/2606.00342#alg3)satisfies\(ϵ,δ\)\(\\epsilon,\\delta\)\-DP\.
###### Proof\.
- •Step 1: Composition accounting\.We first assume that PE is called with two fewer iterations than theσ\\sigmawas computed for in Line[6](https://arxiv.org/html/2606.00342#alg3.l6)\. Applying Theorem[III\.1](https://arxiv.org/html/2606.00342#S3.Thmtheorem1)we get that the run of PE is private with two extra applications of a Gaussian mechanism with standard deviation ofσ\\sigmaremaining\.
- •Step 2: Assignment as post\-processing\.After calling PE, the assignment step \(Line[7](https://arxiv.org/html/2606.00342#alg3.l7)to[9](https://arxiv.org/html/2606.00342#alg3.l9)\) uses the published centroids from the previous iteration \(post\-processing\) to divide the dataset into clusters\. We can then apply parallel composition over each of the clusters\. Thus, we can focus on the privacy cost of a single cluster for the remainder of the proof\.
- •Step 3: Privacy of the Gaussian mechanisms\.For each cluster, we apply the Gaussian mechanism twice\. The first is to compute a sum which has anℓ2\\ell\_\{2\}sensitivity ofRRsince adding or removing a point can at most add the largest possible vector in the domain\. The second is to compute the count, which has anℓ2\\ell\_\{2\}sensitivity of11, as a data point can be counted at most once\. Applying Theorem[II\.1](https://arxiv.org/html/2606.00342#S2.Thmtheorem1), the result follows\.
∎
## IVExperiments
### IV\-AExperimental Setup
#### Datasets
We use a combination of real and synthetic datasets from various sources\. Most of the real datasets as well as the G2 datasets come from the clustering datasets repository\[[14](https://arxiv.org/html/2606.00342#bib.bib10)\]or the UCI machine learning repository\[[7](https://arxiv.org/html/2606.00342#bib.bib15)\]\. For Birch2\[[34](https://arxiv.org/html/2606.00342#bib.bib12)\], we take25,00025,000random samples from the dataset of100,000100,000\. Gas\[[7](https://arxiv.org/html/2606.00342#bib.bib15)\], Letter\[[7](https://arxiv.org/html/2606.00342#bib.bib15)\], and MNIST\[[20](https://arxiv.org/html/2606.00342#bib.bib16)\]are included for comparability with Chang and Kamath\[[4](https://arxiv.org/html/2606.00342#bib.bib6)\]\. For MNIST, we train a LeNet5 model and use neural representations following Chang and Kamath\[[4](https://arxiv.org/html/2606.00342#bib.bib6)\]\. We use some synthetic datasets generated by Diaa et al\.\[[5](https://arxiv.org/html/2606.00342#bib.bib9)\]that were generated with theclusterGenerationR package\[[29](https://arxiv.org/html/2606.00342#bib.bib11)\]\. Finally, we include synthetic datasets generated with themake\_blobsfunction from Sklearn that generate isotropic Gaussian clusters\. TableLABEL:tab:auc\_resultsincludes a complete list of datasets and their sizes\.
#### Baselines
Our baselines are selected from state\-of\-the\-art DPkk\-means algorithms with publicly available implementations\. The work of Chang and Kamath \(which we denote byGoogle\) is perhaps the most popular approach\[[4](https://arxiv.org/html/2606.00342#bib.bib6)\]\. Chang and Kamath use a locality\-sensitive hash tree to create a private coreset, upon which the non\-privatekk\-means algorithm is applied\. Because Google’s method is one of the most recent works, we also evaluate the approaches it compares to\. The first is IBM’s differential privacy library\[[17](https://arxiv.org/html/2606.00342#bib.bib8)\]\(which we denote byDP\-Lib\) that implements an improved version of Su et al\.’s algorithm\[[31](https://arxiv.org/html/2606.00342#bib.bib5)\]\. Su et al\. apply noise to each step of Lloyd’s clustering algorithm\[[23](https://arxiv.org/html/2606.00342#bib.bib33)\]with various optimizations such as the sphere packing initialization\. The second is the work of Balcan et al\. \(which we denote byIcml17\) that first projects the data to a low\-dimensional space before recursively dividing the space into cubes\. They then apply akk\-medians\-style swapping algorithm to choose the best centres before projecting the result back into the high\-dimensional space through noisy averaging\. We note that both DP\-Lib and Icml17 work in the pure differential privacy model \(δ=0\\delta=0\)\. The final related work is a recent improvement over Su et al\.’s algorithm by Diaa et al\.\[[5](https://arxiv.org/html/2606.00342#bib.bib9)\], which we denote byFastLloyd\. FastLloyd modifies Su et al\.’s work by using Gaussian noise and computing cluster updates relative to the previous iteration, thereby reducing the sensitivity\.
#### Implementation Details
We normalize all datasets to have a maxℓ2\\ell\_\{2\}norm of11, for a clean presentation of metrics over different datasets\. We follow Chang and Kamath and centre each dataset first by subtracting the mean\[[4](https://arxiv.org/html/2606.00342#bib.bib6)\]\. Then, also following Chang and Kamath, we non\-privately compute theℓ2\\ell\_\{2\}andℓ∞\\ell\_\{\\infty\}sensitivity bounds on each dataset\. In practice, this bound should be computed privately or derived from public information about the attributes\. However, for the sake of comparison, we give all approaches an equal advantage by providing a tight bound on the domain\. Each experiment is repeated5050times over different random seeds, and we report the average\. All shaded areas represent the 95% confidence interval of the mean of the results\. We fix the privacy parameterδ=1/N1\.1\\delta=1/N^\{1\.1\}, following the recommendation that the failure probability should be less than1/N1/N\[[10](https://arxiv.org/html/2606.00342#bib.bib14)\]\. The primary loss we consider is the normalizedkk\-means loss\.
Loss=1N∑i=1Nminj=1k‖xi−μj‖2\\text\{Loss\}=\\frac\{1\}\{N\}\\sum\_\{i=1\}^\{N\}\\min\_\{j=1\}^\{k\}\\\|x\_\{i\}\-\\mu\_\{j\}\\\|^\{2\}\(6\)We also compute each method’s performance over different privacy budgets using the area under the curve \(AUC\) of the loss values againstϵ∈\{0\.25,0\.5,1\.0,2\.0,4\.0\}\\epsilon\\in\\\{0\.25,0\.5,1\.0,2\.0,4\.0\\\}, computed via the trapezoidal rule:
AUC=∑i=1n−1Lossi\+Lossi\+12⋅\(ϵi\+1−ϵi\)\\text\{AUC\}=\\sum\_\{i=1\}^\{n\-1\}\\frac\{\\text\{Loss\}\_\{i\}\+\\text\{Loss\}\_\{i\+1\}\}\{2\}\\cdot\(\\epsilon\_\{i\+1\}\-\\epsilon\_\{i\}\)\(7\)
### IV\-BVote Splitting Example
In Section[III\-B](https://arxiv.org/html/2606.00342#S3.SS2), we described the issue of vote splitting in PE’s selection\. In this section, we illustrate the phenomenon with a toy example\. We generate a two\-dimensional random dataset withk=4k=4using themake\_blobsfunction and setϵ=∞\\epsilon=\\infty\. In Figure[1](https://arxiv.org/html/2606.00342#S4.F1), we plot four iterations of PE \(one per column\) and show the difference between the previous top\-kkvoting \(top row of plots\) and our weightedkk\-means selection \(bottom row of plots\)\. The plots include the private data in grey, the population of PE in blue, and the selected points for each method in red\.
In the case of the top\-kkvoting \(first row\), we observe that the initial iteration chooses points close to the four true clusters in the dataset\. However, in the second iteration, there are many good candidates in the lower cluster, splitting the vote and causing all selected points to be in the other three clusters\. Then, in the third iteration, the lower cluster dies out and the vote becomes split in the upper cluster\. The clusters do not recover in the fourth iteration, as it will now take PE numerous iterations to move back towards those areas\. In contrast, the bottom row of plots shows PE maintaining and refining four healthy clusters throughout the iterations\. We note that in this toy example, PE has essentially converged in the first iteration, which further highlights the challenge with top\-kkvoting\.
Figure 1:A visualization of the samples generated and selected by PE in each iteration, comparing the previous top\-kkselection approach to our proposedkk\-means approach withϵ=∞\\epsilon=\\infty\. Thekk\-means approach effectively mitigates the vote splitting problem and keeps selected points close to the true clusters\.
### IV\-CHyperparameter Tuning
Two of the most influential hyperparameters for PE\-means are the number of iterationsTTand the number of variationsLL\. For a fixed privacy budget, if the number of iterations is too high, the amount of noise used in each iteration grows too large, preventing meaningful selections\. If the algorithm executes too few iterations, it will not converge\. Similarly, ifLLis too large, the histogram can become sparse and the votes will not dominate even the smallest amount of noise\. IfLLis too small, then the chance of PE\-means finding points closer to the private data is reduced, slowing convergence\. We recall that our adaptive reduction ofLL\(Line[12](https://arxiv.org/html/2606.00342#alg1.l12)of Algorithm[1](https://arxiv.org/html/2606.00342#alg1)\) protects against the case whenLLis causing the noise to dominate the signal, but it is still beneficial to set a good starting point\.
The only public parameters we can use to estimate the difficulty of a dataset in the private setting are its dimensions\. In our initial testing, we observed that the number of iterations is influenced by the dimension of the dataset\. Namely, in higher\-dimensional space, PE needs more steps due to the curse of dimensionality\. We also observed that the sparsity of the histogram for a givenLLis highly influenced by the number of values in a dataset that can vote\. To investigate this more rigorously, we conduct an experiment where we varyLLandTTand plot the values against the dataset size and dimension\. We plot the results in Figure[2](https://arxiv.org/html/2606.00342#S4.F2), whereϵ=1\\epsilon=1\. We scatter the values of the top\-33best performing parameters for each dataset and draw a line of best fit using the LOWESS method\. While the top\-left plot reveals no useful relationship between the number of variationsLLand the number of dimensionsdd, the top\-right plot shows a clear relationship between the dataset sizeNNandLL\. We approximate this relationship with the heuristicL=max\(N/5,4\)L=max\(N/5,4\)\. Similarly, we see a clear relationship between the dimensionddand number of iterationsTTin the bottom\-left plot, which we approximate with the heuristicT=max\(4d,1\)T=max\(4\\sqrt\{d\},1\)\. Finally, we see no clear relationship between theNNandTTin the bottom\-right plot\. In our experiments, we never encounter the constant cases in the max terms, but include them to ensure a robust implementation\.
To extend these heuristics to also account for the privacy budgetϵ\\epsilon, we experimented with scaling them by factors such asϵ,0\.5ϵ,2ϵ,ϵ\\epsilon,0\.5\\epsilon,2\\epsilon,\\sqrt\{\\epsilon\}\(since both parameters should decrease with smaller epsilon\)\. We find that for values ofϵ<1\\epsilon<1, no modification improves over the baseline, as our adaptive strategy increases the signal\-to\-noise ratio for any dataset that needs it \(and not all do\)\. However, whenϵ\>1\\epsilon\>1, we find it is best \(taking the median performance over all datasets\) to scale the number of iterations byϵ\\epsilon\(T=max\(4ϵd,1\)T=max\(4\\epsilon\\sqrt\{d\},1\)\)\.
Figure 2:A scatter plot of top\-33parameter configurations ofLLandTTfor each dataset, showing a clear relationship betweenLLandNNand betweenTTandddthat are captured by our heuristics\.
### IV\-DUtility Benchmark
We compare PE\-means and HDPE\-means to the baselines over all datasets and summarize the results in TableLABEL:tab:auc\_results\. The metric we use is the AUC of thekk\-means loss \(a summary of performance over a collection of epsilons, with smaller values being best\)\. We highlight the approach with the lowest loss AUC in green\. HDPE\-means is only included when the dimension of the dataset is greater than1616, as that is the dimensionality we project down to\. Overall, PE\-means and HDPE\-means perform the best over most datasets, with FastLloyd doing well on the g2\_4 dataset, and Icml17 doing well in the higher\-dimensional scale datasets\. We give the percentage of improvement \(or decline\) of the best PE\-based approach compared to the best non\-PE private approach in the last column\. We see that when PE\-based approaches are outperformed, it is by at most7%7\\%, but on average, they improve by20%20\\%, with improvements of as much as91%91\\%\. Finally, we note that HDPE\-means typically outperforms PE\-means on datasets with dimension larger than1616\. In high dimensional cases where PE\-means outperforms HDPE\-means, it is not by a significant amount compared to the performance of related work, and thus we recommend always using HDPE\-means ind\>16d\>16datasets\.
Figure 3:A plot of the privacy\-utility trade\-off of all approaches on real datasets, showing a PE\-based approach always performs best\.We give the full privacy vs\. utility trade\-off for the various real datasets \(including the three evaluated by Chang and Kamath\[[4](https://arxiv.org/html/2606.00342#bib.bib6)\]\) in Figure[3](https://arxiv.org/html/2606.00342#S4.F3)\. We observe that a PE\-based approach is always the best approach, being the closest to the non\-private baseline in all plots\. Among the baselines, we observe Icml17 performs well in all sets, whereas FastLloyd performs well in the lower dimensions \(birch2 and iris\) and Google does well in higher dimensions\. Finally, in Figure[4](https://arxiv.org/html/2606.00342#S4.F4), we show how the algorithms scale with the number of clusters and a fixed privacy budget ofϵ=1\\epsilon=1\. The scale datasets follow the trend of the non\-private version until we increase dimensions, at which point the noise has more of an effect at higher numbers of clusters\. This is likely because the data size remains fixed, causing the number of samples voting \(or being aggregated\) per cluster to decline\. The scale datasets also highlight the room for improvement in PE\-based approaches in higher dimensions\. For the Sklearn sets, which tend to be more well\-separated Gaussian blobs, we observe PE\-based approaches outperforming all other approaches and also scaling better withkk\.
Figure 4:A comparison illustrating how approaches scale with the number of clusterskkover different synthetic datasets atϵ\\epsilon=1\.0\.
## VRelated Work
#### DPkk\-means
In addition to the baselines we compare against in Section[IV](https://arxiv.org/html/2606.00342#S4), there are several previous works that also solve the DPkk\-means clustering problem, which are either improved upon by our baselines or are more theoretical in nature\. One line of work focused on developing a private version of Lloyd’s algorithm\[[3](https://arxiv.org/html/2606.00342#bib.bib19),[25](https://arxiv.org/html/2606.00342#bib.bib20),[11](https://arxiv.org/html/2606.00342#bib.bib21),[31](https://arxiv.org/html/2606.00342#bib.bib5),[5](https://arxiv.org/html/2606.00342#bib.bib9)\]starting with Blum et al\.\[[3](https://arxiv.org/html/2606.00342#bib.bib19)\]and concluding with our baselines of Su et al\.\[[31](https://arxiv.org/html/2606.00342#bib.bib5)\]and FastLloyd\[[5](https://arxiv.org/html/2606.00342#bib.bib9)\]\. Another line of work used the sample and aggregate framework\[[27](https://arxiv.org/html/2606.00342#bib.bib22),[26](https://arxiv.org/html/2606.00342#bib.bib23)\]to solvekk\-means, but was outperformed by Su et al\.\[[31](https://arxiv.org/html/2606.00342#bib.bib5)\]\. A theoretical line of work focused on minimizing the bounds on approximation error, but did not provide experimental evaluation\[[12](https://arxiv.org/html/2606.00342#bib.bib24),[13](https://arxiv.org/html/2606.00342#bib.bib25),[28](https://arxiv.org/html/2606.00342#bib.bib26),[30](https://arxiv.org/html/2606.00342#bib.bib27),[15](https://arxiv.org/html/2606.00342#bib.bib28),[19](https://arxiv.org/html/2606.00342#bib.bib29)\]\.
#### DP Evolutionary Algorithms
There has been previous work that have applied evolutionary techniques to the problem of clustering\. The first differentially private evolutionary algorithm was developed by Zhang et al\.\[[33](https://arxiv.org/html/2606.00342#bib.bib30)\]and evaluated the problem ofkk\-means clustering\. Follow\-up work by Humphries and Kerschbaum\[[18](https://arxiv.org/html/2606.00342#bib.bib31)\]showed that Zhang et al\.’s solution suffered from prohibitively poor utility and provided an improved algorithm which was evaluated on thekk\-medians clustering problem\. While our work also applies an evolutionary\-based approach to solve clustering, our algorithm uses significantly less privacy budget\. The population of PE\-means is a set of centroids from which a single solution ofkkpoints is chosen in each iteration using a DP histogram with sensitivity11, following the previous work in PE\[[22](https://arxiv.org/html/2606.00342#bib.bib2),[32](https://arxiv.org/html/2606.00342#bib.bib3),[21](https://arxiv.org/html/2606.00342#bib.bib4)\]\. Humphries and Kerschbaum instead evolve a population where each candidate is a solution ofkkpoints and applies the exponential mechanism on the clustering loss \(with sensitivity proportional to the data domain\) to select multiple candidates in each iteration, using significantly more privacy budget\.
## VIConclusion
We have shown that the PE algorithm is well\-suited to the problem ofkk\-means clustering and have made various improvements to the PE algorithm of independent interest\. Our benchmark shows that PE\-means and HDPE\-means offer state\-of\-the\-art performance on many synthetic and real clustering datasets\. In future work, we will study mutation operators that more efficiently search high\-dimensional spaces\.
## Acknowledgements
We would like to thank Sivakanth Gopi for helpful discussions on an earlier version of this work, including the suggestion to use MLE post\-processing on the vote histogram\.
## References
- \[1\]M\. Balcan, T\. Dick, Y\. Liang, W\. Mou, and H\. Zhang\(2017\-06–11 Aug\)Differentially private clustering in high\-dimensional Euclidean spaces\.InProceedings of the 34th International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.70,pp\. 322–331\.Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§I](https://arxiv.org/html/2606.00342#S1.p2.1),[§I](https://arxiv.org/html/2606.00342#S1.p4.2),[§III\-A](https://arxiv.org/html/2606.00342#S3.SS1.p1.9),[§III\-D](https://arxiv.org/html/2606.00342#S3.SS4.p1.2)\.
- \[2\]B\. Balle and Y\. Wang\(2018\-10–15 Jul\)Improving the Gaussian mechanism for differential privacy: analytical calibration and optimal denoising\.InProceedings of the 35th International Conference on Machine Learning,J\. Dy and A\. Krause \(Eds\.\),Proceedings of Machine Learning Research, Vol\.80,pp\. 394–403\.External Links:[Link](https://proceedings.mlr.press/v80/balle18a.html)Cited by:[§II\-B](https://arxiv.org/html/2606.00342#S2.SS2.p7.1),[Theorem II\.3](https://arxiv.org/html/2606.00342#S2.Thmtheorem3),[5th item](https://arxiv.org/html/2606.00342#S3.I1.i5.p1.5)\.
- \[3\]A\. Blum, C\. Dwork, F\. McSherry, and K\. Nissim\(2005\-06\-13\)Practical privacy: the SuLQ framework\.InProceedings of the twenty\-fourth ACM SIGMOD\-SIGACT\-SIGART symposium on Principles of database systems,PODS ’05,pp\. 128–138\.External Links:ISBN 978\-1\-59593\-062\-0,[Link](https://doi.org/10.1145/1065167.1065184),[Document](https://dx.doi.org/10.1145/1065167.1065184)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[4\]A\. Chang and P\. Kamath\(2024\)Practical differentially private clustering\.External Links:[Link](https://research.google/blog/practical-differentially-private-clustering)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§I](https://arxiv.org/html/2606.00342#S1.p2.1),[§III\-A](https://arxiv.org/html/2606.00342#S3.SS1.p1.9),[§III\-D](https://arxiv.org/html/2606.00342#S3.SS4.p1.2),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px2.p1.4),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px3.p1.8),[§IV\-D](https://arxiv.org/html/2606.00342#S4.SS4.p2.2)\.
- \[5\]A\. Diaa, T\. Humphries, and F\. Kerschbaum\(2025\)\{\\\{fastlloyd\}\\\}: Federated, accurate, secure, and tunable\{\\\{k\-means\}\\\}clustering with differential privacy\.In34th USENIX Security Symposium \(USENIX Security 25\),pp\. 2733–2752\.Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§I](https://arxiv.org/html/2606.00342#S1.p2.1),[§III\-A](https://arxiv.org/html/2606.00342#S3.SS1.p1.9),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px2.p1.4),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[6\]J\. Dong, A\. Roth, and W\. J\. Su\(2022\-02\)Gaussian differential privacy\.Journal of the Royal Statistical Society Series B: Statistical Methodology84\(1\),pp\. 3–37\.External Links:ISSN 1369\-7412,[Document](https://dx.doi.org/10.1111/rssb.12454),[Link](https://doi.org/10.1111/rssb.12454)Cited by:[§II\-B](https://arxiv.org/html/2606.00342#S2.SS2.p3.2),[Definition II\.3](https://arxiv.org/html/2606.00342#S2.Thmdefinition3),[Definition II\.3](https://arxiv.org/html/2606.00342#S2.Thmdefinition3.p1.8.4),[Theorem II\.1](https://arxiv.org/html/2606.00342#S2.Thmtheorem1.p1.5.5),[Theorem II\.2](https://arxiv.org/html/2606.00342#S2.Thmtheorem2),[Theorem II\.3](https://arxiv.org/html/2606.00342#S2.Thmtheorem3),[4th item](https://arxiv.org/html/2606.00342#S3.I1.i4.p1.4)\.
- \[7\]D\. Dua and C\. Graff\(2021\)UCI machine learning repository\.University of California, Irvine, School of Information and Computer Sciences\.Note:[http://archive\.ics\.uci\.edu/ml](http://archive.ics.uci.edu/ml)Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2)\.
- \[8\]J\. Duchi, S\. Shalev\-Shwartz, Y\. Singer, and T\. Chandra\(2008\)Efficient projections onto the l1\-ball for learning in high dimensions\.InProceedings of the 25th International Conference on Machine Learning,ICML ’08,New York, NY, USA,pp\. 272–279\.External Links:ISBN 9781605582054,[Link](https://doi.org/10.1145/1390156.1390191),[Document](https://dx.doi.org/10.1145/1390156.1390191)Cited by:[§III\-B](https://arxiv.org/html/2606.00342#S3.SS2.p1.2)\.
- \[9\]C\. Dwork, F\. McSherry, K\. Nissim, and A\. Smith\(2006\)Calibrating noise to sensitivity in private data analysis\.InTheory of Cryptography,pp\. 265–284\.Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§II\-B](https://arxiv.org/html/2606.00342#S2.SS2.p1.1)\.
- \[10\]C\. Dwork and A\. Roth\(2014\)The algorithmic foundations of differential privacy\.Foundations and Trends in Theoretical Computer Science9\(3\-4\),pp\. 211–407\.External Links:[Document](https://dx.doi.org/10.1561/0400000042)Cited by:[§II\-B](https://arxiv.org/html/2606.00342#S2.SS2.p2.3),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px3.p1.8)\.
- \[11\]C\. Dwork\(2011\)A firm foundation for private data analysis\.Communications of the ACM54\(1\),pp\. 86–95\.Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[12\]D\. Feldman, A\. Fiat, H\. Kaplan, and K\. Nissim\(2009\-05\-31\)Private coresets\.InProceedings of the forty\-first annual ACM symposium on Theory of computing,STOC ’09,pp\. 361–370\.External Links:ISBN 978\-1\-60558\-506\-2,[Link](https://doi.org/10.1145/1536414.1536465),[Document](https://dx.doi.org/10.1145/1536414.1536465)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[13\]D\. Feldman, C\. Xiang, R\. Zhu, and D\. Rus\(2017\-04\-18\)Coresets for differentially private k\-means clustering and applications to privacy in mobile sensor networks\.InProceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks,pp\. 3–15\.External Links:ISBN 978\-1\-4503\-4890\-4,[Link](https://dl.acm.org/doi/10.1145/3055031.3055090),[Document](https://dx.doi.org/10.1145/3055031.3055090)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[14\]P\. Fränti and S\. Sieranoja\(2018\)K\-means properties on six clustering benchmark datasets\.Vol\.48\.External Links:[Link](http://cs.uef.fi/sipu/datasets/)Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2)\.
- \[15\]B\. Ghazi, R\. Kumar, and P\. Manurangsi\(2020\)Differentially private clustering: tight approximation ratios\.InAdvances in Neural Information Processing Systems,Vol\.33,pp\. 4040–4054\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2020/hash/299dc35e747eb77177d9cea10a802da2-Abstract.html)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[16\]A\. Ghosal, A\. Nandy, A\. K\. Das, S\. Goswami, and M\. Panday\(2020\)A short review on different clustering techniques and their applications\.Emerging Technology in Modelling and Graphics: Proceedings of IEM Graph 2018,pp\. 69–83\.Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6)\.
- \[17\]N\. Holohan, S\. Braghin, P\. Mac Aonghusa, and K\. Levacher\(2019\-07\)Diffprivlib: the IBM differential privacy library\.ArXiv e\-prints1907\.02444 \[cs\.CR\]\.Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px2.p1.4)\.
- \[18\]T\. Humphries and F\. Kerschbaum\(2023\)Differentially private simple genetic algorithms\.InProceedings on Privacy Enhancing Technologies \(PoPETs\),pp\. 540–558\.External Links:[Document](https://dx.doi.org/10.56553/popets-2023-0124)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px2.p1.5)\.
- \[19\]M\. Jones, H\. L\. Nguyen, and T\. D\. Nguyen\(2021\-05\)Differentially private clustering via maximum coverage\.Proceedings of the AAAI Conference on Artificial Intelligence35\(13\),pp\. 11555–11563\.External Links:[Link](https://ojs.aaai.org/index.php/AAAI/article/view/17375)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[20\]Y\. LeCun\(1998\)The mnist database of handwritten digits\.http://yann\. lecun\. com/exdb/mnist/\.Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2)\.
- \[21\]Z\. Lin, T\. Baltrusaitis, and S\. Yekhanin\(2025\)Differentially private synthetic data via APIs 3: using simulators instead of foundation model\.InICLR 2025 Workshop on Navigating and Addressing Data Problems for Foundation Models,Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p3.1),[§III](https://arxiv.org/html/2606.00342#S3.p1.1),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px2.p1.5)\.
- \[22\]Z\. Lin, S\. Gopi, J\. Kulkarni, H\. Nori, and S\. Yekhanin\(2024\)Differentially private synthetic data via foundation model APIs 1: images\.InThe Twelfth International Conference on Learning Representations,Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p3.1),[§III\-E](https://arxiv.org/html/2606.00342#S3.SS5.1.p1.1),[§III\-E](https://arxiv.org/html/2606.00342#S3.SS5.p1.1),[§III](https://arxiv.org/html/2606.00342#S3.p1.1),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px2.p1.5),[Algorithm 1](https://arxiv.org/html/2606.00342#alg1)\.
- \[23\]S\. Lloyd\(1982\-03\)Least squares quantization in PCM\.IEEE Transactions on Information Theory28\(2\),pp\. 129–137\.External Links:[Document](https://dx.doi.org/10.1109/tit.1982.1056489),[Link](https://doi.org/10.1109/tit.1982.1056489)Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px2.p1.4)\.
- \[24\]R\. N\. Mantegna\(1994\-05\)Fast, accurate algorithm for numerical simulation of lévy stable stochastic processes\.Phys\. Rev\. E49,pp\. 4677–4683\.External Links:[Document](https://dx.doi.org/10.1103/PhysRevE.49.4677),[Link](https://link.aps.org/doi/10.1103/PhysRevE.49.4677)Cited by:[§III\-C](https://arxiv.org/html/2606.00342#S3.SS3.p1.3)\.
- \[25\]F\. D\. McSherry\(2009\)Privacy integrated queries: an extensible platform for privacy\-preserving data analysis\.InProceedings of the 2009 ACM SIGMOD International Conference on Management of Data,SIGMOD ’09,New York, NY, USA,pp\. 19–30\.External Links:ISBN 9781605585512,[Link](https://doi.org/10.1145/1559845.1559850),[Document](https://dx.doi.org/10.1145/1559845.1559850)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[26\]P\. Mohan, A\. Thakurta, E\. Shi, D\. Song, and D\. Culler\(2012\)GUPT: privacy preserving data analysis made easy\.InProceedings of the 2012 ACM SIGMOD International Conference on Management of Data,pp\. 349–360\.Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[27\]K\. Nissim, S\. Raskhodnikova, and A\. Smith\(2007\)Smooth sensitivity and sampling in private data analysis\.InProceedings of the Thirty\-Ninth Annual ACM Symposium on Theory of Computing,STOC ’07,New York, NY, USA,pp\. 75–84\.External Links:ISBN 9781595936318,[Link](https://doi.org/10.1145/1250790.1250803),[Document](https://dx.doi.org/10.1145/1250790.1250803)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[28\]K\. Nissim and U\. Stemmer\(2018\-04\-09\)Clustering algorithms for the centralized and local models\.InProceedings of Algorithmic Learning Theory,pp\. 619–653\.External Links:[Link](https://proceedings.mlr.press/v83/nissim18a.html)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[29\]W\. Qiu and H\. Joe\(2023\)Random cluster generation \(with specified degree of separation\)\.Note:R package version 1\.3\.8External Links:[Link](https://cran.r-project.org/package=clusterGeneration)Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2)\.
- \[30\]U\. Stemmer and H\. Kaplan\(2018\)Differentially private k\-means with constant multiplicative error\.InAdvances in Neural Information Processing Systems,Vol\.31\.External Links:[Link](https://papers.nips.cc/paper_files/paper/2018/hash/32b991e5d77ad140559ffb95522992d0-Abstract.html)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[31\]D\. Su, J\. Cao, N\. Li, E\. Bertino, and H\. Jin\(2016\)Differentially private k\-means clustering\.InProceedings of the Sixth ACM Conference on Data and Application Security and Privacy,CODASPY ’16,pp\. 26–37\.External Links:ISBN 9781450339353,[Link](https://doi.org/10.1145/2857705.2857708),[Document](https://dx.doi.org/10.1145/2857705.2857708)Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p1.6),[§I](https://arxiv.org/html/2606.00342#S1.p2.1),[§III\-A](https://arxiv.org/html/2606.00342#S3.SS1.p1.9),[§III\-D](https://arxiv.org/html/2606.00342#S3.SS4.p1.2),[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px2.p1.4),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px1.p1.2)\.
- \[32\]C\. Xie, Z\. Lin, A\. Backurs, S\. Gopi, D\. Yu, H\. A\. Inan, H\. Nori, H\. Jiang, H\. Zhang, Y\. T\. Lee, B\. Li, and S\. Yekhanin\(2024\-21–27 Jul\)Differentially private synthetic data via foundation model APIs 2: text\.InProceedings of the 41st International Conference on Machine Learning,Proceedings of Machine Learning Research, Vol\.235,pp\. 54531–54560\.Cited by:[§I](https://arxiv.org/html/2606.00342#S1.p3.1),[§III\-C](https://arxiv.org/html/2606.00342#S3.SS3.p1.3),[§III](https://arxiv.org/html/2606.00342#S3.p1.1),[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px2.p1.5)\.
- \[33\]J\. Zhang, X\. Xiao, Y\. Yang, Z\. Zhang, and M\. Winslett\(2013\)PrivGene: differentially private model fitting using genetic algorithms\.New York, NY, USA\.External Links:ISBN 9781450320375,[Link](https://doi.org/10.1145/2463676.2465330),[Document](https://dx.doi.org/10.1145/2463676.2465330)Cited by:[§V](https://arxiv.org/html/2606.00342#S5.SS0.SSS0.Px2.p1.5)\.
- \[34\]T\. Zhang, R\. Ramakrishnan, and M\. Livny\(1997\)BIRCH: a new data clustering algorithm and its applications\.Data Mining and Knowledge Discovery1\(2\),pp\. 141–182\.Cited by:[§IV\-A](https://arxiv.org/html/2606.00342#S4.SS1.SSS0.Px1.p1.2)\.Similar Articles
Private Adaptive Covariance Estimation via Gaussian Graphical Models
This paper introduces PACE-GGM, a differentially private method for covariance estimation that adaptively selects and measures the most informative entries of the empirical covariance matrix, using Gaussian graphical models for reconstruction. It shows improved estimation error over baselines on real-world data, especially in high-dimensional settings.
DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum
DP-MacAdam combines adaptive clipping and adaptive momentum to improve differentially private SGD, achieving better model utility without manual tuning of the clipping threshold.
Semi-supervised knowledge transfer for deep learning from private training data
OpenAI presents PATE (Private Aggregation of Teacher Ensembles), a privacy-preserving approach that trains a student model on noisy outputs from multiple teacher models trained on disjoint datasets, providing strong differential privacy guarantees without exposing sensitive training data.
DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
DEI introduces a distributed Quality-Diversity search framework using heterogeneous LLMs as mutation operators, showing that model diversity improves performance over homogeneous parallel approaches. Evaluated on the Core War domain, a four-node heterogeneous ensemble achieves significant gains in QD-Score and coverage.
Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data
This paper introduces a parallelization strategy and adaptive steering mechanism for the Baymex algorithm to efficiently learn discretized Bayesian network classifiers for clinical data, achieving speedups over 54x on a 16-core CPU and comparable or better predictive performance than traditional models while maintaining explainability.