FedUP: One-Shot Federated Unlearning via Centroid-Guided Plug-in Filters

arXiv cs.LG Papers

Summary

FedUP proposes a one-shot federated unlearning framework that uses lightweight, pluggable filters guided by differentially private class centroids to efficiently remove specific knowledge without multi-round communication, achieving low latency and inherent reversibility.

arXiv:2606.24113v1 Announce Type: new Abstract: Federated unlearning (FU) is critical for complying with legal mandates like the right to be forgotten in decentralized systems, yet current methods face a persistent dilemma between non-target knowledge loss and high request latency. To resolve these issues, we propose FedUP, a one-shot federated unlearning framework utilizing lightweight pluggable filters that act as a "knowledge funnel" to screen out target data while preserving original model performance. By freezing original model parameters and training filters at the server side using differentially private (DP)-protected class centroid samples, FedUP bypasses the need for multi-round client-server communication and complex retraining, reducing unlearning latency from minutes to mere seconds. Additionally, the framework's pluggable architecture ensures inherent reversibility, enabling the seamless restoration of forgotten knowledge by simply removing the filters. Extensive experiments on diverse image and text tasks demonstrate that FedUP effectively reduces non-target knowledge loss and achieves superior unlearning precision and efficiency across various scenarios. Code is available at: https://github.com/suows/FedUP-code.

Cached at: 06/24/26, 07:50 AM

# FedUP: One-Shot Federated Unlearning via Centroid-Guided Plug-in Filters
Source: [https://arxiv.org/html/2606.24113](https://arxiv.org/html/2606.24113)
Zhengyi Zhong¹, Pan Wang¹, Weidong Bao¹, Xiongtao Zhang¹, Quan Wen¹ & Ji Wang¹

¹National Key Laboratory of Big Data and Decision, National University of Defense Technology, China\.

feihongnan178@outlook\.com, \{zhongzhengyi20, wangpan19, wdbao, zhangxiongtao14, wangji\}@nudt\.edu\.cn, weixingw1@sina\.com

Corresponding Author

###### Abstract

Federated unlearning \(FU\) is critical for complying with legal mandates like the right to be forgotten in decentralized systems, yet current methods face a persistent dilemma between non\-target knowledge loss and high request latency\. To resolve these issues, we propose FedUP, a one\-shot federated unlearning framework utilizing lightweight pluggable filters that act as a “knowledge funnel” to screen out target data while preserving original model performance\. By freezing original model parameters and training filters at the server side using differentially private \(DP\)\-protected class centroid samples, FedUP bypasses the need for multi\-round client\-server communication and complex retraining, reducing unlearning latency from minutes to mere seconds\. Additionally, the framework’s pluggable architecture ensures inherent reversibility, enabling the seamless restoration of forgotten knowledge by simply removing the filters\. Extensive experiments on diverse image and text tasks demonstrate that FedUP effectively reduces non\-target knowledge loss and achieves superior unlearning precision and efficiency across various scenarios\. Code is available at: [https://github\.com/suows/FedUP\-code](https://github.com/suows/FedUP-code)\.

## 1 Introduction

Background\. As a distributed machine learning paradigm, federated learning \(FL\) McMahan et al\. \([2016](https://arxiv.org/html/2606.24113#bib.bib3)\); Truong et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib67)\); Zhong et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib82)\); Qi et al\. \([2024](https://arxiv.org/html/2606.24113#bib.bib83)\); Zhong et al\. \([2025a](https://arxiv.org/html/2606.24113#bib.bib25)\); Fu et al\. \([2025b](https://arxiv.org/html/2606.24113#bib.bib84)\); Jiang et al\. \([2026](https://arxiv.org/html/2606.24113#bib.bib81)\) has gained significant attention in privacy\-sensitive scenarios recently because it eliminates the need for centralizing raw data during training\. However, during the training process, the global model internalizes client information into its parameters through multiple rounds of parameter aggregation, which exposes novel privacy risks when the model is confronted with legal mandates such as the right to be forgotten under GDPR de la Torre \([2018](https://arxiv.org/html/2606.24113#bib.bib2)\)\. To address this, researchers have focused on Federated Unlearning \(FU\), aiming to remove specific knowledge without retraining the global model from scratch\. Existing FU methods generally follow two main technical paradigms: server\-side FU methods Wu et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib4)\); Huynh et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib5)\); Pan et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib6)\), which implement approximate unlearning Yang et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib11)\) on the global model at the server side; and client\-side FU methods Wang et al\. \([2023b](https://arxiv.org/html/2606.24113#bib.bib7)\); Liu et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib8)\); Zhu et al\. \([2023](https://arxiv.org/html/2606.24113#bib.bib9)\); Zhong et al\. \([2025b](https://arxiv.org/html/2606.24113#bib.bib10)\), which achieve exact unlearning Kuo et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib12)\) through iterative client\-side model retraining procedures\.

![Refer to caption](https://arxiv.org/html/2606.24113v1/x1.png)

\(a\) Non\-target knowledge loss of server\-side FU method\.

![Refer to caption](https://arxiv.org/html/2606.24113v1/x2.png)

\(b\) Unlearning request latency of client\-side FU method\.

Figure 1: FedEraser, a representative server\-side method, shows noticeable non\-target knowledge loss after unlearning across multiple tasks\. Client\-side methods like FUSED experience significantly longer convergence times as task difficulty increases, with responses up to 5 minutes indicating severe unlearning request latency\.

Existing Challenges\. While current Federated Unlearning \(FU\) methods facilitate knowledge removal to various extents, challenges regarding non\-target knowledge loss and unlearning request latency still remain\. As depicted in Figure [1](https://arxiv.org/html/2606.24113#S1.F1), server\-side FU methods lack precise control over the unlearning scope, frequently incurring non\-target knowledge loss, where the model performance unrelated to the unlearning request is inevitably impaired during knowledge removal\. Conversely, although client\-side FU methods achieve exact unlearning via retraining\-based approaches, they are heavily constrained by multi\-round training and communication, leading to high request latency\. To date, it is hard to find a solution that simultaneously ensures rapid response and prevents non\-target knowledge loss\. Moreover, both server\-side and client\-side FU paradigms typically rely on direct modification of original model parameters, which leads to the irreversibility of unlearning\. Once the unlearning process is finalized, restoring previously removed knowledge necessitates further parameter tuning or retraining, thereby imposing complexity and additional overhead on practical deployment\.

Proposed Solution\. To this end, we propose FedUP, a one\-shot federated unlearning framework based on lightweight pluggable filters\. In terms of mitigating non\-target knowledge loss, the framework avoids performing direct, large\-scale parameter updates on the global model; instead, it implements unlearning by introducing independent pluggable filters while keeping original model parameters frozen \(shown in Figure[2](https://arxiv.org/html/2606.24113#S1.F2)\)\. These filters serve as a “knowledge funnel” that screens out target knowledge and permits only non\-target knowledge to pass, thereby reducing the interference of unlearning on non\-target knowledge\. To lower unlearning request latency, FedUP only needs to perform a few rounds of fine\-tuning on the lightweight filters at the server side using differential privacy \(DP\)\-protected class centroid samples to complete the unlearning task\. This bypasses multi\-round retraining and frequent communication with clients, significantly accelerating the response speed\. Moreover, since the filters are pluggable, the framework inherently supports reversibility\. As depicted in Figure[2](https://arxiv.org/html/2606.24113#S1.F2), when restoration of forgotten knowledge is required, it can be achieved simply by removing the filters\. Overall, FedUP provides a solution that balances precision, efficiency, and recoverability\.

![Refer to caption](https://arxiv.org/html/2606.24113v1/fig/filter.png)

Figure 2: Lightweight plug\-in filters\.

Contributions\. The main contributions are as follows:

- We design FedUP, a one\-shot FU framework utilizing DP\-protected class centroids, mitigating non\-target knowledge loss and reducing unlearning request latency from minutes to seconds\.
- We propose a reversible unlearning mechanism via lightweight pluggable filters without altering original model parameters while ensuring rapid transitions between unlearned and pre\-unlearning states\.
- We conduct differential privacy analysis showing that appropriate noise protects class centroid samples while minimally impacting data utility\. Extensive experiments on image and text tasks confirm FedUP’s effectiveness across various scenarios\.

## 2 Related Work

Machine Unlearning\. Simply removing training data from storage fails to purge its influence on deployed models\. Machine Unlearning \(MU\) Ma et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib72)\) is thus introduced to erase such learned knowledge\. Existing methods are categorized as either exact Cao et al\. \([2018](https://arxiv.org/html/2606.24113#bib.bib13)\) or approximate unlearning Golatkar et al\. \([2020](https://arxiv.org/html/2606.24113#bib.bib15)\)\. Exact unlearning mandates that the post\-forgetting model be statistically indistinguishable from one retrained from scratch without the deleted data\. It retains the guarantees of complete retraining but lowers its cost via algorithmic shortcuts\. For classical models, the training process can be expressed in an invertible additive form, enabling point removal by subtracting its closed\-form term Cao et al\. \([2018](https://arxiv.org/html/2606.24113#bib.bib13)\)\. For complex architectures, exact unlearning employs data sharding Bourtoule et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib17)\), localized retraining Chen et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib18)\), or intermediate checkpointing Wang et al\. \([2023a](https://arxiv.org/html/2606.24113#bib.bib19)\) to reduce computation while preserving retraining\-level guarantees\. Approximate unlearning, in contrast, substitutes full retraining with lightweight fine\-tuning, permitting a bounded, residual influence from the deleted data\. Related work is typically grouped into data\-driven and model\-driven approaches Nguyen et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib77)\)\. Data\-driven methods involve relabeling retained samples Graves et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib20)\) or partitioning data shards Gupta et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib21)\) before fine\-tuning\. Model\-driven approaches adjust parameters directly via influence functions Guo et al\. \([2019](https://arxiv.org/html/2606.24113#bib.bib22)\), Fisher\-based regularization Golatkar et al\. \([2020](https://arxiv.org/html/2606.24113#bib.bib15)\), or knowledge distillation Kurmanji et al\. \([2023](https://arxiv.org/html/2606.24113#bib.bib14)\), countering the gradient contributions of the data to be erased\.

Federated Unlearning\. Though federated learning maintains client privacy through local retention, knowledge from distributed datasets persists in the aggregated global model\. This requires integrating MU techniques into FL, named federated unlearning \(FU\) Wang et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib23)\)\. Based on the execution location, FU methods can be categorized into server\-side and client\-side methods Liu et al\. \([2024](https://arxiv.org/html/2606.24113#bib.bib75)\); Li et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib53)\)\. Server\-side methods are intrinsically based on approximate unlearning to expunge client contributions without client involvement\. FedEraser Liu et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib24)\) calibrates update trajectories but still requires auxiliary retraining\. Conversely, Wu et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib4)\) bypass client\-side computation by subtracting historical updates and employing knowledge distillation\. To optimize efficiency, Huynh et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib5)\) utilize selective retention of influential updates to reduce memory overhead, while Pan et al\. \([2025](https://arxiv.org/html/2606.24113#bib.bib6)\) resolve parameter conflicts via Orthogonal Steepest Descent to accelerate the unlearning process\. Client\-side FU methods perform unlearning locally, striving to achieve exact unlearning while balancing computational efficiency with global model utility\. Mora et al\. \([2024](https://arxiv.org/html/2606.24113#bib.bib78)\) propose FedUNRAN, utilizing local random label perturbations to disperse target gradients and attenuate their influence without server\-side intervention\. Wang et al\. \([2023b](https://arxiv.org/html/2606.24113#bib.bib7)\) employ variational Bayesian inference for parameter self\-sharing to erase target data while preserving performance\. To accelerate unlearning, Liu et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib8)\) approximate the Hessian via a diagonal empirical Fisher Information Matrix for quasi\-Newton optimization, while Zhu et al\. \([2023](https://arxiv.org/html/2606.24113#bib.bib9)\) combine inverse perturbation with passive decay, propagating updates via knowledge distillation\. Furthermore, Deng et al\. \([2024](https://arxiv.org/html/2606.24113#bib.bib79)\) introduce model contrastive unlearning \(MCU\) to regularize the feature space\. Zhong et al\. \([2025b](https://arxiv.org/html/2606.24113#bib.bib10)\) implement reversible unlearning through local fine\-tuning\.

In summary, existing unlearning approaches typically involve trade\-offs between unlearning precision and computational efficiency, and often suffer from limited generalization\. Moreover, reversibility is rarely considered in current studies\. Achieving a unified balance among unlearning precision, efficiency, and reversibility therefore remains an open challenge\.

## 3 Methodology

### 3\.1 Problem Description

In the FL framework, a set of clients denoted as $\mathcal{C}=\{C_1,C_2,\ldots,C_N\}$ collaboratively train a global model $\mathcal{M}_G$ without sharing local data\. Each client $C_n$ trains a local model $\mathcal{M}_n$ using its local dataset $\mathcal{D}_n$ and uploads the model parameters to a server\. The server aggregates these local models via weighted averaging based on dataset sizes to obtain a global model $\mathcal{M}_G$\. This process iterates over $W$ global rounds, where each global round comprises $e$ local training epochs on the clients with a local learning rate $l_c$\. $\mathcal{M}_G$ is structurally decomposed into a feature extractor $\mathcal{M}_E$ and a classifier $\mathcal{M}_{cl}$\. After aggregation, $\mathcal{M}_G$ is distributed back to all clients, where the feature extractor $\mathcal{M}_E$ is leveraged to extract features from local data\. Additionally, a pluggable filter $\mathcal{M}_{Fi}$ is constructed to implement the filtering of knowledge that needs to be forgotten\. The overall dataset is denoted as $\mathcal{D}=\{\mathcal{D}_n\}_{n=1}^{N}$, where $\mathcal{D}_n$ represents the local dataset of client $C_n$\. The total data volume across the federation is $\|\mathcal{D}\|=\sum_{n=1}^{N}\|\mathcal{D}_n\|$\. To facilitate data management, particularly for future unlearning operations, each local dataset $\mathcal{D}_n$ is partitioned into two disjoint subsets: a retained subset $\mathcal{D}_n^{R}$ for model training, and an unlearning subset $\mathcal{D}_n^{U}$ designated to be removed in compliance with privacy or regulatory constraints\. Let $k\in\{1,2,\ldots,K\}$ index the data categories, where $K$ is the total number of classes\.

In the FL phase, the training objective is defined as:

$$\min_{\theta_{\mathcal{M}_G}} F(\theta_{\mathcal{M}_G})=\sum_{n=1}^{N}\frac{|\mathcal{D}_n|}{|\mathcal{D}|}\sum_{(x_i^k,y_i^k)\in\mathcal{D}_n}\mathcal{L}\bigl(f(x_i^k;\theta_{\mathcal{M}_G}),y_i^k\bigr),\tag{1}$$

where $\mathcal{L}$ denotes the loss function\. During the FU phase, the objective is to maximize the loss on the unlearning set, minimize the loss on the retained set, and keep the training cost low, described as:

$$\min_{\theta_{\mathcal{M}_G^{\prime}}} F^{R}(\theta_{\mathcal{M}_G^{\prime}})=\sum_{n=1}^{N}\frac{|\mathcal{D}_n^{R}|}{|\mathcal{D}^{R}|}\sum_{(x_i^k,y_i^k)\in\mathcal{D}_n^{R}}\mathcal{L}\bigl(f(x_i^k;\theta_{\mathcal{M}_G^{\prime}}),y_i^k\bigr),\tag{2}$$

$$\max_{\theta_{\mathcal{M}_G^{\prime}}} F^{U}(\theta_{\mathcal{M}_G^{\prime}})=\sum_{n=1}^{N}\frac{|\mathcal{D}_n^{U}|}{|\mathcal{D}^{U}|}\sum_{(x_i^k,y_i^k)\in\mathcal{D}_n^{U}}\mathcal{L}\bigl(f(x_i^k;\theta_{\mathcal{M}_G^{\prime}}),y_i^k\bigr),\tag{3}$$

$$\min F^{T}(\theta_{\mathcal{M}_G^{\prime}})=\sum_{w=1}^{W}\text{Comm}^{(w)}(\mathcal{M}_G,\mathcal{C}),\tag{4}$$

where Comm represents the communication resource between the server and the clients $\mathcal{C}$ during each round\.

### 3\.2 Method Overview

As shown in Figure[3](https://arxiv.org/html/2606.24113#S3.F3), our method comprises four phases: federated learning, generation of class centroid samples, one\-shot federated unlearning, and inference, with each stage’s main procedures and key formulas shown in the diagram\.

![Refer to caption](https://arxiv.org/html/2606.24113v1/fig/framework_1.png)

Figure 3: FedUP follows a structured workflow: it begins with federated learning to obtain a global model\. Upon request, clients produce differentially‑private class centroid samples from retained data and upload them for server aggregation\. The server then fine‑tunes a pluggable filter, blocking forgotten knowledge without modifying the base model\.

#### 3\.2\.1 Feature Extraction

Each client $C_n$ performs $e$ rounds of training on its local dataset $\mathcal{D}_n$, resulting in an updated local model $\mathcal{M}_n^{w,e}$\. The local training process can be expressed as:

$$\theta_{\mathcal{M}_n}^{w,e}=\theta_{\mathcal{M}_n}^{w,e-1}-l_c\cdot\nabla_{\theta}\mathcal{L}(\mathcal{M}_n^{w,e-1},\mathcal{D}_n),\tag{5}$$

where $\mathcal{L}$ represents the local loss function and $\nabla_{\theta}\mathcal{L}$ denotes its gradient with respect to the model parameters\. Subsequently, the server collects the local models $\mathcal{M}_n^{w,e}$ from selected clients $C_n$ and updates the global model through a weighted average:

$$\theta_{\mathcal{M}_G}^{w+1}=\sum_{n=1}^{N}\frac{|\mathcal{D}_n|}{|\mathcal{D}|}\theta_{\mathcal{M}_n}^{w,e}.\tag{6}$$

The trained global model $\mathcal{M}_G$ is regarded as a combination of a feature extractor $\mathcal{M}_E$ and a classifier $\mathcal{M}_{cl}$\. After freezing this composite model, it is deployed to the clients\. The features of all data $\mathcal{D}_n$ from each client $C_n$ are extracted by the feature extractor $\mathcal{M}_E$, which can be represented as follows:

$$V_n^{k}=\left\{\mathcal{M}_E(x_i^k)\mid x_i^k\in\mathcal{D}_n^{k}\right\},\tag{7}$$

where $V_n^{k}$ denotes the feature embedding of the data belonging to class $k$ on client $C_n$\.
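The size-weighted aggregation in Eq. (6) can be illustrated with a minimal sketch. It assumes each client's parameters are flattened into a plain list of floats; the function name `fedavg` and the toy values are hypothetical and are not taken from the released code.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors, following Eq. (6)."""
    total = sum(client_sizes)
    agg = [0.0] * len(client_params[0])
    for params, size in zip(client_params, client_sizes):
        weight = size / total  # |D_n| / |D|
        for j, value in enumerate(params):
            agg[j] += weight * value
    return agg


# Two hypothetical clients holding 1 and 3 samples respectively.
global_params = fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3])
```

The client with more data pulls the average toward its own parameters, which is the behaviour the weighting term in Eq. (6) describes.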

#### 3\.2\.2 Generation of Class Centroid Samples

Upon receiving an unlearning request, client $C_n$ removes the features associated with the unlearning data\. The remaining feature embeddings belonging to class $k$ are denoted as:

$$V_n^{R,k}=V_n^{k}\setminus\left\{\mathcal{M}_E(x_i)\mid x_i\in\mathcal{D}_n^{U,k}\right\}.\tag{8}$$

To reduce communication cost, the retained features are compressed via KMeans clustering with

$$K_{n,k}=\lceil\rho\,|V_n^{R,k}|\rceil\tag{9}$$

clusters, where the number of clusters is set proportionally to the retained feature count through the scenario\-dependent ratio $\rho$, yielding a compact set of centroids:

$$\mu_n^{R,k}=\mathrm{KMeans}(V_n^{R,k},K_{n,k}).\tag{10}$$
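Eqs. (9) and (10) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: features are lists of floats, the clustering is a basic Lloyd's iteration initialised from the first `k` points, and the helper names `num_clusters` and `kmeans` are hypothetical. The source does not specify which KMeans implementation or initialisation is used.

```python
import math


def num_clusters(rho, n_features):
    """K_{n,k} = ceil(rho * |V_n^{R,k}|), following Eq. (9)."""
    return math.ceil(rho * n_features)


def kmeans(points, k, iters=20):
    """Basic Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Four hypothetical retained features of one class on one client.
features = [[0.0, 0.0], [10.0, 0.0], [0.0, 2.0], [10.0, 2.0]]
k = num_clusters(0.5, len(features))
centroids = kmeans(features, k)
```

With $\rho = 0.5$ the four features are compressed into two centroids, so only half as many vectors would need to be uploaded.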
For privacy preservation, Gaussian noise is added to each centroid:

$$\tilde{\mu}_n^{R,k}=\left\{\mu+\mathbf{z}\;\middle|\;\mu\in\mu_n^{R,k},\;\mathbf{z}\sim\mathcal{N}(0,\sigma^{2}\mathbf{I})\right\}.\tag{11}$$

Here, $\mathcal{N}(0,\sigma^{2}\mathbf{I})$ represents a multivariate Gaussian distribution\. This mechanism ensures that the release of $\tilde{\mu}_n^{R,k}$ does not reveal excessive information about any individual data point\. Client $C_n$ then uploads all perturbed class centroids $\tilde{\mu}_n^{R,k}$ to the server\. Using the local class centroids $\tilde{\mu}_n^{R,k}$, the server aggregates them to generate the global class centroids $\tilde{\mu}_{\mathcal{G}}^{R,k}$\. The global class centroid is denoted as:

$$\tilde{\mu}_{\mathcal{G}}^{R,k}=\left[\tilde{\mu}_1^{R,k};\tilde{\mu}_2^{R,k};\ldots;\tilde{\mu}_N^{R,k}\right].\tag{12}$$
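A small sketch of Eqs. (11) and (12) follows: each client perturbs its centroids with isotropic Gaussian noise, and the server concatenates what it receives. The function names `perturb` and `aggregate`, the seed, the toy centroids, and the value `sigma=0.1` are all illustrative assumptions, not settings reported in the paper.

```python
import random


def perturb(centroids, sigma, rng):
    """Add i.i.d. N(0, sigma^2) noise to every coordinate, following Eq. (11)."""
    return [[x + rng.gauss(0.0, sigma) for x in c] for c in centroids]


def aggregate(per_client):
    """Concatenate the clients' noisy centroids for one class, following Eq. (12)."""
    return [c for client in per_client for c in client]


rng = random.Random(0)
client_centroids = [
    [[0.0, 1.0], [10.0, 1.0]],  # hypothetical client 1, class k
    [[5.0, 5.0]],               # hypothetical client 2, class k
]
noisy = [perturb(c, sigma=0.1, rng=rng) for c in client_centroids]
global_centroids = aggregate(noisy)
```

Note that Eq. (12) is a concatenation rather than an average, so the server ends up with one training sample per uploaded centroid.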

#### 3\.2\.3 One\-shot Federated Unlearning

The objective of the FU phase is to train the filter $\mathcal{M}_{Fi}$ such that it effectively blocks the flow of unlearning knowledge while allowing the retained learning knowledge to pass\. The specific steps and formulas are as follows\.

A filter $\mathcal{M}_{Fi}$ is inserted into the global model $\mathcal{M}_G=\mathcal{M}_E\oplus\mathcal{M}_{cl}$, yielding a new model structure:

$$\mathcal{M}_G^{\prime}=\mathcal{M}_E\oplus\mathcal{M}_{Fi}\oplus\mathcal{M}_{cl}.\tag{13}$$

Subsequently, the parameters of the global model $\mathcal{M}_G$, including both $\mathcal{M}_E$ and $\mathcal{M}_{cl}$, are frozen:

$$\theta_{\mathcal{M}_G}=\theta_{\mathcal{M}_G}^{*}.\tag{14}$$

The filter $\mathcal{M}_{Fi}$ is trained using the global class centroids $\tilde{\mu}_{\mathcal{G}}^{R,k}$\. The filtered class centroid prediction is defined as:

$$\{\hat{y}_k\}=\{\mathcal{M}_{cl}(\mathcal{M}_{Fi}(\tilde{\mu}))\mid\tilde{\mu}\in\tilde{\mu}_{\mathcal{G}}^{R,k},\,k\in\mathcal{K}_R\}.\tag{15}$$
We propose a composite loss function for the filter consisting of cross\-entropy and reconstruction losses\. It preserves discriminative capability on non\-target knowledge while enforcing structural consistency in the feature space, thereby improving the stability of the unlearning process without compromising knowledge selectivity\. The cross\-entropy loss, measuring the difference between predicted class probabilities and true labels, is defined as:

$$\mathcal{L}_{\text{CE}}=-\sum_{k}y_k\log(\hat{y}_k),\tag{16}$$

where $y_k$ represents the true label while $\hat{y}_k$ denotes the corresponding predicted probability for class $k$\. The reconstruction loss, which measures how well the filter reconstructs its input, is defined as:

$$\mathcal{L}_{\text{RE}}=\frac{1}{d}\sum_{i=1}^{d}|\tilde{\mu}-\mathcal{M}_{Fi}(\tilde{\mu})|^{2},\tag{17}$$

where $d$ is the input and output dimensionality of the filter\. The total loss function used to train the filter $\mathcal{M}_{Fi}$ is a weighted sum of these two losses:

$$\mathcal{L}_{\text{total}}=\alpha\mathcal{L}_{\text{CE}}+(1-\alpha)\mathcal{L}_{\text{RE}},\tag{18}$$

where $\alpha$ is a weighting factor that balances the importance of the cross\-entropy loss and the reconstruction loss\. The filter is trained for $W_a$ rounds with a learning rate $l_a$\. The update process is as follows:

$$\theta_{\mathcal{M}_{Fi}}^{w}=\theta_{\mathcal{M}_{Fi}}^{w-1}-l_a\cdot\nabla_{\theta}\mathcal{L}_{\text{total}}\left(\theta_{\mathcal{M}_{Fi}}^{w-1};\tilde{\mu}_{\mathcal{G}}^{R,k}\right),\quad w=1,\dots,W_a.\tag{19}$$
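The composite loss in Eqs. (16) to (18) can be worked through for a single centroid. The sketch below assumes a one-hot label (so the cross-entropy reduces to the negative log of the true-class probability) and reads Eq. (17) as a per-coordinate mean squared difference; the function names and toy inputs are hypothetical. The default `alpha=0.5` matches the value reported in the experimental setup.

```python
import math


def cross_entropy(probs, label):
    """Eq. (16) for a one-hot label: -log of the true-class probability."""
    return -math.log(probs[label])


def reconstruction(x, filtered):
    """Eq. (17): mean squared difference between filter input and output."""
    return sum((a - b) ** 2 for a, b in zip(x, filtered)) / len(x)


def total_loss(probs, label, x, filtered, alpha=0.5):
    """Eq. (18): alpha * L_CE + (1 - alpha) * L_RE."""
    return alpha * cross_entropy(probs, label) + (1 - alpha) * reconstruction(x, filtered)


# Hypothetical centroid x, filter output, and classifier probabilities.
loss = total_loss(probs=[0.5, 0.5], label=0, x=[1.0, 2.0], filtered=[1.0, 0.0])
```

Here the cross-entropy term is $\ln 2$ and the reconstruction term is $(0 + 4)/2 = 2$, so the total is $0.5\ln 2 + 1$. In a real training loop, the gradient of this quantity with respect to the filter parameters would drive the update in Eq. (19).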
When a client requests the restoration of forgotten knowledge, the filter $\mathcal{M}_{Fi}$ can be removed, thereby reverting to the original global model $\mathcal{M}_G$:

$$\mathcal{M}_G=\mathcal{M}_E\oplus\mathcal{M}_{cl}.\tag{20}$$
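The plug-in structure of Eqs. (13) and (20) can be mimicked with a thin wrapper in which the filter is an optional stage between the frozen extractor and classifier. The class `FilteredModel` and the lambda stand-ins below are hypothetical placeholders for illustration only; real components would be neural network modules.

```python
class FilteredModel:
    """Frozen extractor and classifier with an optional filter in between."""

    def __init__(self, extractor, classifier):
        self.extractor = extractor
        self.classifier = classifier
        self.filter = None  # None corresponds to the original model M_G

    def plug(self, filt):
        """Insert the filter, giving M_E (+) M_Fi (+) M_cl as in Eq. (13)."""
        self.filter = filt

    def unplug(self):
        """Remove the filter, reverting to M_E (+) M_cl as in Eq. (20)."""
        self.filter = None

    def __call__(self, x):
        h = self.extractor(x)
        if self.filter is not None:
            h = self.filter(h)
        return self.classifier(h)


# Toy stand-ins (hypothetical): doubling extractor, summing classifier.
model = FilteredModel(extractor=lambda x: [2 * v for v in x],
                      classifier=lambda h: sum(h))
before = model([1.0, 2.0])
model.plug(lambda h: [0.0 for _ in h])  # a filter that blocks everything
unlearned = model([1.0, 2.0])
model.unplug()
restored = model([1.0, 2.0])
```

Because the extractor and classifier are never modified, `restored` equals `before` exactly, which is the sense in which restoration requires no retraining.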

#### 3\.2\.4 Differential Privacy Guarantee

Our framework injects Gaussian noise for privacy guarantee during client\-side centroid uploads, satisfying $(\varepsilon,\delta)$\-DP\. The key is to calibrate the noise scale $\sigma$ according to the desired privacy budget and the $\ell_2$\-sensitivity, which is bounded by:

$$\Delta_2 f_i=\max_{p,q}\frac{\|d_p^{(i)}-d_q^{(i)}\|_2}{n_i}.\tag{21}$$

Table 1: Average privacy budget for different datasets\.

We set $\sigma\geq\frac{\sqrt{2\ln(1.25/\delta_i)}\cdot\Delta_2 f_i}{\varepsilon_i}$ with $\delta_i=1/n_i$\. Empirical $\varepsilon_i$ values \(Table [1](https://arxiv.org/html/2606.24113#S3.T1)\) confirm moderate privacy budgets $\varepsilon\approx 10$ Wei et al\. \([2020](https://arxiv.org/html/2606.24113#bib.bib60)\) for all datasets under specified $\sigma$\. We calculate our differentially private guarantee as:

$$\varepsilon_i=\frac{\sqrt{2\ln(1.25\,n_i)}\cdot\Delta_2 f_i}{\sigma}.\tag{22}$$

The final choice of the optimal $\sigma$ is determined via $\sigma=q\cdot s$, with the corresponding values of $q$ and $s$ established accordingly\. Detailed analysis and supporting experiments are relegated to Section 2\.1 and 2\.2 of the supplementary material\.
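The relation between the noise bound and Eq. (22) can be checked numerically: choosing the smallest admissible $\sigma$ for a target budget and then plugging it into Eq. (22) should return the same budget. The sample count `n` and sensitivity `sens` below are made-up values for illustration, and the function names are hypothetical.

```python
import math


def sigma_for(epsilon, delta, sensitivity):
    """Smallest sigma allowed by the bound stated before Eq. (22)."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def epsilon_for(sigma, n, sensitivity):
    """Eq. (22), i.e. the same bound solved for epsilon with delta = 1/n."""
    return math.sqrt(2 * math.log(1.25 * n)) * sensitivity / sigma


n = 1000     # hypothetical number of samples n_i
sens = 0.05  # hypothetical l2-sensitivity
sigma = sigma_for(epsilon=10.0, delta=1 / n, sensitivity=sens)
eps = epsilon_for(sigma, n, sens)
```

Since $\delta_i = 1/n_i$ turns $1.25/\delta_i$ into $1.25\,n_i$, the two functions are inverses of each other, and `eps` recovers the target budget of 10 up to floating-point error.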

### 3\.3 Algorithm

Algorithm 1: FedUP

Input: Number of global rounds $W$, local rounds $e$, local learning rate $l_c$, adapter learning rate $l_a$, clients $\mathcal{C}=\{C_1,C_2,\ldots,C_N\}$, dataset $\mathcal{D}=\{\mathcal{D}_n\}_{n=1}^{N}$

Output: Filter $\mathcal{M}_{Fi}$

- Feature extraction
  - $\mathcal{M}_G=\mathcal{M}_E\oplus\mathcal{M}_{cl}$
  - for global round $w=1$ to $W$ do
    - Server sends $\mathcal{M}_G^{w}$ to all clients $C_n$
    - for each client $C_n$ do
      - $C_n$ performs $e$ local training rounds: $\theta_{\mathcal{M}_n}^{w,e}=\theta_{\mathcal{M}_n}^{w}-l_c\cdot\nabla_{\theta}\mathcal{L}(\mathcal{M}_n,\mathcal{D}_n^{k})$
    - $\theta_{\mathcal{M}_G}^{w+1}=\sum_{n=1}^{N}\frac{|\mathcal{D}_n|}{|\mathcal{D}|}\theta_{\mathcal{M}_n}^{w,e}$
- Generation of class centroid samples
  - for each client $C_n$ do
    - $V_n^{k}=\mathcal{M}_E(x_i^k),\;x_i^k\in\mathcal{D}_n$
    - $V_n^{R,k}=V_n^{k}\setminus\{\mathcal{M}_E(x_i^k)\mid x_i^k\in\mathcal{D}_n^{U}\}$
    - $\mu_n^{R,k}=\mathrm{KMeans}(V_n^{R,k},K_{n,k})$
    - Generate $\tilde{\mu}_n^{R,k}$ via Eq\. \([11](https://arxiv.org/html/2606.24113#S3.E11)\)
- One\-shot federated unlearning
  - $\mathcal{M}_G^{\prime}=\mathcal{M}_E\oplus\mathcal{M}_{Fi}\oplus\mathcal{M}_{cl}$
  - Freeze parameters of $\mathcal{M}_G$: $\theta_{\mathcal{M}_G}=\theta_{\mathcal{M}_G}^{*}$
  - Train filter $\mathcal{M}_{Fi}$ using $\tilde{\mu}_{\mathcal{G}}^{R,k}$ via Eq\. \([18](https://arxiv.org/html/2606.24113#S3.E18)\) and Eq\. \([19](https://arxiv.org/html/2606.24113#S3.E19)\)
- Restoration
  - Remove filter $\mathcal{M}_{Fi}$: $\mathcal{M}_G=\mathcal{M}_E\oplus\mathcal{M}_{cl}$

The algorithm, as illustrated in Algorithm [1](https://arxiv.org/html/2606.24113#algorithm1), consists of three stages: feature extraction, generation of class centroid samples $\mu_n^{R,k}$, and one\-shot federated unlearning\. Upon receiving an unlearning request, each client first removes the corresponding data points from its local feature set and computes class centroid samples using the retained data\. These centroids are then protected by a differential privacy mechanism and uploaded to the server\. The server aggregates the uploaded centroids from all clients to obtain the global retained class centroids $\tilde{\mu}_{\mathcal{G}}^{R,k}$\. Then the server freezes the parameters of the original global model $\mathcal{M}_G$ and randomly initializes a lightweight, pluggable filter $\mathcal{M}_{Fi}$, which is trained solely using the global differentially private centroids\. By jointly optimizing a cross\-entropy loss and a reconstruction loss through $\mathcal{L}_{\text{total}}$, the filter blocks the propagation of forgotten knowledge while preserving the discriminative capability of retained knowledge\. Finally, when restoration of forgotten knowledge is required, the filter can be removed to revert to the original global model structure, enabling efficient and reversible federated unlearning\.

## 4 Experiment

### 4\.1 Experimental Setup

We conducted experiments on diverse datasets, including MNIST LeCun et al\. \([2002](https://arxiv.org/html/2606.24113#bib.bib68)\), CIFAR\-10, CIFAR\-100 Krizhevsky et al\. \([2009](https://arxiv.org/html/2606.24113#bib.bib55)\), and AG News Zhang et al\. \([2015](https://arxiv.org/html/2606.24113#bib.bib69)\)\. We partitioned each dataset using the Dirichlet distribution Li et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib80)\) with a concentration parameter of 0\.5\. These experiments use various network architectures, including LeNet\-5, ResNet\-18 He et al\. \([2016](https://arxiv.org/html/2606.24113#bib.bib56)\), ResNet\-34, Transformer Vaswani et al\. \([2017](https://arxiv.org/html/2606.24113#bib.bib57)\) and TinyBert Jiao et al\. \([2020](https://arxiv.org/html/2606.24113#bib.bib70)\), covering three unlearning scenarios: client unlearning, class unlearning and sample unlearning\. We evaluated five baselines: EraseClient Halimi et al\. \([2022](https://arxiv.org/html/2606.24113#bib.bib58)\), FedEraser Liu et al\. \([2021](https://arxiv.org/html/2606.24113#bib.bib24)\), Exact\-Fun Xiong et al\. \([2023](https://arxiv.org/html/2606.24113#bib.bib59)\), Retrain and FUSED Zhong et al\. \([2025b](https://arxiv.org/html/2606.24113#bib.bib10)\)\. The experimental framework is implemented using PyTorch 2\.3\.1 and CUDA 12\.1\. For hardware acceleration, an NVIDIA RTX 3080 Ti GPU is utilized\. We employ SGD and Adam optimizers\. The balancing factor $\alpha$ for the loss of filters is set to 0\.5\. During the generation of sample centroids via clustering, we set different sampling ratios $\rho$ to accommodate various scenarios\. In the client, class, and sample scenarios, the sampling ratios are 0\.8, 0\.1, and 0\.5, respectively\.
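For readers unfamiliar with Dirichlet-based non-IID splits, the following is a minimal sketch of one common label-wise variant: for each class, client shares are drawn from a symmetric Dirichlet with concentration 0.5 and the class's samples are divided accordingly. The exact partitioning procedure used in the paper is not spelled out in the text, so this scheme, the function names, and the toy labels are assumptions.

```python
import random


def dirichlet(alpha, n, rng):
    """Sample from a symmetric Dirichlet via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients, one class at a time."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        shares = dirichlet(alpha, num_clients, rng)
        start, acc = 0, 0.0
        for c, share in enumerate(shares):
            acc += share
            # The last client takes whatever remains of this class.
            end = len(idx) if c == num_clients - 1 else round(acc * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients


# Hypothetical toy dataset: 60 samples over 3 classes, 4 clients.
labels = [i % 3 for i in range(60)]
parts = partition(labels, num_clients=4)
```

A smaller concentration parameter makes the per-class shares more uneven, so clients see more skewed label distributions.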

Table 2: Evaluation metrics of FU.

Table 3: Main results. Our method achieves the best or near-best R-A in most settings while maintaining low F-A, and it performs particularly well in the class unlearning scenario, indicating that it maximizes model utility while ensuring effective unlearning.

Evaluation Metrics. As shown in Table [2](https://arxiv.org/html/2606.24113#S4.T2), we evaluate our proposed method and the baselines using multiple metrics.

### 4.2 Experimental Results

Main Results. In the client unlearning setting, Byzantine attacks are introduced, where label flipping is applied to construct inverse feature prototypes Fu et al. ([2025a](https://arxiv.org/html/2606.24113#bib.bib85)); Qi et al. ([2023](https://arxiv.org/html/2606.24113#bib.bib86)). Class unlearning randomly remaps the feature labels of the target class to several other classes. Sample unlearning is implemented via backdoor attacks: fixed triggers are injected into the bottom-right pixels of images or appended as specific trigger tokens to the end of text sequences. During the unlearning process, all triggered samples are consistently predicted as class 0, which yields the accuracy of class 0 (0A) on the forgotten samples.
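The backdoor construction used for sample unlearning can be illustrated as follows. This is a hedged sketch rather than the authors' code: the patch size, the pixel value, the trigger token `<trg>`, and all function names are assumptions chosen for illustration. Images are represented as plain lists of rows to keep the example dependency-free.

```python
def add_image_trigger(image, size=3, value=1.0):
    """Return a copy of a 2-D image with a size x size patch in the bottom-right corner."""
    patched = [row[:] for row in image]
    height = len(patched)
    for r in range(max(0, height - size), height):
        width = len(patched[r])
        for c in range(max(0, width - size), width):
            patched[r][c] = value
    return patched

def add_text_trigger(tokens, trigger=("<trg>",)):
    """Return a copy of a token sequence with the trigger tokens appended at the end."""
    return list(tokens) + list(trigger)

def poison(samples, labels, poison_ids, make_trigger, target_class=0):
    """Apply the trigger to the selected samples and relabel them to target_class."""
    new_samples, new_labels = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        if i in poison_ids:
            new_samples.append(make_trigger(x))
            new_labels.append(target_class)
        else:
            new_samples.append(x)
            new_labels.append(y)
    return new_samples, new_labels

# Usage: poison the second of two tiny 4x4 images.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
poisoned_images, poisoned_labels = poison(images, [7, 5], {1}, add_image_trigger)
```

After successful unlearning, triggered inputs should no longer be pulled toward class 0, which is what the 0A metric on the forgotten samples captures.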

As shown in Table [3](https://arxiv.org/html/2606.24113#S4.T3), our method achieves superior performance across a wide range of datasets, model architectures, and unlearning scenarios. Among the datasets, CIFAR-100 is the most challenging benchmark, while MNIST is the least. Benefiting from the pluggable filters, FedUP reduces the accuracy on the forgotten knowledge (F-A) to a minimal level while maintaining high accuracy on the retained data (R-A). Owing to its centroid-based unlearning mechanism, FedUP exhibits particularly strong performance in class unlearning scenarios. Overall, the proposed method supports single-round communication and reversible unlearning, achieving the desirable triad of high retention accuracy, low forgetting accuracy, and privacy guarantees across all evaluated datasets.

Analysis of Non\-target Knowledge Loss\. To validate the effectiveness of our method in mitigating non\-target knowledge loss, we compare the accuracy on retained knowledge before and after executing the unlearning operation\. As shown in Figure[4](https://arxiv.org/html/2606.24113#S4.F4), FedUP achieves an effect comparable to the Retrain method, characterized by minimal non\-target knowledge loss and stable model performance before and after the unlearning operation\.

![Refer to caption](https://arxiv.org/html/2606.24113v1/x3.png)

Figure 4: Non-target knowledge loss of methods.

Unlearning Response Time. As shown in Table [4](https://arxiv.org/html/2606.24113#S4.T4), when achieving the desired forgetting effect, our method requires only 6.33 seconds for image unlearning and 6.45 seconds for text unlearning, which is less than 10% of the latency of the second-best method and less than 1% of that of the Retrain approach. FedUP can therefore respond promptly to diverse unlearning requests across different modalities while maintaining consistently low latency.

Communication Cost. Communication cost is defined as the wireless resources consumed for transmitting model parameters or gradients between local clients and the central server. As shown in Table [4](https://arxiv.org/html/2606.24113#S4.T4), although the centroid uploading phase introduces additional communication overhead, FedUP requires only a single communication round, meaning its communication cost does not accumulate with rounds. In contrast, methods that transmit model parameters converge only gradually as rounds increase, causing their communication overhead to escalate to the order of $10^{3}$. Specifically, the lightweight filter incurs an overhead of merely 11.50 MB for image unlearning, approximately one-third of that of the second-best method, and 6.73 MB for text unlearning, which is second only to the FUSED method.

Table 4: Comparing time and communication costs.

![Refer to caption](https://arxiv.org/html/2606.24113v1/x4.png)

Figure 5: Ablation of loss functions.

### 4.3 Analysis of Hyper-parameters

Bottleneck Dimension. In this section, we analyse the dimension of the filter. The pluggable filter adopts a straightforward encoder-decoder architecture whose input dimension matches the output dimension of the feature extractor. Our empirical investigation focuses on determining the optimal bottleneck dimension within this encoder-decoder structure. As shown in Table [5](https://arxiv.org/html/2606.24113#S4.T5), the accuracy on retained data (R-A) consistently reaches its optimum across both image and text datasets when the bottleneck dimension is set to 32.
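To give a feel for why such a filter is lightweight, the sketch below counts the parameters of a hypothetical encoder-decoder with one linear layer on each side of the bottleneck. The single-layer design, the 4-byte (float32) parameter size, and the 512-dimensional input (the pooled feature size of a standard ResNet-18) are assumptions for illustration; the resulting figures are not the values reported in Table 5.

```python
def filter_param_count(feature_dim, bottleneck_dim):
    """Parameters of Linear(feature_dim -> bottleneck) + Linear(bottleneck -> feature_dim)."""
    encoder = feature_dim * bottleneck_dim + bottleneck_dim  # weights + biases
    decoder = bottleneck_dim * feature_dim + feature_dim     # weights + biases
    return encoder + decoder

def filter_memory_kb(feature_dim, bottleneck_dim, bytes_per_param=4):
    """Approximate storage of the filter in KB, assuming float32 parameters."""
    return filter_param_count(feature_dim, bottleneck_dim) * bytes_per_param / 1024

# Usage: compare a few candidate bottleneck widths for a 512-d feature extractor.
for bottleneck in (16, 32, 64):
    print(bottleneck,
          filter_param_count(512, bottleneck),
          filter_memory_kb(512, bottleneck))
```

Under these assumptions the parameter count grows linearly with the bottleneck width, so the choice of 32 trades a small memory footprint against representational capacity.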

Table 5: Exploration of filter. “M” stands for memory (KB).

Sensitivity Analysis of the Loss Function. The filter is trained with joint reconstruction and cross-entropy losses. We performed a sensitivity analysis on $\alpha$ in $\mathcal{L}_{\text{total}}$ to determine its optimal value. As shown in Figure [5](https://arxiv.org/html/2606.24113#S4.F5), using either loss alone hinders convergence and degrades the accuracy of retained knowledge, whereas their combination maximizes the reduction of non-target knowledge loss. This is further illustrated in Table [6](https://arxiv.org/html/2606.24113#S4.T6), which shows that increasing the weight of the cross-entropy loss lowers the recognition accuracy on retained data, while assigning a higher weight to the reconstruction loss deteriorates the unlearning effectiveness.
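The trade-off governed by $\alpha$ can be made concrete with a small numerical sketch. It assumes the combined objective takes the convex form $\mathcal{L}_{\text{total}} = \alpha\,\mathcal{L}_{\text{CE}} + (1-\alpha)\,\mathcal{L}_{\text{rec}}$ with a mean-squared-error reconstruction term; this exact form, and the function names, are assumptions for illustration and may differ from the definition given earlier in the paper.

```python
import math

def cross_entropy(logits, label):
    """Numerically stable cross-entropy of one sample from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def reconstruction_mse(original, filtered):
    """Mean squared error between the filter input and the filter output."""
    return sum((a - b) ** 2 for a, b in zip(original, filtered)) / len(original)

def total_loss(logits, label, original, filtered, alpha=0.5):
    """Assumed form: alpha * CE + (1 - alpha) * reconstruction."""
    ce = cross_entropy(logits, label)
    rec = reconstruction_mse(original, filtered)
    return alpha * ce + (1.0 - alpha) * rec

# Usage: sweep alpha for one centroid sample to see how the two terms are weighted.
logits, label = [2.0, 0.5, -1.0], 0
centroid, filtered = [1.0, 2.0, 3.0], [0.9, 2.2, 2.7]
sweep = {a: total_loss(logits, label, centroid, filtered, alpha=a)
         for a in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

With `alpha=0.0` only the reconstruction term is active and with `alpha=1.0` only the cross-entropy term, which corresponds to the two single-loss ablations discussed above.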

Table 6: Impact of the cross-entropy weight $\alpha$ on R-A and F-A. The filter shows optimal and well-balanced performance at $\alpha=0.5$.

### 4.4 Ablation Study

Analysis of Differential Privacy Effects\. To ensure privacy preservation, Gaussian noise is injected into the generated class centroids\. We conducted ablation studies to evaluate its impact, as shown in Table[7](https://arxiv.org/html/2606.24113#S4.T7)\. The results demonstrate that an appropriately calibrated noise scale does not significantly impede the model’s performance on non\-target knowledge\. Instead, it enhances model robustness, achieving a synergistic improvement in both privacy protection and model utility\.

Table 7: Differential privacy effects.

Table 8: Ablation of class centroid samples.

Effectiveness of Class Centroid Samples. Class centroids are obtained by clustering original features and are privately aggregated on the server side. To validate the effectiveness of class centroid samples, we additionally conduct experiments in which the filter is trained using only the original features. As illustrated in Table [8](https://arxiv.org/html/2606.24113#S4.T8), the model trained on class centroid samples achieves accuracy on retained and forgotten knowledge comparable to that of the model trained on the original features, demonstrating that class centroid samples are as effective as the original features.

## 5 Conclusion and Discussion

Conclusion. In the field of FU, server-side methods face non-target knowledge loss, whereas client-side methods incur high request latency. Both are limited by the irreversibility of the unlearning operation. To address these issues, this paper proposes a one-shot federated unlearning framework. By employing differentially private class centroid samples on the server, our approach achieves approximate unlearning that surpasses exact unlearning in effect, reducing both non-target knowledge loss and resource overhead. The desired unlearning effect is achieved by fine-tuning a lightweight plug-in filter in a single round, which significantly reduces the latency of unlearning responses. Removing the filter allows the model to revert to its pre-unlearning state, thus realizing the reversibility of unlearning. Extensive experiments across diverse datasets, scenarios, and models confirm the strong performance of FedUP.

Discussion. While FedUP advances unlearning efficiency, reversibility, and responsiveness, two fundamental limitations persist: class centroid fidelity depends on federated feature extraction, and low-quality global models propagate bias into the aggregated class centroids. We expect that these limitations can be addressed by using more powerful pre-trained backbones for local feature extraction to ensure reliable class centroids.

#### 5.0.1 Contribution Statement

Feihong Nan and Zhengyi Zhong contributed equally\.

## References

- L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot (2021) Machine unlearning. In 2021 IEEE symposium on security and privacy (SP), pp. 141–159.
- Y. Cao, A. F. Yu, A. Aday, E. Stahl, J. Merwine, and J. Yang (2018) Efficient repair of polluted machine learning systems via causal unlearning. In ASIACCS ’18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security.
- M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang (2022) Graph unlearning. In Proceedings of the 2022 ACM SIGSAC conference on computer and communications security, pp. 499–513.
- L. de la Torre (2018) A guide to the california consumer privacy act of 2018. SSRN Electronic Journal. External Links: [Link](http://dx.doi.org/10.2139/ssrn.3275571), [Document](https://dx.doi.org/10.2139/ssrn.3275571)
- Z. Deng, L. Luo, and H. Chen (2024) Enable the right to be forgotten with federated client unlearning in medical imaging. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 240–250.
- L. Fu, S. Huang, Y. Lai, C. Zhang, H. Dai, Z. Zheng, and C. Chen (2025a) Federated domain-independent prototype learning with alignments of representation and parameter spaces for feature shift. IEEE Transactions on Mobile Computing.
- L. Fu, S. Huang, Y. Li, C. Chen, C. Zhang, and Z. Zheng (2025b) Learn the global prompt in the low-rank tensor space for heterogeneous federated learning. Neural Networks 187, pp. 107319.
- A. Golatkar, A. Achille, and S. Soatto (2020) Eternal sunshine of the spotless net: selective forgetting in deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9304–9312.
- L. Graves, V. Nagisetty, and V. Ganesh (2021) Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 11516–11524.
- C. Guo, T. Goldstein, A. Hannun, and L. Van Der Maaten (2019) Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030.
- V. Gupta, C. Jung, S. Neel, A. Roth, S. Sharifi-Malvajerdi, and C. Waites (2021) Adaptive machine unlearning. Advances in Neural Information Processing Systems 34, pp. 16319–16330.
- A. Halimi, S. Kadhe, A. Rawat, and N. Baracaldo (2022) Federated unlearning: how to efficiently erase a client in fl?.
- K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
- T. T. Huynh, T. B. Nguyen, T. T. Nguyen, P. L. Nguyen, H. Yin, Q. V. H. Nguyen, and T. T. Nguyen (2025) Certified unlearning for federated recommendation. ACM Transactions on Information Systems 43(2), pp. 1–29.
- W. Jiang, K. Liang, W. Huang, X. Zhang, Z. Xu, G. Wan, C. Tan, F. X. Fan, and J. Wang (2026) Unveiling and mitigating untargeted poisoning attacks on federated knowledge graph embedding. In Proceedings of the ACM Web Conference 2026, pp. 2569–2580.
- X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, and Q. Liu (2020) Tinybert: distilling bert for natural language understanding. In Findings of the association for computational linguistics: EMNLP 2020, pp. 4163–4174.
- A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images.
- K. Kuo, A. Setlur, K. Srinivas, A. Raghunathan, and V. Smith (2025) Exact unlearning of finetuning data via model merging at scale. arXiv preprint arXiv:2504.04626.
- M. Kurmanji, P. Triantafillou, J. Hayes, and E. Triantafillou (2023) Towards unbounded machine unlearning. Advances in neural information processing systems 36, pp. 1957–1987.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (2002) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), pp. 2278–2324.
- N. Li, C. Zhou, Y. Gao, H. Chen, Z. Zhang, B. Kuang, and A. Fu (2025) Machine unlearning: taxonomy, metrics, applications, challenges, and prospects. IEEE Transactions on Neural Networks and Learning Systems.
- Q. Li, Y. Diao, Q. Chen, and B. He (2022) Federated learning on non-iid data silos: an experimental study. In 2022 IEEE 38th international conference on data engineering (ICDE), pp. 965–978.
- G. Liu, X. Ma, Y. Yang, C. Wang, and J. Liu (2021) Federaser: enabling efficient client-level data removal from federated learning models. In 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), pp. 1–10.
- Y. Liu, L. Xu, X. Yuan, C. Wang, and B. Li (2022) The right to be forgotten in federated learning: an efficient realization with rapid retraining. In IEEE INFOCOM 2022-IEEE conference on computer communications, pp. 1749–1758.
- Z. Liu, Y. Jiang, J. Shen, M. Peng, K. Lam, X. Yuan, and X. Liu (2024) A survey on federated unlearning: challenges, methods, and future directions. ACM Computing Surveys 57(1), pp. 1–38.
- Z. Ma, Y. Liu, X. Liu, J. Liu, J. Ma, and K. Ren (2022) Learn to forget: machine unlearning via neuron masking. IEEE Transactions on Dependable and Secure Computing 20(4), pp. 3194–3207.
- H. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Arcas (2016) Communication-efficient learning of deep networks from decentralized data. arXiv: Learning.
- A. Mora, L. Dominici, and P. Bellavista (2024) Fedunran: on-device federated unlearning via random labels. In 2024 IEEE International Conference on Big Data (BigData), pp. 7955–7960.
- T. T. Nguyen, T. T. Huynh, Z. Ren, P. L. Nguyen, A. W. Liew, H. Yin, and Q. V. H. Nguyen (2025) A survey of machine unlearning. ACM Transactions on Intelligent Systems and Technology 16(5), pp. 1–46.
- Z. Pan, Z. Wang, C. Li, K. Zheng, B. Wang, X. Tang, and J. Zhao (2025) Federated unlearning with gradient descent and conflict mitigation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 19804–19812.
- Z. Qi, L. Meng, Z. Chen, H. Hu, H. Lin, and X. Meng (2023) Cross-silo prototypical calibration for federated learning with non-iid data. In Proceedings of the 31st ACM international conference on multimedia, pp. 3099–3107.
- Z. Qi, L. Meng, W. He, R. Zhang, Y. Wang, X. Qi, and X. Meng (2024) Cross-training with multi-view knowledge fusion for heterogenous federated learning. arXiv e-prints, pp. arXiv–2405.
- N. Truong, K. Sun, S. Wang, F. Guitton, and Y. Guo (2021) Privacy preservation in federated learning: an insightful survey from the gdpr perspective. Computers & Security 110, pp. 102402.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. Advances in neural information processing systems 30.
- C. Wang, M. Huai, and D. Wang (2023a) Inductive graph unlearning. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 3205–3222.
- J. Wang, S. Guo, X. Xie, and H. Qi (2022) Federated unlearning via class-discriminative pruning. In Proceedings of the ACM web conference 2022, pp. 622–632.
- W. Wang, Z. Tian, C. Zhang, A. Liu, and S. Yu (2023b) Bfu: bayesian federated unlearning with parameter self-sharing. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, pp. 567–578.
- K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek, and H. V. Poor (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE transactions on information forensics and security 15, pp. 3454–3469.
- C. Wu, S. Zhu, and P. Mitra (2022) Federated unlearning with knowledge distillation. arXiv preprint arXiv:2201.09441.
- Z. Xiong, W. Li, Y. Li, and Z. Cai (2023) Exact-fun: an exact and efficient federated unlearning approach. In 2023 IEEE International Conference on Data Mining (ICDM), pp. 1439–1444.
- Z. Yang, J. Han, C. Wang, and H. Liu (2025) Erase then rectify: a training-free parameter editing approach for cost-effective graph unlearning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 13044–13051.
- X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. Advances in neural information processing systems 28.
- Z. Zhong, W. Bao, J. Wang, J. Chen, L. Lyu, and W. Y. B. Lim (2025a) Sacfl: self-adaptive federated continual learning for resource-constrained end devices. IEEE Transactions on Neural Networks and Learning Systems.
- Z. Zhong, W. Bao, J. Wang, S. Zhang, J. Zhou, L. Lyu, and W. Y. B. Lim (2025b) Unlearning through knowledge overwriting: reversible federated unlearning via selective sparse adapter. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 30661–30670.
- Z. Zhong, W. Bao, J. Wang, X. Zhu, and X. Zhang (2022) Flee: a hierarchical federated learning framework for distributed deep neural network over cloud, edge, and end device. ACM Transactions on Intelligent Systems and Technology (TIST) 13(5), pp. 1–24.
- X. Zhu, G. Li, and W. Hu (2023) Heterogeneous federated knowledge graph embedding learning and unlearning. In Proceedings of the ACM web conference 2023, pp. 2444–2454.
