Power to the Clients: Federated Learning in a Dictatorship Setting

arXiv cs.CL Papers

Summary

This paper introduces 'dictator clients'—a novel class of malicious participants in federated learning capable of erasing other clients' contributions while preserving their own—and provides theoretical analysis of their impact on model convergence, including scenarios with multiple adversarial clients.

arXiv:2510.22149v3 Announce Type: replace-cross Abstract: Federated learning (FL) has emerged as a promising paradigm for decentralized model training, enabling multiple clients to collaboratively learn a shared model without exchanging their local data. However, the decentralized nature of FL also introduces vulnerabilities, as malicious clients can compromise or manipulate the training process. In this work, we introduce dictator clients, a novel, well-defined, and analytically tractable class of malicious participants capable of entirely erasing the contributions of all other clients from the server model, while preserving their own. We propose concrete attack strategies that empower such clients and systematically analyze their effects on the learning process. Furthermore, we explore complex scenarios involving multiple dictator clients, including cases where they collaborate, act independently, or form an alliance in order to ultimately betray one another. For each of these settings, we provide a theoretical analysis of their impact on the global model's convergence. Our theoretical algorithms and findings about the complex scenarios including multiple dictator clients are further supported by empirical evaluations on both computer vision and natural language processing benchmarks.

# Power to the Clients: Federated Learning in a Dictatorship Setting

Source: https://arxiv.org/html/2510.22149

Mohammadsajad Alipour, Mohammad Mohammadi Amiri

###### Abstract

Federated learning (FL) has emerged as a promising paradigm for decentralized model training, enabling multiple clients to collaboratively learn a shared model without exchanging their local data. However, the decentralized nature of FL also introduces vulnerabilities, as malicious clients can compromise or manipulate the training process. In this work, we introduce dictator clients, a novel, well-defined, and analytically tractable class of malicious participants capable of entirely erasing the contributions of all other clients from the server model, while preserving their own. We propose concrete attack strategies that empower such clients and systematically analyze their effects on the learning process. Furthermore, we explore complex scenarios involving multiple dictator clients, including cases where they collaborate, act independently, or form an alliance in order to ultimately betray one another. For each of these settings, we provide a theoretical analysis of their impact on the global model’s convergence. Our theoretical algorithms and findings about the complex scenarios including multiple dictator clients are further supported by empirical evaluations on both computer vision and natural language processing benchmarks.

## I Introduction

Federated learning (FL) [16] is a distributed learning paradigm in which model training is performed collaboratively by a set of clients. In centralized FL, a global server broadcasts the current model to all clients, each of which updates the model using its local dataset and sends back the resulting gradients to the server. The server then aggregates these gradients to update the global model. This approach accelerates training by distributing computation across multiple machines, while also enhancing data privacy since clients share only gradients, not their raw data. FL is especially well-suited for privacy-sensitive applications, such as training on confidential medical records across hospitals.

Despite its advantages, FL remains vulnerable to malicious behavior by the participating clients. Byzantine clients are adversarial participants that disrupt the training process by sending arbitrary or manipulated updates to the central server [8, 3]. The presence of such adversaries can significantly degrade model performance, making Byzantine robustness a critical area of study [11, 24, 22, 5, 27, 28]. Moreover, several studies have demonstrated the possibility of backdoor attacks in FL via collusion attacks, where multiple malicious clients coordinate their actions to inject hidden triggers into the global model [13, 19, 25, 1]. These clients may exchange information and strategically craft updates that steer the aggregated model toward a compromised state.
However, the majority of existing literature primarily focuses on defending against Byzantine clients, while comparatively little attention has been given to characterizing well-defined classes of malicious clients with goals beyond simple disruption, or to exploring the diverse scenarios that arise from their presence in the system. In FL, a malicious client may aim to impose the statistical properties or specific patterns of its own dataset onto the global model. Such a client effectively attempts to dictate the final model by aligning it more closely with its local data distribution. This behavior may serve various objectives, such as improving performance on a target task, biasing the global model’s decisions toward a desired objective, embedding backdoors, or degrading the model’s generalization on other clients’ data. By exploiting vulnerabilities in the model aggregation process, especially when contributions are blindly averaged or insufficiently audited, a malicious client can steer the training dynamics to serve its own objectives, ultimately dominating the global model’s behavior.

In this work, we introduce a novel and formally defined class of Byzantine clients in FL, characterized by precise assumptions about their knowledge of the system and their limitations. In contrast to prior studies, which often assume omniscient or overly powerful adversaries, we consider malicious clients with only minimal communication capabilities among themselves. These clients lack visibility into the internal structure of the global model and have no information about the data or updates of benign clients. By clearly bounding their capabilities, our framework offers a more realistic and fine-grained understanding of adversarial behavior in practical FL environments. The goal of these malicious clients is to preserve their own influence on the final global model while entirely eliminating the contributions of all other participants—as if the benign clients had never been involved in the training process. We refer to such independent malicious clients as dictator clients due to their unilateral domination of the model. When multiple such clients coordinate via their limited communication link to jointly dominate training, we refer to them as collaborative dictator clients. We show that these clients do not require any privileged access to the server or any external metadata—making their attack strategies particularly concerning from a security perspective.

To demonstrate the feasibility of this threat, we develop a series of algorithms that enable malicious clients to achieve their goals within the defined constraints. Our theoretical findings are further supported by empirical results, which validate the effectiveness of the proposed attack strategies. Beyond isolated attacks, we also investigate complex and previously underexamined dynamics that arise among malicious clients themselves. For example, we examine scenarios in which all participants in the system act as dictators, as well as cases where collaborative dictator clients betray one another within their own partnership. These scenarios reveal internal conflicts among adversaries and broaden the understanding of multi-agent adversarial behavior in FL. The practical implications of dictator clients are discussed in more detail in Appendix E.
## II Related Work

The distributed nature of FL, combined with the server’s limited visibility into local training processes, makes it vulnerable to various security threats posed by malicious or compromised clients [31]. In this section, we review the existing literature across three major categories of attacks: Byzantine attacks, backdoor attacks, and collusion attacks.

### Byzantine Attacks

Byzantine attacks pose a fundamental threat in distributed systems, including FL, where a subset of clients, known as Byzantine clients, arbitrarily deviate from the prescribed protocol by submitting malicious or anomalous updates to the central server [8]. The goals of such attacks typically include degrading the global model’s performance or preventing convergence [3]. Attack strategies vary in complexity, ranging from simple approaches such as random noise injection or submitting zero gradients to more sophisticated methods like sign-flipping [20, 23]. Advanced attacks are often crafted to evade specific defenses, making them challenging to detect and mitigate [22, 2].

### Backdoor Attacks

Backdoor attacks (also known as Trojan attacks) are a more insidious threat in FL, where attackers aim to embed hidden malicious behavior into the global model [4, 12]. An attacker, typically controlling one or more clients, manipulates their local dataset or model updates to create a "backdoor trigger"—a specific pattern or feature (e.g., a small patch in an image or a specific phrase in text). The compromised global model performs normally on clean inputs but exhibits attacker-chosen behavior, such as misclassification, when the trigger is present. These attacks can be implemented through various strategies, including data poisoning, where labels are manipulated for samples containing the trigger, and model poisoning, where malicious updates are directly crafted to influence model behavior [1, 26]. Triggers may be static and predefined [1] or dynamically generated using optimization techniques to make them more subtle and difficult to detect [29]. Comprehensive surveys on backdoor attacks and defenses in FL can be found in [17].

### Collusion Attacks

Collusion attacks occur when multiple malicious clients coordinate their actions to enhance the effectiveness of the attacks or to bypass defenses designed for independent attackers. Colluding attackers can amplify the impact of Byzantine or backdoor attacks. For example, multiple Byzantine clients might coordinate their updates to overwhelm Byzantine-resilient aggregation rules that assume the number of attackers is limited [26]. Similarly, colluding clients can implement distributed backdoor attacks, where each attacker contributes a part of the malicious update, making individual contributions appear benign while collectively embedding a backdoor into the global model [15].
More advanced and specific collusion strategies include alternating attacks and stealthy collusion. In alternating (on-off) attacks, malicious clients alternate between benign and malicious behavior to build reputation or evade history-based detection [10]. In stealthy collusion attacks, attackers coordinate to make their cumulative malicious impact significant while keeping individual updates close to benign ones to evade detection [14]. Such attacks aim for sparsity and stealthiness.

While prior research has primarily focused on degrading model utility or embedding backdoors, our work introduces and formalizes a new adversarial paradigm: dictator clients—malicious participants whose goal is not to harm performance but to fully preserve their own contribution to the global model while completely erasing the influence of other clients. Unlike traditional Byzantine or backdoor attacks, dictator clients aim to bias the learning outcome toward their local objectives without necessarily compromising overall model accuracy. Moreover, we investigate nuanced interaction dynamics among multiple dictator clients, including collaboration, conflict, and strategic deception. To the best of our knowledge, this is the first systematic exploration of such influence-preserving and interaction-aware attacks, revealing a novel and underexplored threat model in FL.

## III Problem Formulation and Preliminaries

We consider a centralized FL setting in which, during each communication round, a central server broadcasts the current model weights to all clients. Each client then performs stochastic gradient descent on its local loss function to compute an update. These local updates are sent back to the server, which aggregates them—most commonly through simple averaging—and applies a global gradient descent step scaled by a predefined learning rate. To enable a more precise formulation and analysis of the attacks, we assume that the server aggregates updates from all clients in every round—an assumption that commonly holds in cross-silo FL settings [6]. We defer to future work the exploration of FL variants that either allow partial client participation or permit clients to perform several local updates before aggregation.

Let θ_t denote the global model weights maintained by the server at iteration t, and let N = {1, 2, ..., N} represent the set of N participating clients. For each client n ∈ N, let ∇L_n(θ_t) denote the gradient of its local loss function with respect to the current model θ_t. After collecting the gradients from all clients, the server updates the global model at each round as

θ_{t+1} = θ_t - η ∑_{n=1}^N ∇L_n(θ_t), (1)

where η > 0 denotes the server-side learning rate. The global model is initialized as θ_0 at the server and distributed to all clients at the beginning of training.

We further define a hypothetical baseline scenario in which only a single client m ∈ N participates in the learning process. Let θ̂_t^m denote the model weights at iteration t in this single-client scenario. The corresponding update rule simplifies to θ̂_{t+1}^m = θ̂_t^m - η ∇L_m(θ̂_t^m), with initialization θ̂_0^m = θ_0. We further generalize this formulation to a subset of clients. Let P ⊂ N denote a subset of P clients, where 1 < P < N. Then the model trained exclusively on the clients in P evolves according to θ_{t+1}^P = θ_t^P - η ∑_{p∈P} ∇L_p(θ_t^P), with initial value θ_0^P = θ_0.
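To make the round structure concrete, here is a minimal sketch of the server-side loop implied by Eq. (1). The `local_gradient` interface and all other names are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

# One synchronous FedSGD round as in Eq. (1): the server broadcasts the current
# weights theta, every client returns the gradient of its local loss at theta,
# and the server takes a single step with learning rate eta.

def server_round(theta, clients, eta):
    grads = [client.local_gradient(theta) for client in clients]
    return theta - eta * np.sum(grads, axis=0)

def train(theta_0, clients, eta, num_rounds):
    theta = theta_0
    for _ in range(num_rounds):
        theta = server_round(theta, clients, eta)  # full participation each round
    return theta
```

The attack analysis in the following sections assumes the server averages the received updates; that variant simply replaces np.sum with np.mean in server_round.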
## IV Dictator Clients: Definition and Attack Strategy

In this section, we formally define the concept of a dictator client and present the precise attack strategy through which such a client achieves its objective. We begin by establishing the conditions under which a malicious participant can be considered a dictator in a federated learning system.

A dictator client is a participant whose objective is to ensure that the final global model θ_T, after T communication rounds, is as close as possible to the model that would have resulted from training exclusively on its own local data, i.e., θ̂_T^m, while simultaneously erasing the contributions of all other clients. More formally, a client m is a dictator if it aims to achieve θ_T ≈ θ̂_T^m, regardless of the data and updates of the other clients. To accomplish this, the dictator client must manipulate its local updates so that the server’s aggregation process effectively nullifies the contributions of benign clients and replaces them with its own. We assume the dictator client has no knowledge of the benign clients’ data or updates and can only communicate with other dictator clients (if any) over a limited, pre-established channel. No access to the server’s internal state or aggregation rule is assumed beyond the standard broadcast of global model weights.

We propose a specific attack algorithm (Algorithm 1) that enables a dictator client to achieve this goal. The core idea is to send a scaled-up version of its own local gradient, designed to dominate the average. Specifically, in each round t, after receiving the global model θ_t from the server, the dictator client m computes its local gradient ∇L_m(θ_t) as usual. It then crafts the malicious update g_t^m = (N - 1) ∇L_m(θ_t) and sends it to the server. The benign clients n ≠ m send their true local gradients ∇L_n(θ_t), and the server, following standard aggregation, averages all received updates. The aggregated gradient becomes

(1/N) [ g_t^m + ∑_{n≠m} ∇L_n(θ_t) ] = (1/N) [ (N-1) ∇L_m(θ_t) + ∑_{n≠m} ∇L_n(θ_t) ] = ∇L_m(θ_t) + (1/N) ∑_{n≠m} ( ∇L_n(θ_t) - ∇L_m(θ_t) ).

The second term is a perturbation. If the benign gradients are similar to the dictator’s gradient, the perturbation is small; in the ideal case where all benign clients have the same data distribution as the dictator, the term vanishes. In general, the dictator’s scaled update ensures that its gradient dominates the average, shifting the global model toward its local optimum.
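As a minimal sketch of the scaling strategy just described, using the same hypothetical `local_gradient` interface as the earlier sketch (the class and function names are illustrative, not the paper's code):

```python
import numpy as np

# The dictator client m reports (N - 1) times its true local gradient so that
# its own direction dominates the server-side average over the N updates.

class DictatorClient:
    def __init__(self, num_clients, local_gradient):
        self.num_clients = num_clients        # N, assumed known to the attacker
        self.local_gradient = local_gradient  # callable: theta -> grad L_m(theta)

    def malicious_update(self, theta):
        # g_t^m = (N - 1) * grad L_m(theta_t)
        return (self.num_clients - 1) * self.local_gradient(theta)

def averaged_update(dictator, benign_clients, theta):
    # Server-side view: the dictator's scaled update is averaged together with
    # the benign clients' true gradients (cf. the decomposition above).
    updates = [dictator.malicious_update(theta)]
    updates += [c.local_gradient(theta) for c in benign_clients]
    return np.mean(updates, axis=0)
```

In each round the server would then apply θ_{t+1} = θ_t - η · averaged_update(·).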
## V Analysis of a Single Dictator Client

We now analyze the effect of a single dictator client on the global model’s convergence. We consider the update rule under the dictator’s attack and compare it to the baseline in which only the dictator participates. Let the global model at round t be θ_t. The server receives updates from all N clients and computes the aggregated gradient as above, so the global model update is

θ_{t+1} = θ_t - η [ ∇L_m(θ_t) + (1/N) ∑_{n≠m} ( ∇L_n(θ_t) - ∇L_m(θ_t) ) ]. (2)

Denoting the error term by ε_t = (1/N) ∑_{n≠m} ( ∇L_n(θ_t) - ∇L_m(θ_t) ), the update becomes θ_{t+1} = θ_t - η ∇L_m(θ_t) - η ε_t, which is a perturbed version of the single-client update θ̂_{t+1}^m = θ̂_t^m - η ∇L_m(θ̂_t^m). Under appropriate assumptions on the Lipschitz continuity of the gradients and bounded variance, we can bound the distance between θ_t and θ̂_t^m; for instance, if ∇L_m is L-Lipschitz, the two trajectories satisfy ‖θ_{t+1} - θ̂_{t+1}^m‖ ≤ (1 + ηL) ‖θ_t - θ̂_t^m‖ + η ‖ε_t‖, so the deviation is controlled by the accumulated perturbation. In particular, if the benign clients’ gradients are close to the dictator’s gradient, the perturbation is small and the dictator effectively steers the model. We provide a formal theorem (Theorem 1) stating that, under certain conditions, the global model converges to a neighborhood of the dictator’s optimal model. The proof relies on standard convex optimization arguments and is deferred to the appendix.

## VI Multiple Dictator Clients: Collaborative and Competitive Scenarios

We now extend our analysis to settings involving multiple dictator clients. We consider three distinct scenarios: (1) collaborative dictator clients that work together, (2) independent dictator clients that act without coordination, and (3) alliances in which dictators form a coalition but may later betray each other.

### Collaborative Dictator Clients

When multiple dictator clients collaborate, they share the common goal of jointly dominating the model and coordinate their updates to maximize their collective influence. Suppose a set C of C dictator clients collaborate. Each collaborator i ∈ C sends the scaled update g_t^i = ( (N - C) / C ) ∇L_i(θ_t), where the scaling ensures that the total contribution from the collaborators matches the combined weight of the benign clients’ contributions. The benign clients send their true gradients, so the server’s average gradient becomes

(1/N) [ ∑_{i∈C} g_t^i + ∑_{n∉C} ∇L_n(θ_t) ] = (1/N) [ (N - C) ∇L_{avg}(θ_t) + ∑_{n∉C} ∇L_n(θ_t) ],

where ∇L_{avg}(θ_t) is the average gradient of the collaborators. This effectively forces the global update to follow the direction of the collaborators’ average gradient while the benign clients’ gradients are suppressed. The collaborators can thus steer the model toward a compromise that benefits all of them, potentially at the expense of benign participants.
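A small illustration of the coalition scaling above, under the same hypothetical interface as the earlier sketches (the helper names are assumptions for this sketch):

```python
import numpy as np

# Each of the C collaborators scales its local gradient by (N - C) / C, so the
# coalition's combined update carries the same weight in the server average as
# the gradients of the N - C benign clients.

def collaborative_update(local_gradient, theta, num_clients, coalition_size):
    # g_t^i = ((N - C) / C) * grad L_i(theta_t) for a collaborator i.
    scale = (num_clients - coalition_size) / coalition_size
    return scale * local_gradient(theta)

def averaged_update_with_coalition(coalition_grads, benign_grads, theta):
    # coalition_grads, benign_grads: lists of callables theta -> gradient.
    n, c = len(coalition_grads) + len(benign_grads), len(coalition_grads)
    updates = [collaborative_update(g, theta, n, c) for g in coalition_grads]
    updates += [g(theta) for g in benign_grads]
    return np.mean(updates, axis=0)
```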
### Independent Dictator Clients

When dictator clients act independently, each tries to erase the contributions of all others, including the other dictators, which creates a conflict. Suppose there are K dictator clients, each following the single-dictator strategy: every dictator i in the set D of dictators sends g_t^i = (N - 1) ∇L_i(θ_t), while the benign clients send their true gradients. The aggregated gradient is

(1/N) [ ∑_{i∈D} (N-1) ∇L_i(θ_t) + ∑_{n∉D} ∇L_n(θ_t) ].

If K > 1, the dictators’ scaled gradients may conflict. For example, if two dictators have opposing gradient directions, their scaled contributions can cancel or produce a gradient that aligns with neither’s intent, so the global model may not converge to any dictator’s desired model. In the extreme case where all clients are dictators (K = N), the update becomes

(1/N) (N-1) ∑_{i=1}^N ∇L_i(θ_t) = ((N-1)/N) ∑_{i=1}^N ∇L_i(θ_t),

which is essentially a scaled version of the average of all gradients. However, since each dictator is trying to nullify the others, this average direction is not what any individual dictator wants. Theoretical analysis shows that when all clients are independent dictators, the learning process degenerates. Specifically, from Eq. (9) in the appendix, if N - 2 > 0 (i.e., there are more than two clients) and η > 0, then η(N-2) > 0. Consequently, it follows from Eq. (9) that when all clients act as independent dictators and send the defined malicious update, the resulting model update effectively moves in the opposite direction of the intended gradient. In other words, the updating procedure resembles gradient ascent rather than gradient descent, thereby increasing the loss instead of minimizing it. This behavior causes the model to “unlearn” the progress made in the previous iteration. Therefore, when every client behaves as an independent dictator, the global model fails to learn meaningful representations and makes no effective progress. Our empirical results, presented in Section F-A, confirm this breakdown of learning in practice.

### Alliances and Betrayal

In more complex scenarios, a group of dictator clients may form an alliance to collaboratively dominate the model, but one or more members may later betray the others by deviating from the agreed strategy to gain an individual advantage. For example, an ally might send a larger update than agreed upon to increase its own influence, or switch to an independent strategy. This creates a dynamic in which trust and deception play a role. We analyze the stability of such alliances and show that, without trust mechanisms, alliances are inherently unstable because each member has an incentive to betray in order to achieve full dominance. However, if betrayal is detectable and punishable (e.g., through reputation), alliances can be sustained.

## VII Empirical Evaluation

We evaluate the proposed attack strategies on both computer vision (e.g., CIFAR-10) and natural language processing (e.g., sentiment analysis on text) benchmarks. We compare the global model’s performance under different attack scenarios: no attack, a single dictator, collaborative dictators, and all-independent dictators. The results show that a single dictator client can effectively steer the model toward its local data distribution, achieving high accuracy on its own test set while degrading performance on benign clients’ data. Collaborative dictators can jointly influence the model to balance their objectives. In the all-independent-dictators scenario, the model fails to converge and accuracy remains near random. Detailed experimental setup and results are provided in the appendix.

## VIII Conclusion and Future Work

We have introduced the concept of dictator clients in federated learning, a novel class of Byzantine adversaries whose goal is to preserve their own influence while erasing the contributions of others. We formalized their behavior, proposed concrete attack strategies, and analyzed various multi-dictator scenarios. Our theoretical and empirical results demonstrate the effectiveness and potential destructiveness of such attacks. Future work includes exploring defenses against dictator attacks, extending the analysis to partial client participation and multiple local updates, and investigating the economic and game-theoretic aspects of dictator behavior. We also plan to study the implications for fairness and privacy in FL.

## References

[1] ... (references as in original)

## Appendix A: Proof of Theorem 1

...

## Appendix B: Additional Experimental Details

...

## Appendix C: Analysis of Collaborative Dictator Convergence

...

## Appendix D: Game-Theoretic Model of Alliance Betrayal

...

## Appendix E: Practical Implications

...
(details as in original)

## Appendix F: Mutual Domination Analysis

### F-A Experiments for Mutual Domination

We now consider the extreme scenario where every client behaves as an independent dictator, each executing Algorithm 1 to preserve only its own contribution while nullifying the effects of all others. As established theoretically in Section VI, when every client is an independent dictator, the global model update effectively performs gradient ascent, causing the model to unlearn and fail to converge. Our experiments confirm this: across both vision and text tasks, the accuracy of the global model remains at chance level after many rounds, while the loss increases or oscillates. This demonstrates that mutual domination leads to a complete breakdown of learning.
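Below is a hypothetical harness for this mutual-domination setting, reusing the DictatorClient sketch and averaging convention from the earlier sections (all names are illustrative, not the paper's experimental code). It only constructs the scenario and records the loss, which, per the analysis above, is expected to stall or grow rather than decrease.

```python
import numpy as np

# Every one of the N participants applies the same (N - 1) gradient scaling;
# the server averages the received updates and takes a step of size eta.

def mutual_domination_run(theta_0, dictators, eta, num_rounds, loss_fn):
    theta = theta_0
    loss_history = []
    for _ in range(num_rounds):
        updates = [d.malicious_update(theta) for d in dictators]
        theta = theta - eta * np.mean(updates, axis=0)
        loss_history.append(loss_fn(theta))  # track the expected breakdown
    return theta, loss_history
```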

Similar Articles

Federated Learning

ML at Berkeley

The article explains the concept of Federated Learning as a privacy-preserving machine learning technique that trains models on local devices rather than central servers. It details the process of encrypted parameter updates and aggregation to mitigate data leakage risks while maintaining model performance.

Information Theoretic Adversarial Training of Large Language Models

arXiv cs.LG

This paper introduces WARDEN, a distributionally robust adversarial training framework for large language models that uses f-divergence to dynamically reweight adversarial examples, significantly reducing attack success rates while maintaining computational efficiency.

Adversarial attacks on neural network policies

OpenAI Blog

OpenAI researchers demonstrate that adversarial attacks, previously studied in computer vision, are also effective against neural network policies in reinforcement learning, showing significant performance degradation even with small imperceptible perturbations in white-box and black-box settings.