A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions

arXiv cs.LG 05/13/26, 04:00 AM Papers
Summary
This paper presents a comprehensive experimental comparison of various federated learning aggregation strategies, analyzing their performance and efficiency under both homogeneous and heterogeneous data distributions.
arXiv:2605.11010v1 Announce Type: new
# A Comparative Study of Federated Learning Aggregation Strategies under Homogeneous and Heterogeneous Data Distributions
Source: [https://arxiv.org/html/2605.11010](https://arxiv.org/html/2605.11010)
###### Abstract

Federated Learning has emerged as a transformative paradigm for collaborative machine learning across distributed environments. However, its performance is strongly influenced by the aggregation strategy used to combine local model updates at the server, which directly affects learning performance, robustness, and system behavior. This work presents a comprehensive experimental comparison of widely used federated aggregation strategies under both homogeneous and heterogeneous data distributions. Using benchmark image classification datasets, we analyze how different aggregation mechanisms respond to varying degrees of data heterogeneity, examining their impact on centralized accuracy and loss, as well as on system-level efficiency metrics, including aggregation, training, and communication time. The results demonstrate that aggregation strategies exhibit distinct trade-offs across datasets and data distributions, with their effectiveness varying according to dataset characteristics and operating conditions.

## I Introduction

Traditional machine learning (ML) approaches struggle to meet the increasing demands of modern large-scale and data-intensive applications, particularly in scenarios where data are distributed across multiple devices and subject to privacy constraints. Centralized learning, where data are aggregated at a central server for model training, can achieve high predictive performance, but it introduces significant communication overhead, raises privacy concerns, and may violate data protection regulations. In contrast, distributed on-device learning avoids transferring data to a central authority by training models locally, but the lack of collaboration among devices often limits generalization and results in suboptimal performance [[14](https://arxiv.org/html/2605.11010#bib.bib44), [21](https://arxiv.org/html/2605.11010#bib.bib51)].

To overcome these limitations, Federated Learning (FL) has emerged as a cutting-edge ML paradigm that enables collaborative model training across decentralized clients through iterative communication rounds. This approach improves the efficiency of training ML models on large-scale datasets that would be infeasible to process on a single machine. A central server initializes a global model and distributes it to a set of participating clients. Each client trains the model locally on its private data and transmits only the resulting model updates to the server. The server then aggregates these updates to refine the global model, which is redistributed to the clients in successive communication rounds until convergence. By keeping raw data local, FL enhances data privacy while also reducing communication costs [[23](https://arxiv.org/html/2605.11010#bib.bib1)].
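The round structure described above can be sketched in plain Python. This is a minimal, hypothetical simulation, not the Flower implementation used in the experiments: the model is a single scalar `w`, each client runs a few steps of gradient descent on the mean squared error of its private values, and the server combines the returned models with a sample-size-weighted average. The names `local_train` and `run_federated` are illustrative.

```python
def local_train(w, data, lr=0.1, epochs=5):
    """Hypothetical client step: gradient descent on mean squared error.

    The client only sees its own private `data`; it returns a model, not data.
    """
    for _ in range(epochs):
        grad = sum(w - x for x in data) / len(data)  # d/dw of 0.5 * mean((w - x)^2)
        w -= lr * grad
    return w


def run_federated(client_datasets, rounds=25, w_init=0.0):
    """Simulate synchronous FL rounds with full client participation."""
    w_global = w_init  # 1. the server initializes the global model
    for _ in range(rounds):
        results = []
        for data in client_datasets:
            # 2-3. broadcast the global model; each client trains locally
            w_local = local_train(w_global, data)
            # 4. only the updated model and the local sample count go back
            results.append((w_local, len(data)))
        total = sum(n for _, n in results)
        # 5. the server aggregates the updates into a new global model
        w_global = sum(w * n for w, n in results) / total
    return w_global


clients = [[1.0, 2.0, 3.0], [10.0]]  # two clients with private values
print(run_federated(clients))  # close to 4.0, the mean over all 4 values
```

Because every client in this toy setup has the same loss curvature, the weighted average converges to the mean of the pooled data even though no raw value ever leaves its client.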

A critical challenge in FL lies in aggregating client model updates into a global model that generalizes well to new data, regardless of the diversity of the participating clients [[40](https://arxiv.org/html/2605.11010#bib.bib50)]. The choice of aggregation strategy significantly influences FL performance, affecting not only model accuracy but also convergence behavior, robustness to data heterogeneity, privacy preservation, computational efficiency, and communication overhead. Existing aggregation methods range from simple averaging techniques to more advanced approaches that incorporate momentum, adaptive optimization, and robustness to outliers, among others.

In this work, we conduct a comprehensive comparative study of widely used and state-of-the-art FL aggregation strategies: FedAvg, FedAvgM, FedAdam, FedAdagrad, FedMedian, FedProx, and server-side differential privacy with adaptive clipping (DP). The strategies are evaluated under both homogeneous (IID) and heterogeneous (non-IID) data distributions across three benchmark datasets: MNIST, FMNIST, and CIFAR-10. Performance is assessed using learning-related metrics, namely centralized accuracy and loss, as well as system-efficiency metrics, namely aggregation time, training time, and communication time per round. Through this analysis, we aim to provide insights into the trade-offs between accuracy and system efficiency across different aggregation strategies and data distributions. The results indicate that no single aggregation strategy dominates across all scenarios: the appropriate choice depends on dataset complexity, the degree of data heterogeneity, and system and privacy requirements, and there is no one-size-fits-all solution.

The remainder of the paper is organized as follows. Section [II](https://arxiv.org/html/2605.11010#S2) reviews related work on FL aggregation strategies. Section [III](https://arxiv.org/html/2605.11010#S3) presents the aggregation methods examined in this work. Section [IV](https://arxiv.org/html/2605.11010#S4) describes the experimental setup and reports the experimental results. Finally, Section [V](https://arxiv.org/html/2605.11010#S5) concludes with main insights and outlines directions for future work.

## II Related Work

One of the main challenges in FL lies in constructing, from local model updates, a global model that generalizes well. Aggregation strategies are central to FL, as they dictate how local updates from distributed clients are integrated into the global model. Based on the existing literature, aggregation strategies in FL can be classified into three main categories according to their primary focus: *heterogeneity and personalization*, *communication efficiency and optimization*, and *security and privacy*.

As heterogeneity is a significant challenge in practical FL deployments, aggregation strategies that effectively handle its various forms, while ensuring that the global model captures meaningful patterns from all participating clients, are of utmost importance. Approaches in this category can be divided into three classes: model-oriented, aggregation process-oriented, and client-oriented. Model-oriented strategies aim to enhance personalization by adjusting the architectures of the global and local models. Examples include parameter decoupling, which mitigates heterogeneity by enabling personalized model learning through partitioning model parameters into independently optimized subsets, often using a layer-wise decomposition [[2](https://arxiv.org/html/2605.11010#bib.bib16)]; global-local model combination, which maintains both a collaboratively trained global model and a client-specific local model for personalization [[6](https://arxiv.org/html/2605.11010#bib.bib17)]; and model split, which decomposes the model into sub-models or branches to reduce computation and communication per client [[8](https://arxiv.org/html/2605.11010#bib.bib19)]. Aggregation process-oriented strategies focus on optimizing various aspects of the aggregation process, including training hyperparameters, loss formulations, gradient variability, convergence behavior, and learning directions. The overall objective is to design aggregation mechanisms that accelerate FL convergence while adapting to the diverse data distributions and system characteristics of individual clients. This includes server optimization through adaptive optimizers based on aggregated gradients [[29](https://arxiv.org/html/2605.11010#bib.bib9)]; regularization to mitigate client drift and prevent overfitting [[20](https://arxiv.org/html/2605.11010#bib.bib21)]; and hyperparameter optimization that adjusts factors such as client selection, the number of local training steps, and aggregation frequency to balance convergence speed and system efficiency [[7](https://arxiv.org/html/2605.11010#bib.bib22)]. Client-oriented strategies enhance aggregation effectiveness by prioritizing the participation of reliable clients that possess high-quality data and sufficient learning capabilities. Examples include weighted aggregation, which assigns importance weights to client updates to improve convergence under non-IID conditions [[13](https://arxiv.org/html/2605.11010#bib.bib11)]; and client selection, which chooses subsets of clients for each round based on data quality, computational capability, or hierarchical structuring [[9](https://arxiv.org/html/2605.11010#bib.bib24)].

Efficient communication is a critical aspect of FL, as it often represents a major bottleneck. To address this, a range of strategies has been proposed to reduce communication overhead and accelerate convergence. Communication overhead arises when multiple clients transmit large volumes of data to the central server during model updates. Existing solutions can be grouped into two main approaches: reducing training latency and adapting the network topology. Training latency depends on both the computational capabilities and the workload of client devices. While hardware limitations are fixed, workload management offers opportunities to reduce training time and improve efficiency. Representative strategies include load balancing [[34](https://arxiv.org/html/2605.11010#bib.bib25)], Over-The-Air (OTA) FL [[37](https://arxiv.org/html/2605.11010#bib.bib26)], and asynchronous aggregation [[38](https://arxiv.org/html/2605.11010#bib.bib27)]. Network topology, which defines the structural arrangement of devices and their interconnections, also influences information flow. Approaches such as hierarchical aggregation [[33](https://arxiv.org/html/2605.11010#bib.bib28)] and adaptive network topology [[22](https://arxiv.org/html/2605.11010#bib.bib29)] have been proposed to optimize this aspect. Another important consideration is minimizing the costs associated with data transmission. Factors such as network conditions, model size, and aggregation frequency can significantly affect transmission overhead. A common strategy is model size reduction, which decreases the number of parameters transmitted between clients and the server. Techniques in this area include model division, compression [[11](https://arxiv.org/html/2605.11010#bib.bib31)], quantization [[32](https://arxiv.org/html/2605.11010#bib.bib32)], and sketching [[30](https://arxiv.org/html/2605.11010#bib.bib30)]. Reducing the aggregation frequency is another effective approach, as many gradient updates are redundant, and transmitting large models repeatedly increases network load and convergence time. Solutions include periodic aggregation [[26](https://arxiv.org/html/2605.11010#bib.bib33)] and fixed communication rounds [[24](https://arxiv.org/html/2605.11010#bib.bib34)].

Given the growing diversity and complexity of security and privacy threats in FL, a variety of mechanisms have been proposed to address these risks. Client-oriented approaches defend against aggregation attacks by analyzing elements of the FL process, such as clients' local updates and training rules. Even without access to raw client data, the central aggregator can detect anomalies and mitigate their effects through reliability assessment mechanisms. Representative solutions include anomaly detection [[27](https://arxiv.org/html/2605.11010#bib.bib38)], verification techniques [[36](https://arxiv.org/html/2605.11010#bib.bib40)], adversarial training [[12](https://arxiv.org/html/2605.11010#bib.bib41)], and federated distillation [[31](https://arxiv.org/html/2605.11010#bib.bib42)]. Aggregation process-oriented approaches aim to build a resilient pipeline capable of withstanding communication failures, client dropouts, and malicious behavior, primarily through robust and secure aggregation techniques [[5](https://arxiv.org/html/2605.11010#bib.bib53)]. FL relies on aggregating model updates provided by participating clients or devices, with aggregation typically designed to preserve privacy. However, a key vulnerability of this process lies in its sensitivity to corrupted updates, whether introduced intentionally by adversaries or unintentionally through failures in low-cost hardware [[15](https://arxiv.org/html/2605.11010#bib.bib52)]. The most prevalent approach to mitigating such attacks on the federated model is to employ estimators that are more robust to outliers or extreme values than the conventional mean. The commonly used arithmetic mean lacks robustness, as even a single corrupted update in a given round can significantly degrade the performance of the global model across all devices [[28](https://arxiv.org/html/2605.11010#bib.bib43)]. More specifically, the traditional approach in FL aggregates local model parameters using the FedAvg algorithm [[23](https://arxiv.org/html/2605.11010#bib.bib1)]. While this method performs well under idealized conditions, it is known to struggle with both system and statistical heterogeneity at scale [[19](https://arxiv.org/html/2605.11010#bib.bib45)]. To address these limitations, numerous aggregation operators based on more robust estimators have been proposed, such as Median [[39](https://arxiv.org/html/2605.11010#bib.bib10)], Trimmed-Mean [[39](https://arxiv.org/html/2605.11010#bib.bib10)], Krum and MultiKrum [[4](https://arxiv.org/html/2605.11010#bib.bib46)], Bulyan [[10](https://arxiv.org/html/2605.11010#bib.bib48)], and FedGreed [[16](https://arxiv.org/html/2605.11010#bib.bib54)].
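The fragility of the arithmetic mean, and the effect of a coordinate-wise robust estimator such as the Trimmed-Mean listed above, can be illustrated with a toy example. This is an illustrative sketch with made-up update vectors, not a reproduction of any cited method; the trimming amount `trim` (values discarded from each end, per coordinate) is an assumed parameter.

```python
import statistics


def coordinate_mean(updates):
    """Arithmetic mean of each coordinate across client updates."""
    return [statistics.fmean(column) for column in zip(*updates)]


def coordinate_trimmed_mean(updates, trim):
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim too large for the number of updates")
    result = []
    for column in zip(*updates):
        kept = sorted(column)[trim:len(column) - trim]
        result.append(statistics.fmean(kept))
    return result


honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 1.9]]
corrupted = honest + [[100.0, -100.0]]  # one faulty or malicious client

print(coordinate_mean(honest))                     # about [1.025, 1.975]
print(coordinate_mean(corrupted))                  # about [20.82, -18.42]
print(coordinate_trimmed_mean(corrupted, trim=1))  # about [1.1, 1.9]
```

A single extreme client shifts the mean by an order of magnitude, while the trimmed mean stays within the range of the honest updates.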

## III Federated Learning Aggregation Strategies

FedAvg [[23](https://arxiv.org/html/2605.11010#bib.bib1)] is the standard aggregation method used in FL. The central server computes a data-size-weighted, element-wise average of the model updates received from participating clients. Each client's contribution is proportional to the number of local data samples it holds, so that the global update reflects the underlying data distribution. This weighting scheme gives clients with larger datasets a stronger influence on the global model, which is particularly beneficial when data volumes vary substantially across clients. Due to its simplicity and effectiveness, FedAvg has become one of the most widely adopted aggregation strategies in FL. However, FedAvg relies on a naive coordinate-wise averaging procedure that can lead to sub-optimal solutions. Under non-IID data, neurons at the same coordinate in different client models may be optimized for entirely different purposes because each client specializes to its own data. As a result, averaging neurons that diverge significantly in purpose can degrade overall performance. Furthermore, each communication round often requires extended local training for clients to re-establish their specialized representations, reducing training efficiency.
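The data-size-weighted average can be written as $w^{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} w_k^{t+1}$, where $n_k$ is the number of samples held by client $k$ and $n = \sum_k n_k$. Below is a minimal sketch on flat parameter lists; real frameworks apply the same operation layer by layer on tensors, and the function name `fedavg` is illustrative.

```python
def fedavg(client_params, num_examples):
    """Sample-size-weighted, element-wise average of client parameter vectors."""
    total = sum(num_examples)
    aggregated = [0.0] * len(client_params[0])
    for params, n_k in zip(client_params, num_examples):
        weight = n_k / total  # client k contributes in proportion to its data
        for j, value in enumerate(params):
            aggregated[j] += weight * value
    return aggregated


# Client B holds three times as much data as client A, so it dominates the result.
print(fedavg([[1.0, 0.0], [3.0, 2.0]], num_examples=[100, 300]))  # [2.5, 1.5]
```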

FedAvgM (Federated Averaging with Momentum) [[13](https://arxiv.org/html/2605.11010#bib.bib11)] extends the standard FedAvg algorithm by integrating a server-level momentum term into the aggregation process, drawing inspiration from momentum-based stochastic gradient descent. In this approach, the server aggregates the local model updates and applies a momentum-based update to the global model. The momentum accumulates the history of aggregated updates and is refreshed at each round by combining the previously stored momentum with the newly aggregated update. By incorporating momentum at the server, FedAvgM dampens the variability in the directions of client updates caused by stochastic variance across clients, thereby improving model stability and accelerating convergence compared to FedAvg. However, FedAvgM requires careful tuning of the momentum coefficient and learning rate to avoid instability and ensure convergence; moreover, server-side momentum adds computational overhead and may reduce robustness under extremely heterogeneous data distributions or adversarial client behavior.
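A minimal sketch of the server-side momentum step, assuming the formulation in which the server treats the difference between the current global model and the FedAvg result as a pseudo-gradient $\Delta_t$, accumulates $v_t = \beta v_{t-1} + \Delta_t$, and sets $w_{t+1} = w_t - \eta v_t$. With $\beta = 0$ and server learning rate $\eta = 1$ this reduces to plain FedAvg. Parameter names and defaults are illustrative, not the settings used in the experiments.

```python
def weighted_average(client_params, num_examples):
    total = sum(num_examples)
    return [
        sum(p[j] * n for p, n in zip(client_params, num_examples)) / total
        for j in range(len(client_params[0]))
    ]


def fedavgm_step(global_params, client_params, num_examples,
                 momentum, beta=0.9, server_lr=1.0):
    """One FedAvgM round; returns (new_global_params, new_momentum)."""
    avg = weighted_average(client_params, num_examples)
    pseudo_grad = [g - a for g, a in zip(global_params, avg)]
    if momentum is None:  # first round: no history yet
        momentum = [0.0] * len(global_params)
    momentum = [beta * m + d for m, d in zip(momentum, pseudo_grad)]
    new_params = [g - server_lr * m for g, m in zip(global_params, momentum)]
    return new_params, momentum


# Two rounds in which clients keep pushing the model in the same direction:
w, v = fedavgm_step([0.0], [[1.0]], [10], momentum=None)
print(w)  # [1.0]: the first round equals FedAvg
w, v = fedavgm_step(w, [[2.0]], [10], momentum=v)
print(w)  # about [2.9]: accumulated momentum moves past the FedAvg result of 2.0
```

The second round shows both sides of momentum: it accelerates progress along a consistent direction, but it also carries the model beyond what the current client updates support, which is one way skewed updates can be amplified.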

FedAdam [[29](https://arxiv.org/html/2605.11010#bib.bib9)] adapts the Adam optimization technique to the FL paradigm. The algorithm applies the Adam optimizer at the server, after averaging client updates, rather than relying solely on simple averaging as in FedAvg. The global model weights are updated adaptively using moving averages of the first and second moments of the gradients, which give each weight its own learning rate, leading to faster convergence and improved performance in heterogeneous data environments. This enables the global model to accommodate newly observed data while still retaining what has been learned in earlier rounds. However, the adaptive nature of the algorithm introduces additional complexity, necessitating careful hyperparameter configuration and stability monitoring.
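A minimal sketch of a server-side Adam step, assuming the adaptive federated optimization formulation in which the averaged client change $\Delta_t$ acts as a pseudo-gradient: $m_t = \beta_1 m_{t-1} + (1-\beta_1)\Delta_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2)\Delta_t^2$, and $w_{t+1} = w_t + \eta\, m_t / (\sqrt{v_t} + \tau)$. Bias correction is omitted, and the class name and hyperparameter values are illustrative, not the configuration used in the experiments.

```python
import math


class FedAdamServer:
    """Hypothetical server-side Adam optimizer over a flat parameter list."""

    def __init__(self, init_params, eta=0.1, beta1=0.9, beta2=0.99, tau=1e-9):
        self.params = list(init_params)
        self.eta, self.beta1, self.beta2, self.tau = eta, beta1, beta2, tau
        self.m = [0.0] * len(self.params)  # first-moment estimate
        self.v = [0.0] * len(self.params)  # second-moment estimate

    def apply(self, delta):
        """Update the global model with a pseudo-gradient `delta`."""
        self.m = [self.beta1 * m + (1 - self.beta1) * d for m, d in zip(self.m, delta)]
        self.v = [self.beta2 * v + (1 - self.beta2) * d * d for v, d in zip(self.v, delta)]
        self.params = [
            w + self.eta * m / (math.sqrt(v) + self.tau)
            for w, m, v in zip(self.params, self.m, self.v)
        ]
        return self.params

    def step(self, client_params, num_examples):
        """Average client models as in FedAvg, then take one Adam step."""
        total = sum(num_examples)
        avg = [sum(p[j] * n for p, n in zip(client_params, num_examples)) / total
               for j in range(len(self.params))]
        delta = [a - w for a, w in zip(avg, self.params)]
        return self.apply(delta)


server = FedAdamServer([0.0, 0.0])
# Coordinates with very different update magnitudes both move by roughly eta:
print(server.step([[100.0, 0.01]], [1]))  # roughly [0.1, 0.1]
```

The example highlights the per-parameter scaling: a coordinate whose update is 10,000 times larger does not take a 10,000 times larger step.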

FedAdagrad (Federated Adaptive Gradient) [[29](https://arxiv.org/html/2605.11010#bib.bib9)] adapts the Adagrad optimizer to the FL paradigm. Like FedAdam, it belongs to a class of adaptive federated optimizers that employ server-side adaptive optimization to enhance convergence and stability in heterogeneous data settings. Local model updates are computed at the clients using stochastic gradient descent, as in FedAvg. The server interprets the aggregated client updates as pseudo-gradients and applies the Adagrad update rule to them: it accumulates the squared pseudo-gradients across communication rounds and uses this sum to adapt the learning rate of each parameter, improving convergence stability and performance. Because the denominator of the scaling coefficient grows with the accumulated sum of squared gradients, the effective learning rate is progressively annealed, which promotes stable convergence. Although FedAdagrad and related adaptive optimization methods can mitigate the negative effects of client drift during aggregation, they do not explicitly address parameter drift arising during local client training.
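A minimal sketch of the FedAdagrad server step under the same pseudo-gradient convention, assuming for simplicity that the first-moment term is disabled ($\beta_1 = 0$): $v_t = v_{t-1} + \Delta_t^2$ and $w_{t+1} = w_t + \eta\, \Delta_t / (\sqrt{v_t} + \tau)$. Feeding the same pseudo-gradient repeatedly makes the annealing effect visible, since the step size shrinks as $\eta/\sqrt{t}$. The class name and values are illustrative.

```python
import math


class FedAdagradServer:
    """Hypothetical server-side Adagrad optimizer over a flat parameter list."""

    def __init__(self, init_params, eta=0.1, tau=1e-9):
        self.params = list(init_params)
        self.eta, self.tau = eta, tau
        self.v = [0.0] * len(self.params)  # running sum of squared pseudo-gradients

    def apply(self, delta):
        """Update the global model with an averaged client change `delta`."""
        self.v = [v + d * d for v, d in zip(self.v, delta)]
        self.params = [
            w + self.eta * d / (math.sqrt(v) + self.tau)
            for w, d, v in zip(self.params, delta, self.v)
        ]
        return self.params


server = FedAdagradServer([0.0])
previous = 0.0
for t in range(1, 4):
    current = server.apply([1.0])[0]
    print(t, round(current - previous, 4))  # step sizes ~0.1, 0.0707, 0.0577
    previous = current
```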

FedMedian [[39](https://arxiv.org/html/2605.11010#bib.bib10)] is a robust variant of FedAvg that replaces the standard averaging of client model updates with median-based aggregation. Local model updates may contain outliers or may even be crafted by adversarial clients. By computing the element-wise median of all received local updates instead of their average, FedMedian reduces the impact of extreme or corrupted values. This enhances robustness against anomalous or unreliable client contributions, making FedMedian suitable for settings where some participants may provide noisy or untrustworthy updates. FedMedian can also handle heterogeneous data distributions more effectively than FedAvg, as it is less influenced by skewed or anomalous client updates. However, it may converge more slowly than algorithms with explicit regularization, such as FedProx, due to the lack of additional constraints on the update process.
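A minimal sketch of coordinate-wise median aggregation, assuming each client sends its model as a list of layers, each a flat list of floats. Unlike FedAvg, this unweighted median ignores client sample counts; the function name `fedmedian` is illustrative.

```python
import statistics


def fedmedian(client_models):
    """Element-wise median across clients, computed layer by layer.

    client_models: list over clients; each model is a list of layers,
    and each layer is a flat list of floats with the same shape across clients.
    """
    num_layers = len(client_models[0])
    aggregated = []
    for layer_idx in range(num_layers):
        layers = [model[layer_idx] for model in client_models]
        # zip(*layers) groups the same coordinate from every client
        aggregated.append([statistics.median(values) for values in zip(*layers)])
    return aggregated


clients = [
    [[0.9, 1.1], [0.5]],
    [[1.0, 1.0], [0.4]],
    [[1.1, 0.9], [0.6]],
    [[1.0, 1.2], [0.5]],
    [[50.0, -50.0], [9.0]],  # an outlier client
]
print(fedmedian(clients))  # [[1.0, 1.0], [0.5]]
```

Each median requires ordering the client values per coordinate, which is the source of the additional aggregation cost discussed in the results.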

FedProx [[19](https://arxiv.org/html/2605.11010#bib.bib45)] generalizes the FedAvg algorithm and is designed to address both data and system heterogeneity in FL environments. In FedProx, each participant optimizes its local loss function augmented with a proximal regularization term that penalizes large divergence between the current local model and the previous global model, thereby constraining local updates to remain close to the global model. The proximal term effectively reduces client drift; however, it introduces additional computational overhead.
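Concretely, client $k$ minimizes $h_k(w) = F_k(w) + \frac{\mu}{2}\lVert w - w^t\rVert^2$, where $F_k$ is its local loss, $w^t$ the current global model, and $\mu \ge 0$ the proximal coefficient, so the extra gradient term is $\mu (w - w^t)$. The sketch below runs plain gradient descent on a hypothetical quadratic local loss $F_k(w) = \frac{1}{2}\lVert w - c_k\rVert^2$ whose optimum $c_k$ lies far from the global model; the function names, learning rate, and step count are illustrative assumptions.

```python
def fedprox_local_train(global_params, local_grad, mu, lr=0.1, steps=200):
    """Gradient descent on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = list(global_params)
    for _ in range(steps):
        grad = local_grad(w)
        w = [
            wi - lr * (gi + mu * (wi - wg))  # the proximal term pulls w toward w_global
            for wi, gi, wg in zip(w, grad, global_params)
        ]
    return w


def make_quadratic_grad(c):
    """Gradient of the hypothetical local loss 0.5 * ||w - c||^2."""
    return lambda w: [wi - ci for wi, ci in zip(w, c)]


grad = make_quadratic_grad([4.0])  # the client's own optimum is at 4.0
print(fedprox_local_train([0.0], grad, mu=0.0))  # ~[4.0]: plain local training drifts fully
print(fedprox_local_train([0.0], grad, mu=1.0))  # ~[2.0]: stays closer to the global model
```

For this quadratic loss the local solution is $(c_k + \mu w^t)/(1 + \mu)$, so larger $\mu$ trades local fit for proximity to the global model.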

Server-side Differential Privacy with Adaptive Clipping (DP) [[1](https://arxiv.org/html/2605.11010#bib.bib12)]. Differential privacy plays a critical role in FL by protecting the privacy of client data during collaborative model training. In central DP, the server is responsible for safeguarding client information by adding noise to the aggregated global model parameters. In this work, we adopt server-side differential privacy with adaptive clipping, built on top of FedAvg, in which client updates are clipped at the server and noise is added after aggregation. The clipping threshold is adjusted dynamically based on the observed distribution of update norms; more specifically, it is tuned across rounds to track a target quantile of that distribution. Applying clipping at the server side allows uniform control over all client updates and reduces communication overhead. However, it increases computational demands on the server, since all client updates must be processed centrally.
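A simplified sketch of one round of server-side DP with adaptive clipping, assuming a geometric clipping-norm update of the kind used in adaptive-clipping methods. Each client update (local model minus global model) is scaled to L2 norm at most $C$, the clipped updates are averaged, Gaussian noise with standard deviation $zC/n$ is added ($z$ is the noise multiplier, $n$ the number of clients), and $C$ is then updated as $C \leftarrow C \exp(-\eta_C(\hat b - \gamma))$, where $\hat b$ is the (optionally noised) fraction of updates that were not clipped and $\gamma$ is the target quantile. The sketch performs no privacy accounting, and all names and defaults are illustrative.

```python
import math
import random


def l2_norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def dp_adaptive_clipping_round(global_params, client_updates, clip_norm,
                               noise_multiplier, target_quantile=0.5,
                               clip_lr=0.2, count_stddev=0.0, rng=None):
    """Return (new_global_params, new_clip_norm) for one simplified round."""
    rng = rng or random.Random(0)
    n = len(client_updates)
    norms = [l2_norm(u) for u in client_updates]

    # 1. Clip every update to L2 norm <= clip_norm.
    clipped = [
        [x * min(1.0, clip_norm / norm) for x in u] if norm > 0 else list(u)
        for u, norm in zip(client_updates, norms)
    ]
    # 2. Average the clipped updates and add Gaussian noise scaled to clip_norm.
    sigma = noise_multiplier * clip_norm / n
    noisy_avg = [sum(col) / n + rng.gauss(0.0, sigma) for col in zip(*clipped)]
    new_params = [w + d for w, d in zip(global_params, noisy_avg)]

    # 3. Move the clipping norm toward the target quantile of update norms.
    unclipped = sum(1 for norm in norms if norm <= clip_norm)
    fraction = (unclipped + rng.gauss(0.0, count_stddev)) / n
    new_clip = clip_norm * math.exp(-clip_lr * (fraction - target_quantile))
    return new_params, new_clip


# Noise is disabled here only so that the output is easy to check.
params, clip = dp_adaptive_clipping_round(
    [0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0, noise_multiplier=0.0)
print(params, clip)  # about [0.45, 0.6] and 1.0 (half the updates were clipped)
```

When more updates than the target quantile fall under the threshold, $C$ shrinks, and vice versa; with a nonzero noise multiplier, the added noise grows with $C$, which is one source of the accuracy loss reported for DP.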

## IV Experimental Evaluation

### IV-A Experimental Setup

The aggregation strategies were compared by simulating an FL environment consisting of a central server and 10 clients collaboratively performing multi-class image classification. In each communication round, all clients participated in the training process. The simulations were implemented using the Flower framework (https://flower.ai/) and ran for 25 communication rounds, with each simulation evaluating a specific aggregation strategy. The evaluation uses three benchmark datasets: CIFAR-10 [[17](https://arxiv.org/html/2605.11010#bib.bib4)], FMNIST [[35](https://arxiv.org/html/2605.11010#bib.bib5)], and MNIST [[18](https://arxiv.org/html/2605.11010#bib.bib6)]. Two convolutional neural networks (CNNs), as defined in the Flower FL framework repository [[3](https://arxiv.org/html/2605.11010#bib.bib7)], were employed for model training: one tailored for CIFAR-10 and another compatible with both MNIST and FMNIST. For local client-level training, both the SGD and Adam optimizers were investigated using their default hyperparameter configurations. Experimental results indicated that Adam achieves slightly higher accuracy than SGD, consistent with prior studies [[25](https://arxiv.org/html/2605.11010#bib.bib13), [29](https://arxiv.org/html/2605.11010#bib.bib9)] that report its superior convergence properties and overall optimization efficacy. Accordingly, Adam was selected as the client-side optimizer. The aggregation strategies were evaluated under both homogeneous (IID) and heterogeneous (non-IID) data distributions. For heterogeneous settings, datasets were partitioned across clients according to a Dirichlet distribution. Figure [1](https://arxiv.org/html/2605.11010#S4.F1) illustrates the heterogeneous distribution of the CIFAR-10 dataset among the 10 clients in our FL setup. In the figure, each circle represents the number of samples of a given class assigned to a client, with larger circles indicating more samples of that class. A single Dirichlet partitioning scenario with concentration parameter α = 0.5 (moderately skewed) was employed in the experiments. This value of α was selected to model a moderately heterogeneous scenario, introducing class imbalance without extreme client isolation. The MNIST and FMNIST datasets exhibit similar heterogeneous distributions across clients.
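A common way to produce such a partition is to draw, for every class, a vector of client proportions from a symmetric Dirichlet distribution with concentration α and split that class's sample indices accordingly; smaller α yields more skewed clients. The sketch below implements this scheme with the standard library, obtaining a Dirichlet draw by normalizing independent Gamma(α, 1) samples. It illustrates the general technique and is not necessarily the exact partitioner used in the experiments; `dirichlet_partition` is a hypothetical name.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    partitions = [[] for _ in range(num_clients)]
    for label in sorted(indices_by_class):
        idxs = indices_by_class[label]
        rng.shuffle(idxs)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha, ..., alpha) sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        # Turn cumulative proportions into cut points over this class's indices.
        cuts, cumulative = [], 0.0
        for d in draws[:-1]:
            cumulative += d / total
            cuts.append(round(cumulative * len(idxs)))
        cuts.append(len(idxs))
        start = 0
        for client, end in enumerate(cuts):
            partitions[client].extend(idxs[start:end])
            start = end
    return partitions


labels = [i % 10 for i in range(5000)]  # toy balanced labels, 10 classes
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5)
for client, part in enumerate(parts):
    counts = [sum(1 for i in part if labels[i] == c) for c in range(10)]
    print(client, counts)  # per-class counts differ sharply across clients
```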

![Figure 1](https://arxiv.org/html/2605.11010v1/cifar10-alpha=0.5.png)

Figure 1: Data heterogeneity in the 10-client FL setup on CIFAR-10 using Dirichlet partitioning with α = 0.5 (moderately skewed).

To evaluate the performance of the various aggregation strategies, multiple metrics were considered, capturing both learning performance and system efficiency, including computational and communication aspects. More specifically:

- Centralized accuracy (Acc): measures the proportion of correct predictions out of the total number of predictions on the centralized evaluation set at the end of each communication round.
- Centralized loss (Loss): evaluates the model's error on the same evaluation set after each communication round.
- Aggregation time per round (AggTime): measures the duration required by the server to combine client updates during each communication round.
- Training time per round (TrainTime): measures the time from when the server distributes the model to the clients until the clients complete their local training.
- Communication time per round (CommTime): accounts for the time required to transfer model parameters between the server and clients, including both server-to-client and client-to-server communication.
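The learning metrics and a per-round timing measurement can be sketched as follows. The paper does not state which loss function is reported, so mean cross-entropy is assumed here, and `evaluate` and `timed` are hypothetical helpers rather than Flower APIs. Training and communication times would be measured the same way, around the corresponding client and network operations.

```python
import math
import time


def evaluate(probabilities, labels):
    """Return (accuracy, mean cross-entropy loss) on an evaluation set.

    probabilities: one list of class probabilities per sample.
    """
    correct, total_loss = 0, 0.0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == label
        total_loss -= math.log(max(probs[label], 1e-12))  # guard against log(0)
    n = len(labels)
    return correct / n, total_loss / n


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Example: AggTime for a plain element-wise mean over 10 client vectors.
updates = [[float(k)] * 1000 for k in range(10)]
agg, agg_time = timed(lambda us: [sum(col) / len(us) for col in zip(*us)], updates)
acc, loss = evaluate([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1])
print(acc, loss)  # accuracy 2/3; loss is the mean of -log(0.9), -log(0.8), -log(0.4)
print(f"aggregation took {agg_time:.6f} s")
```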

TABLE I: Summary of results of aggregation strategies across datasets under IID and non-IID data distributions (10 clients / 25 communication rounds)
### IV-B Experimental Results

Table [I](https://arxiv.org/html/2605.11010#S4.T1) summarizes the experimental results of the examined FL aggregation strategies on the MNIST, FMNIST, and CIFAR-10 datasets under both IID and non-IID data distributions, using 10 clients over 25 communication rounds. Higher centralized accuracy and lower loss indicate better learning performance, while lower aggregation, training, and communication times reflect better computational and communication efficiency. The reported results are mean values over three independent FL simulations.

Across all datasets, data heterogeneity (non-IID) consistently has a negative impact on model performance, with all aggregation strategies exhibiting reduced accuracy and increased loss compared to the IID setting. This degradation is more pronounced for CIFAR-10, which represents a more complex classification task than the other datasets. Figure [2](https://arxiv.org/html/2605.11010#S4.F2) illustrates the centralized accuracy achieved by all examined aggregation strategies on the MNIST, FMNIST, and CIFAR-10 datasets under IID and non-IID data distributions. The figure highlights a consistent performance degradation when moving from IID to non-IID settings for all strategies, with the effect becoming more pronounced as dataset complexity increases. Regarding system efficiency metrics, aggregation and communication times remain relatively stable across IID and non-IID settings, indicating that the observed differences are primarily attributable to learning dynamics rather than to computational or communication overhead. Training time, on the other hand, exhibits modest variations driven mainly by dataset complexity and model characteristics.

![Figure 2](https://arxiv.org/html/2605.11010v1/x1.png)

Figure 2: Centralized accuracy across datasets for all aggregation strategies under IID and non-IID data distributions.

As the results indicate, the Server-side Differential Privacy with Adaptive Clipping (DP) aggregation strategy exhibits substantially lower accuracy across all datasets and data distributions in the examined setup. The noise introduced to ensure privacy significantly degrades the learning signal, limiting the model's ability to converge effectively. In addition, DP incurs the highest aggregation time among the evaluated methods due to clipping and noise injection operations at the server. These results highlight the inherent trade-off between privacy guarantees and model utility in FL.

FedAvg, the standard aggregation method in FL, serves as the baseline across all datasets. It achieves high performance on MNIST and FMNIST, with high accuracy and relatively low loss. However, its performance degrades on CIFAR-10, a trend also observed for all examined aggregation strategies. Additionally, it maintains consistently low aggregation and communication times, comparable to other aggregation methods (e.g., FedAvgM, FedProx), highlighting its computational efficiency. Training time remains comparable to that of other methods and is primarily influenced by dataset and model characteristics.

FedAvgM extends FedAvg by incorporating server-side momentum. Under IID conditions, it achieves high accuracy, which may reflect faster convergence due to momentum accumulation. However, under non-IID settings, FedAvgM experiences notable performance degradation, especially on MNIST, where accuracy drops substantially. This behavior suggests that momentum can amplify biased gradient directions when client updates are skewed, leading to unstable convergence. In terms of system efficiency metrics, FedAvgM behaves similarly to FedAvg.

FedAdam achieves the highest accuracy on MNIST under both IID and non-IID settings, demonstrating the effectiveness of adaptive learning rates in stabilizing optimization under moderate heterogeneity. However, its performance deteriorates substantially on CIFAR-10, which may indicate sensitivity to increased dataset complexity. FedAdagrad follows a similar trend but exhibits lower accuracy on FMNIST and CIFAR-10 than the other strategies, likely because the learning-rate decay inherent to Adagrad limits its ability to adapt to more complex tasks. Both methods incur slightly higher aggregation times than the basic averaging baselines in most settings, due to additional server-side computations, though the overhead remains modest; training and communication times remain comparable.

FedMedian achieves the highest accuracy on FMNIST under both IID and non-IID settings. Its median-based aggregation effectively mitigates the influence of extreme or skewed client updates, resulting in stable performance. The primary trade-off of this approach is increased aggregation time due to sorting operations. However, as the results indicate, this additional overhead remains limited and does not affect overall system efficiency in the examined setup.

FedProx exhibits stable and consistent performance across datasets and data distributions, positioning it as a middle-ground approach in terms of accuracy across the examined settings. The proximal regularization term constrains local updates and limits excessive client drift, resulting in stable convergence, particularly on CIFAR-10. In terms of system efficiency, aggregation and communication costs remain comparable to those of other methods.

From an efficiency perspective, aggregation time remains low for all strategies and does not constitute a bottleneck in the examined setup. FedAvg and FedAvgM exhibit the lowest aggregation overhead due to their simple averaging mechanisms, while FedMedian and DP incur higher aggregation times as a result of sorting operations and noise injection, respectively. Training time per round is dominated by client-side computation and varies primarily with dataset and model characteristics rather than with the aggregation strategy itself; no consistent training-time overhead is observed across strategies, although FedMedian exhibits slightly higher training times in some settings. Communication time remains consistently low across all strategies, indicating that differences in aggregation logic have minimal impact on communication overhead.

To assess the consistency of the aggregation strategies under a larger experimental configuration, additional experiments were conducted on the MNIST dataset with the number of participating clients increased to 20 and the number of communication rounds increased to 50. Table [II](https://arxiv.org/html/2605.11010#S4.T2) reports the corresponding results and enables a direct comparison with the baseline setup of 10 clients and 25 communication rounds. Overall, the results remain consistent with the trends observed in Table [I](https://arxiv.org/html/2605.11010#S4.T1). Increasing the number of clients and communication rounds does not alter the relative performance of the aggregation strategies; instead, it confirms the characteristics identified in the smaller-scale setup. In terms of learning performance, adaptive aggregation strategies continue to achieve the best results: FedAdam and FedAdagrad achieve the highest accuracy under both IID and non-IID settings, with FedAdam consistently outperforming the other strategies. Notably, the accuracy gap between IID and non-IID settings is smaller than in the smaller-scale setup, suggesting that additional communication rounds allow adaptive methods to better compensate for data heterogeneity through more stable global optimization. FedMedian preserves high accuracy under non-IID conditions, indicating that its robustness to skewed client updates extends to scenarios with a larger number of participants. Similarly, FedProx continues to exhibit stable performance across IID and non-IID settings, consistent with proximal regularization mitigating client drift as the system scales. FedAvgM shows mixed behavior: while it achieves competitive accuracy under IID conditions, its performance under non-IID settings remains inferior to that of adaptive and regularization-based methods.
This observation is consistent with earlier results and supports the hypothesis that server-side momentum can amplify biased updates in heterogeneous environments, particularly as the number of clients increases. The DP aggregation strategy continues to exhibit substantially lower accuracy than the other methods, despite the increased number of rounds. This indicates that, under the examined configuration, the additional optimization steps are insufficient to offset the performance degradation introduced by noise injection, reinforcing the trade-off between privacy guarantees and model utility. From an efficiency perspective, aggregation time increases when scaling from 10 to 20 clients, as expected, because the server processes twice as many client updates per round. Nevertheless, aggregation time remains low across all strategies and does not represent a bottleneck. Training and communication times remain consistently low and largely unaffected by the increase in clients and rounds.
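The privacy/utility trade-off can be sketched with a simplified central-DP aggregator that clips each client update to a fixed L2 norm and adds Gaussian noise to the sum before averaging. This is an assumption-laden illustration, not the paper's DP configuration: `clip_update` and `dp_fedavg_aggregate` are hypothetical names, `clip_norm` and `noise_multiplier` are illustrative values, the adaptive clipping of [1] is not implemented, and no privacy accounting is performed.

```python
import numpy as np


def clip_update(delta, clip_norm):
    """Rescale a client update so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(delta)
    return delta * min(1.0, clip_norm / (norm + 1e-12))


def dp_fedavg_aggregate(deltas, clip_norm, noise_multiplier, rng):
    """Central DP averaging with fixed clipping (hypothetical DP-FedAvg-style sketch).

    Clipping bounds each client's contribution to the sum by clip_norm, so Gaussian
    noise with std noise_multiplier * clip_norm is added once to the sum before
    dividing by the number of clients. Privacy accounting is not modeled here.
    """
    m = len(deltas)
    clipped_sum = np.sum([clip_update(d, clip_norm) for d in deltas], axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped_sum.shape)
    return (clipped_sum + noise) / m


rng = np.random.default_rng(0)
for num_clients in (10, 20):
    # Zero updates isolate the injected noise; its std on the average is z * C / m.
    zeros = [np.zeros(50_000) for _ in range(num_clients)]
    avg = dp_fedavg_aggregate(zeros, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(num_clients, float(avg.std()))
```

In this simplified formulation the noise on the averaged update has standard deviation `noise_multiplier * clip_norm / m`, so the printed values should be close to 0.1 and 0.05. In practice the noise multiplier is derived from the target privacy budget, so this scaling alone does not determine the resulting accuracy.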

TABLE II: Comparison of experimental results on MNIST for two configurations: large-scale (20 clients, 50 rounds) vs. small-scale (10 clients, 25 rounds) setup

## V Conclusions

This work presented a comparative experimental evaluation of FL aggregation strategies under homogeneous and heterogeneous data distributions across multiple benchmark datasets. Adaptive methods such as FedAdam achieve superior accuracy in certain settings, particularly under moderate heterogeneity, but may struggle on more complex datasets. Robust and regularization-based approaches (FedMedian, FedProx) provide improved stability under non-IID conditions, with FedMedian incurring modest additional aggregation overhead, while simple averaging schemes (FedAvg, FedAvgM) remain computationally efficient but are sensitive to data heterogeneity. Privacy-preserving aggregation (DP) introduces a further trade-off, substantially reducing model utility. These findings indicate that the choice of aggregation strategy in FL should be guided by dataset characteristics, heterogeneity levels, privacy requirements, and system constraints rather than by a universal, one-size-fits-all solution. As future work, we plan to explore more advanced robust aggregation mechanisms, such as Krum and Multi-Krum, and to evaluate aggregation strategies in more realistic FL deployments.

## Acknowledgment

This paper has received funding from the European Union’s Horizon Europe research and innovation actions under grant agreement No 101168560 \(CoEvolution\)\. Views and opinions expressed are however those of the author\(s\) only and do not necessarily reflect those of the European Union or the Commission\. Neither the European Union nor the granting authority can be held responsible for them\.

## References

- [1] G. Andrew, O. Thakkar, B. McMahan, and S. Ramaswamy (2021) Differentially private learning with adaptive clipping. Advances in Neural Information Processing Systems 34, pp. 17455–17466.
- [2] M. G. Arivazhagan, V. Aggarwal, A. K. Singh, and S. Choudhary (2019) Federated learning with personalization layers. arXiv preprint arXiv:1912.00818.
- [3] D. J. Beutel, T. Topal, A. Mathur, X. Qiu, J. Fernandez-Marques, Y. Gao, L. Sani, H. L. Kwing, T. Parcollet, P. P. d. Gusmão, and N. D. Lane (2020) Flower: a friendly federated learning research framework. arXiv preprint arXiv:2007.14390.
- [4] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer (2017) Machine learning with adversaries: byzantine tolerant gradient descent. Advances in Neural Information Processing Systems 30.
- [5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2016) Practical secure aggregation for federated learning on user-held data. arXiv preprint arXiv:1611.04482.
- [6] M. F. Criado, F. E. Casado, R. Iglesias, C. V. Regueiro, and S. Barro (2022) Non-iid data and continual learning processes in federated learning: a long road ahead. Information Fusion 88, pp. 263–280.
- [7] J. Dollinger, M. Zghal, et al. (2024) Hyperparameter impact on computational efficiency in federated edge learning. In 2024 International Wireless Communications and Mobile Computing (IWCMC), pp. 0849–0854.
- [8] C. Dun, M. Hipolito, C. Jermaine, D. Dimitriadis, and A. Kyrillidis (2023) Efficient and light-weight federated learning via asynchronous distributed dropout. In International Conference on Artificial Intelligence and Statistics, pp. 6630–6660.
- [9] L. Fu, H. Zhang, G. Gao, M. Zhang, and X. Liu (2023) Client selection in federated learning: principles, challenges, and opportunities. IEEE Internet of Things Journal 10(24), pp. 21811–21819.
- [10] R. Guerraoui, S. Rouault, et al. (2018) The hidden vulnerability of distributed learning in byzantium. In International Conference on Machine Learning, pp. 3521–3530.
- [11] F. Haddadpour, M. M. Kamani, A. Mokhtari, and M. Mahdavi (2021) Federated learning with compression: unified analysis and sharp guarantees. In International Conference on Artificial Intelligence and Statistics, pp. 2350–2358.
- [12] E. Hallaji, R. Razavi-Far, M. Saif, and E. Herrera-Viedma (2023) Label noise analysis meets adversarial training: a defense against label poisoning in federated learning. Knowledge-Based Systems 266, pp. 110384.
- [13] T. H. Hsu, H. Qi, and M. Brown (2019) Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.
- [14] K. Hu, S. Gong, Q. Zhang, C. Seng, M. Xia, and S. Jiang (2024) An overview of implementing security and privacy in federated learning. Artificial Intelligence Review 57(8), pp. 204.
- [15] E. Kritharakis, D. Jakovetic, A. Makris, and K. Tserpes (2025) Robust federated learning under adversarial attacks via loss-based client clustering. arXiv preprint arXiv:2508.12672.
- [16] E. Kritharakis, A. Makris, D. Jakovetic, and K. Tserpes (2025) Fedgreed: a byzantine-robust loss-based aggregation method for federated learning. In 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), pp. 348–355.
- [17] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto.
- [18] Y. LeCun (2010) MNIST handwritten digit database. AT&T Labs. http://yann.lecun.com/exdb/mnist/
- [19] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37(3), pp. 50–60.
- [20] X. Li, M. Liu, S. Sun, Y. Wang, H. Jiang, and X. Jiang (2023) Fedtrip: a resource-efficient federated learning method with triplet regularization. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 809–819.
- [21] A. Makris, A. Fournaris, A. Aghaie, I. Arakas, A. M. Anaxagorou, I. Arapakis, D. Bacciu, B. Biggio, G. Bouloukakis, S. Bouras, et al. (2025) CoEvolution: a comprehensive trustworthy framework for connected machine learning and secure interconnected ai solutions. In 2025 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 838–845.
- [22] O. Marfoq, C. Xu, G. Neglia, and R. Vidal (2020) Throughput-optimal topology design for cross-silo federated learning. Advances in Neural Information Processing Systems 33, pp. 19478–19487.
- [23] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pp. 1273–1282.
- [24] N. Mhaisen, A. A. Abdellatif, A. Mohamed, A. Erbad, and M. Guizani (2021) Optimal user-edge assignment in hierarchical federated learning based on statistical properties and network topology constraints. IEEE Transactions on Network Science and Engineering 9(1), pp. 55–66.
- [25] J. Mills, J. Hu, and G. Min (2021) Multi-task federated learning for personalised deep neural networks in edge computing. IEEE Transactions on Parallel and Distributed Systems 33(3), pp. 630–641.
- [26] N. Mohammadi, J. Bai, Q. Fan, Y. Song, Y. Yi, and L. Liu (2021) Differential privacy meets federated learning under communication constraints. IEEE Internet of Things Journal 9(22), pp. 22204–22219.
- [27] J. Park, D. Han, M. Choi, and J. Moon (2021) Sageflow: robust federated learning against both stragglers and adversaries. Advances in Neural Information Processing Systems 34, pp. 840–851.
- [28] K. Pillutla, S. M. Kakade, and Z. Harchaoui (2022) Robust aggregation for federated learning. IEEE Transactions on Signal Processing 70, pp. 1142–1154.
- [29] S. Reddi, Z. Charles, M. Zaheer, Z. Garrett, K. Rush, J. Konečný, S. Kumar, and H. B. McMahan (2020) Adaptive federated optimization. arXiv preprint arXiv:2003.00295.
- [30] D. Rothchild, A. Panda, E. Ullah, N. Ivkin, I. Stoica, V. Braverman, J. Gonzalez, and R. Arora (2020) Fetchsgd: communication-efficient federated learning with sketching. In International Conference on Machine Learning, pp. 8253–8265.
- [31] H. Seo, J. Park, S. Oh, M. Bennis, and S. Kim (2022) 16 federated knowledge distillation. Machine Learning and Wireless Communications 457.
- [32] N. Shlezinger, M. Chen, Y. C. Eldar, H. V. Poor, and S. Cui (2020) UVeQFed: universal vector quantization for federated learning. IEEE Transactions on Signal Processing 69, pp. 500–514.
- [33] Z. Su, Y. Wang, T. H. Luan, N. Zhang, F. Li, T. Chen, and H. Cao (2021) Secure and efficient federated learning for smart grid with edge-cloud collaboration. IEEE Transactions on Industrial Informatics 18(2), pp. 1333–1344.
- [34] S. Wang, Y. Ruan, Y. Tu, S. Wagle, C. G. Brinton, and C. Joe-Wong (2021) Network-aware optimization of distributed learning for fog computing. IEEE/ACM Transactions on Networking 29(5), pp. 2019–2032.
- [35] H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
- [36] Z. Xing, Z. Zhang, M. Li, J. Liu, L. Zhu, G. Russello, and M. R. Asghar (2023) Zero-knowledge proof-based practical federated learning on blockchain. arXiv preprint arXiv:2304.05590.
- [37] H. Yang, P. Qiu, J. Liu, and A. Yener (2022) Over-the-air federated learning with joint adaptive computation and power control. In 2022 IEEE International Symposium on Information Theory (ISIT), pp. 1259–1264.
- [38] Z. Yang, X. Zhang, D. Wu, R. Wang, P. Zhang, and Y. Wu (2022) Efficient asynchronous federated learning research in the internet of vehicles. IEEE Internet of Things Journal 10(9), pp. 7737–7748.
- [39] D. Yin, Y. Chen, R. Kannan, and P. Bartlett (2018) Byzantine-robust distributed learning: towards optimal statistical rates. In International Conference on Machine Learning, pp. 5650–5659.
- [40] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao (2021) A survey on federated learning. Knowledge-Based Systems 216, pp. 106775.

Similar Articles

On the Push-Based Asynchronous Federated Learning: A Bias-Correction Aggregation Approach

Federated Learning of Spiking Neural Networks under Heterogeneous Temporal Resolutions

Federated Learning

Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers

HASA: Subnet Allocation for Compute-Constrained Model-Heterogeneous Federated Learning