FADAS: Towards Federated Adaptive Asynchronous Optimization

Authors: Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen

ICML 2024

Reproducibility variables, results, and LLM responses:
Research Type: Experimental. We conduct experiments across various asynchronous delay settings in both vision and language modeling tasks. Our results indicate that the proposed FADAS, whether or not it includes the delay-adaptive learning rate, outperforms other asynchronous FL baselines. In particular, the delay-adaptive FADAS demonstrates significant advantages in scenarios with large worst-case delays. Moreover, our experimental results on simulating the wall-clock training time underscore the efficiency of our proposed FADAS approach.
Researcher Affiliation: Collaboration. (1) College of Information Sciences and Technology, Pennsylvania State University, State College, PA, USA; (2) IBM T. J. Watson Research Center, Yorktown Heights, NY, USA.
Pseudocode: Yes. Algorithm 1: FADAS (with delay adaptation).
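
Algorithm 1 in the paper gives the precise procedure; as a rough orientation only, the sketch below shows one plausible shape of a buffered, Adam-style server update whose global learning rate is scaled down when the largest buffered delay is high. The function name, the mean aggregation, and the tau_c-based scaling are illustrative assumptions, not the authors' exact rule.

```python
import numpy as np

def fadas_server_step(x, m, v, buffered_deltas, delays, eta=1e-2,
                      beta1=0.9, beta2=0.99, eps=1e-8, tau_c=10):
    """Illustrative buffered, Adam-style server update with a staleness-scaled
    global learning rate. The scaling rule and update form are placeholders,
    not the exact delay-adaptive rule from Algorithm 1."""
    delta = np.mean(buffered_deltas, axis=0)        # aggregate the buffered client deltas
    m = beta1 * m + (1 - beta1) * delta             # first-moment estimate
    v = beta2 * v + (1 - beta2) * delta ** 2        # second-moment estimate
    eta_t = eta * min(1.0, tau_c / max(delays))     # hypothetical delay-adaptive scaling
    x = x + eta_t * m / (np.sqrt(v) + eps)          # adaptive global step
    return x, m, v
```

A server loop would call such a step each time the buffer fills with M client updates, passing the staleness of each buffered update in `delays`.
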
Open Source Code: Yes. Our code can be found at https://github.com/yujiaw98/FADAS.
Open Datasets: Yes. Using the CIFAR-10/100 (Krizhevsky et al., 2009) datasets with the ResNet-18 model (He et al., 2016) for vision tasks, and applying the pre-trained BERT-base model (Devlin et al., 2018) for fine-tuning on several datasets from the GLUE benchmark (Wang et al., 2018) for language tasks.
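
As a starting point for reproducing the data and model setup, the snippet below is a minimal sketch using torchvision and Hugging Face transformers; the normalization statistics, checkpoint name, and label count are assumptions rather than details taken from the paper.

```python
import torchvision
from torchvision import transforms
from transformers import BertForSequenceClassification

# CIFAR-10 with a standard ResNet-18; the paper's exact augmentation and
# normalization may differ from the values assumed here.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
resnet = torchvision.models.resnet18(num_classes=10)

# Pre-trained BERT-base for GLUE-style fine-tuning; the checkpoint name and
# label count are assumptions (e.g., 2 labels for a binary GLUE task).
bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
```
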
Dataset Splits: Yes. For both settings, we partition the data on clients based on the Dirichlet distribution following Wang et al. (2020a;b), and the parameter α used in Dirichlet sampling determines the degree of data heterogeneity. We apply two levels of data heterogeneity with α = 0.1 and α = 0.3. ... Table 7 demonstrates the efficiency of FADAS and its delay-adaptive variant by comparing their performance with two synchronous FL methods in reaching the target validation accuracy across different datasets.
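
The Dirichlet partition described above is a common non-IID splitting recipe; the sketch below assumes a per-class Dirichlet draw over clients, which mirrors Wang et al. (2020a;b) in spirit but may differ from the authors' exact script.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.1, seed=0):
    """Split sample indices across clients using a per-class Dirichlet draw;
    smaller alpha gives more heterogeneous (non-IID) client data."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])     # samples of class c
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```
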
Hardware Specification: No. The paper mentions simulating wall-clock running times and setting up client scenarios, but it does not provide specific details about the hardware (e.g., GPU models, CPU models, memory, or specific cloud instances) used for running the experiments.
Software Dependencies: No. The paper refers to using ResNet-18 and BERT models and mentions SGD as a local optimizer. However, it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) that would be needed for replication.
Experiment Setup: Yes. We summarize some crucial implementation details in the following, and we leave some additional results and experiment details to Appendix D. Our code can be found at https://github.com/yujiaw98/FADAS. Overview of vision tasks implementation: we set up a total of 100 clients for the mild delay scenario, in which the concurrency Mc = 20 and the buffer size M = 10 by default. ... Each client conducts two local epochs of training, and the mini-batch size is 50 for each client. The local optimizer for all methods is SGD with weight decay 10^-4, and we grid-search the global and local learning rates individually for each method. ... For the global adaptive optimizer, we set β1 = 0.9, β2 = 0.99, and ε = 10^-8. Table 10 summarizes the hyper-parameter details in our experiments.
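
To make the reported hyper-parameters concrete, the sketch below shows how a single client's local update with that configuration might look in PyTorch (SGD with weight decay 10^-4, two local epochs, mini-batch size 50); the function signature and the returned model delta are illustrative assumptions, not the authors' implementation.

```python
import torch
from torch.utils.data import DataLoader

def local_update(model, dataset, local_lr, local_epochs=2, batch_size=50):
    """Run the reported local routine: SGD with weight decay 1e-4, two local
    epochs, mini-batch size 50. Returns the model delta a server would buffer.
    The local learning rate is grid-searched per method in the paper."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=local_lr, weight_decay=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()
    global_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    model.train()
    for _ in range(local_epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return {k: model.state_dict()[k] - global_state[k] for k in global_state}
```
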