FADAS: Towards Federated Adaptive Asynchronous Optimization
Authors: Yujia Wang, Shiqiang Wang, Songtao Lu, Jinghui Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments across various asynchronous delay settings in both vision and language modeling tasks. Our results indicate that the proposed FADAS, whether or not it includes the delay-adaptive learning rate, outperforms other asynchronous FL baselines. In particular, the delay-adaptive FADAS demonstrates significant advantages in scenarios with large worst-case delays. Moreover, our experimental results on simulating the wall-clock training time underscore the efficiency of our proposed FADAS approach. |
| Researcher Affiliation | Collaboration | 1College of Information Sciences and Technology, Pennsylvania State University, State College, PA, USA 2IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. |
| Pseudocode | Yes | Algorithm 1 FADAS (with delay adaptation) |
| Open Source Code | Yes | Our code can be found at https://github.com/yujiaw98/FADAS. |
| Open Datasets | Yes | using the CIFAR-10/100 (Krizhevsky et al., 2009) datasets with the ResNet-18 model (He et al., 2016) for vision tasks, and applying the pre-trained BERT base model (Devlin et al., 2018) for fine-tuning on several datasets from the GLUE benchmark (Wang et al., 2018) for language tasks. |
| Dataset Splits | Yes | For both settings, we partition the data on clients based on the Dirichlet distribution following Wang et al. (2020a;b), and the parameter α used in Dirichlet sampling determines the degree of data heterogeneity. We apply two levels of data heterogeneity with α = 0.1 and α = 0.3. ... Table 7 demonstrates the efficiency of FADAS and its delay-adaptive variant by comparing their performance with two synchronous FL methods in reaching the target validation accuracy across different datasets. |
| Hardware Specification | No | The paper mentions simulating wall-clock running times and setting up client scenarios, but it does not provide specific details about the hardware (e.g., GPU models, CPU models, memory, or specific cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper refers to using ResNet-18 and BERT models and mentions SGD as a local optimizer. However, it does not specify any software names with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed for replication. |
| Experiment Setup | Yes | We summarize some crucial implementation details in the following, and we leave some additional results and experiment details to Appendix D. Our code can be found at https://github.com/yujiaw98/FADAS. Overview of vision tasks implementation. We set up a total of 100 clients for the mild delay scenario, in which the concurrency Mc = 20 and the buffer size M = 10 by default. ... Each client conducts two local epochs of training, and the mini-batch size is 50 for each client. The local optimizer for all methods is SGD with weight decay 10^-4, and we grid search the global and local learning rates individually for each method. ... For the global adaptive optimizer, we set β1 = 0.9, β2 = 0.99, and we set ϵ = 10^-8. Table 10 summarizes the hyper-parameter details in our experiments. |
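To make the quoted setup concrete, the sketch below shows how a buffered asynchronous server round with an adaptive (AMSGrad-style) update and a delay-scaled learning rate might look, using the reported hyper-parameters (β1 = 0.9, β2 = 0.99, ϵ = 10^-8). This is a minimal illustration, not the authors' implementation: the delay-scaling rule (`tau_c` threshold) and the sign convention for client deltas are assumptions; consult the linked repository for the actual FADAS algorithm.

```python
import math

def fadas_server_update(x, m, v, v_hat, client_deltas, client_delays,
                        eta=0.1, beta1=0.9, beta2=0.99, eps=1e-8,
                        tau_c=10):
    """One illustrative server round: average M buffered client deltas,
    apply an AMSGrad-style adaptive step, and shrink the learning rate
    when the worst-case delay in the buffer is large (delay adaptation).
    All vectors are plain Python lists of floats.
    """
    n = len(x)
    # Average the buffered pseudo-gradients (per-client model deltas).
    delta = [sum(d[i] for d in client_deltas) / len(client_deltas)
             for i in range(n)]
    # First and second adaptive moments; AMSGrad keeps a running max of v.
    m = [beta1 * mi + (1 - beta1) * di for mi, di in zip(m, delta)]
    v = [beta2 * vi + (1 - beta2) * di * di for vi, di in zip(v, delta)]
    v_hat = [max(vh, vi) for vh, vi in zip(v_hat, v)]
    # Hypothetical delay-adaptive rule: full step while the worst-case
    # delay stays below tau_c, otherwise scale eta down proportionally.
    tau_max = max(client_delays)
    eta_t = eta * min(1.0, tau_c / max(tau_max, 1))
    # Move the global model along the adaptive direction.
    x = [xi + eta_t * mi / (math.sqrt(vh) + eps)
         for xi, mi, vh in zip(x, m, v_hat)]
    return x, m, v, v_hat
```

With a buffer of two identical unit deltas and delays (1, 2), the worst-case delay is below the threshold, so the step uses the full learning rate and the moments follow the standard Adam-style recursions.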