Adaptive Federated Optimization

Authors: Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

ICLR 2021

For each reproducibility variable below, the result is followed by the LLM response that supports it.

Research Type: Experimental
"We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning."

Researcher Affiliation: Industry
"Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan. Google Research. {sashank, zachcharles, manzilzaheer, zachgarrett, krush, konkey, sanjivk, mcmahan}@google.com"

Pseudocode: Yes
Algorithm 1: FedOpt
 1: Input: x^0, CLIENTOPT, SERVEROPT
 2: for t = 0, ..., T-1 do
 3:   Sample a subset S of clients
 4:   x^t_{i,0} = x^t
 5:   for each client i ∈ S in parallel do
 6:     for k = 0, ..., K-1 do
 7:       Compute an unbiased estimate g^t_{i,k} of ∇F_i(x^t_{i,k})
 8:       x^t_{i,k+1} = CLIENTOPT(x^t_{i,k}, g^t_{i,k}, η_l, t)
 9:     Δ^t_i = x^t_{i,K} - x^t
10:   Δ^t = (1/|S|) Σ_{i∈S} Δ^t_i
11:   x^{t+1} = SERVEROPT(x^t, Δ^t, η, t)

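To make the algorithm concrete, here is a minimal NumPy sketch of the FedOpt loop above. This is not the authors' TensorFlow Federated implementation: the quadratic client losses, the gradient-noise scale, and the choice of plain SGD for both CLIENTOPT and SERVEROPT are illustrative assumptions, and with these choices the sketch reduces to FedAvg.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: client i holds the quadratic loss
# F_i(x) = 0.5 * ||x - c_i||^2, so grad F_i(x) = x - c_i.
NUM_CLIENTS, DIM = 100, 10
centers = rng.normal(size=(NUM_CLIENTS, DIM))

def client_grad(i, x):
    """Unbiased estimate g_{i,k} of grad F_i(x): exact gradient plus noise."""
    return (x - centers[i]) + 0.01 * rng.normal(size=x.shape)

def client_opt(x, g, lr):
    """CLIENTOPT: one SGD step with client learning rate eta_l."""
    return x - lr * g

def server_opt(x, delta, lr):
    """SERVEROPT: SGD treating -delta as a pseudo-gradient (FedAvg when lr = 1)."""
    return x + lr * delta

def fedopt(x0, rounds=50, cohort=10, local_steps=5, eta_l=0.1, eta=1.0):
    x = x0.copy()
    for t in range(rounds):
        sampled = rng.choice(NUM_CLIENTS, size=cohort, replace=False)  # sample S
        deltas = []
        for i in sampled:                    # sequential here; in parallel in practice
            xi = x.copy()                    # x_{i,0}^t = x^t
            for k in range(local_steps):     # k = 0, ..., K-1
                xi = client_opt(xi, client_grad(i, xi), eta_l)
            deltas.append(xi - x)            # Delta_i^t = x_{i,K}^t - x^t
        delta = np.mean(deltas, axis=0)      # Delta^t = (1/|S|) sum_i Delta_i^t
        x = server_opt(x, delta, eta)        # x^{t+1} = SERVEROPT(x^t, Delta^t, eta)
    return x

x_final = fedopt(np.zeros(DIM))
```

Swapping an Adagrad-, Adam-, or Yogi-style update into server_opt while keeping SGD on the clients gives the paper's FedAdagrad, FedAdam, and FedYogi variants.
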
Open Source Code: Yes
"To encourage reproducibility and breadth of comparison, we have attempted to describe our experiments as rigorously as possible, and have created an open-source framework with all models, datasets, and code."

Open Datasets: Yes
"We use five datasets: CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), EMNIST (Cohen et al., 2017), Shakespeare (McMahan et al., 2017), and Stack Overflow (Authors, 2019)."

Dataset Splits: Yes
"For Stack Overflow tasks, the validation set contains 10,000 randomly sampled test examples (due to the size of the test dataset, see Table 2). For all other tasks, we use the entire test set."

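A sketch of how such a split could be constructed; test_examples, the seed, and the helper name are hypothetical, not from the paper:

```python
import random

def make_validation_split(test_examples, size=10_000, seed=0):
    """Sample a fixed validation set from a (much larger) test set, as
    described for the Stack Overflow tasks; passing size=None uses the
    whole test set, matching the other tasks."""
    if size is None or size >= len(test_examples):
        return list(test_examples)
    return random.Random(seed).sample(test_examples, size)
```
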
Hardware Specification: No
The paper does not explicitly mention specific hardware details such as GPU/CPU models, processors, or memory used for running the experiments.

Software Dependencies: No
"We implement all algorithms in TensorFlow Federated (Ingerman & Ostrowski, 2019)."

Experiment Setup: Yes
"We select η_l, η, and τ by grid-search tuning... We run 1500 rounds of training on the EMNIST CR, Shakespeare, and Stack Overflow tasks, 3000 rounds for EMNIST AE, and 4000 rounds for the CIFAR tasks. For SO NWP, we sample 50 clients per round, while for all other tasks we sample 10. We use E = 1 local epochs throughout."

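Read as a configuration, the quoted setup might look like the sketch below. The round counts, cohort sizes, and E = 1 come from the quote; the candidate grids for η_l, η, and τ are placeholders (the paper's appendix gives the actual grids), and train_fn is a hypothetical training callable.

```python
import itertools

# Round counts and clients-per-round quoted above; E = 1 local epoch throughout.
TASKS = {
    "cifar10":           {"rounds": 4000, "clients_per_round": 10},
    "cifar100":          {"rounds": 4000, "clients_per_round": 10},
    "emnist_ae":         {"rounds": 3000, "clients_per_round": 10},
    "emnist_cr":         {"rounds": 1500, "clients_per_round": 10},
    "shakespeare":       {"rounds": 1500, "clients_per_round": 10},
    "stackoverflow_nwp": {"rounds": 1500, "clients_per_round": 50},
}

# Placeholder grids (assumptions); the paper's appendix lists the real values.
CLIENT_LRS = [10.0 ** e for e in range(-3, 1)]   # candidates for eta_l
SERVER_LRS = [10.0 ** e for e in range(-3, 1)]   # candidates for eta
TAUS       = [10.0 ** e for e in range(-5, 0)]   # adaptivity parameter tau

def grid_search(task, train_fn):
    """Return the (eta_l, eta, tau) triple with the best validation metric.

    `train_fn` is a hypothetical callable that runs TASKS[task]["rounds"]
    rounds of federated training and returns a validation score.
    """
    cfg = TASKS[task]
    return max(
        itertools.product(CLIENT_LRS, SERVER_LRS, TAUS),
        key=lambda hp: train_fn(task, *hp, **cfg),
    )
```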