Federated Accelerated Stochastic Gradient Descent

Authors: Honglin Yuan, Tengyu Ma

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify the efficiency of FedAc in Section 5. Numerical results suggest a considerable improvement of FedAc over all three baselines, namely FedAvg, (distributed) Minibatch-SGD, and (distributed) Accelerated Minibatch-SGD [Dekel et al., 2012, Cotter et al., 2011], especially in the regime of highly infrequent communication and abundant workers. In this section, we validate our theory and demonstrate the efficiency of FedAc via experiments.
Researcher Affiliation | Academia | Honglin Yuan, Stanford University, yuanhl@stanford.edu; Tengyu Ma, Stanford University, tengyuma@stanford.edu
Pseudocode | Yes | Algorithm 1: Federated Accelerated Stochastic Gradient Descent (FedAc). (A hedged sketch of the local update appears after the table.)
Open Source Code | Yes | Code repository link: https://github.com/hongliny/FedAc-NeurIPS20
Open Datasets | Yes | on ℓ2-regularized logistic regression for the UCI a9a dataset [Dua and Graff, 2017] from LIBSVM [Chang and Lin, 2011]. (See the objective sketch after the table.)
Dataset Splits | No | The paper mentions using the UCI a9a dataset but does not specify training, validation, or test splits (e.g., percentages, sample counts, or explicit standard split references) within the text.
Hardware Specification | No | The paper does not specify any hardware details (e.g., specific CPU/GPU models, cloud instances, or memory specifications) used for running the experiments. It refers generally to "distributed computing resources" and "abundant workers".
Software Dependencies | No | The paper mentions using data from LIBSVM [Chang and Lin, 2011] but does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow) used to implement and run the experiments.
Experiment Setup | Yes | The regularization strength is set as 10^-3. The hyperparameters (γ, α, β) of FedAc follow FedAc-I, where the strong convexity µ is chosen as the regularization strength 10^-3. We test the settings of M = 2^2, ..., 2^13 workers and K = 2^0, ..., 2^8 synchronization intervals. For all four algorithms, we tune the learning rate only from the same set of levels within [10^-3, 10]. (A sketch of the FedAc-I hyperparameter choice appears after the table.)
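
The pseudocode row above refers to Algorithm 1 (FedAc). The sketch below is only a hedged illustration of its structure as I read it, not code from the authors' repository: each worker runs K accelerated local steps that couple a slow iterate w with an aggregated iterate w_ag through a middle point w_md, and both sequences are averaged at every synchronization. The function names and the simulation harness are assumptions.

```python
import numpy as np

def fedac_round(w, w_ag, grad_fn, eta, gamma, alpha, beta, K, rng):
    """One synchronization round of a FedAc-style method (sketch).

    w, w_ag: arrays of shape (M, d) holding each worker's iterate and
    aggregated iterate. grad_fn(x, m, rng) returns a stochastic gradient
    at point x for worker m (assumed to be provided by the caller).
    """
    M = w.shape[0]
    for _ in range(K):
        for m in range(M):
            # Coupled "middle" point, as in accelerated SGD.
            w_md = w[m] / beta + (1.0 - 1.0 / beta) * w_ag[m]
            g = grad_fn(w_md, m, rng)
            # Aggregated (fast) sequence: short gradient step from the middle point.
            w_ag[m] = w_md - eta * g
            # Slow sequence: coupled step with the larger step size gamma.
            w[m] = (1.0 - 1.0 / alpha) * w[m] + w_md / alpha - gamma * g
    # Periodic averaging: all M workers synchronize both sequences.
    w[:] = w.mean(axis=0)
    w_ag[:] = w_ag.mean(axis=0)
    return w, w_ag
```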
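
The experiments use ℓ2-regularized logistic regression on a9a with regularization strength 10^-3. The snippet below is a minimal sketch of that objective and its stochastic gradient, assuming the dataset is available locally in LIBSVM format; the file path and helper names are illustrative, not taken from the paper or its code.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import load_svmlight_file  # a9a is distributed in LIBSVM format

LAMBDA = 1e-3  # regularization strength reported in the experiment setup

def load_a9a(path="a9a"):
    """Load the UCI a9a dataset from a local LIBSVM-format file (path is illustrative)."""
    X, y = load_svmlight_file(path)
    return X, y  # labels y are in {-1, +1}

def objective(w, X, y, lam=LAMBDA):
    """l2-regularized logistic loss: mean log(1 + exp(-y x^T w)) + (lam/2) ||w||^2."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.dot(w, w)

def stochastic_grad(w, X, y, idx, lam=LAMBDA):
    """Stochastic gradient of the objective on a minibatch of row indices idx."""
    Xb, yb = X[idx], y[idx]
    margins = yb * (Xb @ w)
    coef = -yb * expit(-margins)  # derivative of log(1 + exp(-m)) w.r.t. the margin, times y
    return (Xb.T @ coef) / len(idx) + lam * w
```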
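
The setup row states that (γ, α, β) follow the FedAc-I choice with µ equal to the regularization strength 10^-3. The sketch below encodes my reading of that rule, namely γ = max(sqrt(η/(µK)), η), α = 1/(γµ), β = α + 1 for a base step size η ≤ 1/L; these formulas are an assumption and should be checked against the paper's definition of FedAc-I.

```python
import numpy as np

def fedac_i_hyperparameters(eta, mu, K):
    """FedAc-I hyperparameter choice (sketch), given base step size eta,
    strong-convexity parameter mu, and synchronization interval K."""
    gamma = max(np.sqrt(eta / (mu * K)), eta)
    alpha = 1.0 / (gamma * mu)
    beta = alpha + 1.0
    return gamma, alpha, beta

# Illustration over the synchronization intervals from the experiments (K = 2^0, ..., 2^8),
# with mu = 1e-3; the learning rate eta itself is tuned over levels within [1e-3, 10].
if __name__ == "__main__":
    for K in [2 ** k for k in range(0, 9)]:
        gamma, alpha, beta = fedac_i_hyperparameters(eta=1.0, mu=1e-3, K=K)
        print(f"K={K:3d}  gamma={gamma:8.3f}  alpha={alpha:10.1f}  beta={beta:10.1f}")
```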