Communication-efficient SGD: From Local SGD to One-Shot Averaging

Authors: Artin Spiridonoff, Alex Olshevsky, Yannis Paschalidis

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify our findings and compare different communication strategies in Local SGD, we performed the following numerical experiments, using an Nvidia GTX-1060 GPU and Intel Core i7-7700k processor. We use Local SGD to minimize F(x) using different communication strategies... Figures (a) and (b) show the error over iteration and communication rounds, respectively.
Researcher Affiliation | Academia | Artin Spiridonoff, Division of Systems Engineering, Boston University, Boston, MA 02215, artin@bu.edu; Alex Olshevsky, Division of Systems Engineering, Boston University, Boston, MA 02215, alexols@bu.edu; Ioannis Ch. Paschalidis, Division of Systems Engineering, Boston University, Boston, MA 02215, yannisp@bu.edu
Pseudocode | Yes | Algorithm 1 Local SGD
1: Input: $x_i^0 = x^0$ for all $i \in [N]$, total number of iterations $T$, the step-size sequence $\{\eta_t\}_{t=0}^{T-1}$, and $\mathcal{I} \subseteq [T]$
2: for $t = 0, \dots, T-1$ do
3:   for $j = 1, \dots, N$ do
4:     evaluate a stochastic gradient $\hat{g}_j^t$
5:     if $t + 1 \in \mathcal{I}$ then
6:       $x_j^{t+1} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^t - \eta_t \hat{g}_i^t\right)$
7:     else
8:       $x_j^{t+1} = x_j^t - \eta_t \hat{g}_j^t$
9:     end if
10:   end for
11: end for
Open Source Code | No | The paper states in its reproducibility checklist (Question 3a) that code is included, but the main text neither links to the code nor says where it is located (e.g., in the supplementary material).
Open Datasets | Yes | (ii) the a9a dataset from LIBSVM (Chang and Lin, 2011) which includes 32561 data points with d = 124 features. ... Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Dataset Splits | No | The paper names the datasets it uses but gives no training, validation, or test splits (percentages, sample counts, or methodology) in the provided text.
Hardware Specification | Yes | To verify our findings and compare different communication strategies in Local SGD, we performed the following numerical experiments, using an Nvidia GTX-1060 GPU and Intel Core i7-7700k processor.
Software Dependencies | No | The paper does not list software dependencies or their version numbers.
Experiment Setup | Yes | We used $N = 20$ workers, $T = 1000$ iterations, $c_1 = 1.0$ and $c_2 = 10^{-10}$ with $d = 3$ and step-size sequence $\eta_t = 3/(\mu(t+1))$. ... We use the step-size sequence $\eta_t = \min\{1/L,\ 2/(\mu(t+1))\}$ with $\mu = 1$, $L = 2$, $\sigma = 8$, and $T = 1000$.
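Algorithm 1 (Local SGD, from the Pseudocode row) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the objective is a hypothetical quadratic $f_j(x) = \frac{\mu}{2}\|x - b_j\|^2$ per worker with Gaussian gradient noise ($\sigma = 1$, random targets $b_j$), because the paper's $F(x)$ is not defined in the quoted text. $N = 20$, $T = 1000$, $d = 3$ and $\eta_t = 3/(\mu(t+1))$ come from the Experiment Setup row, with $\mu = 1$ assumed. The two communication sets are illustrative: one-shot averaging $\mathcal{I} = \{T\}$ and averaging after every step $\mathcal{I} = [T]$, where the number of communication rounds is $|\mathcal{I}|$. The names `local_sgd`, `noisy_grad`, and `error` are hypothetical.

```python
import math
import random
from typing import Callable, List, Set

Vector = List[float]


def local_sgd(
    x0: Vector,
    n_workers: int,
    T: int,
    step_size: Callable[[int], float],
    comm_set: Set[int],
    stoch_grad: Callable[[int, Vector], Vector],
) -> List[Vector]:
    """Local SGD following Algorithm 1: local steps, averaging whenever t + 1 is in comm_set."""
    xs = [list(x0) for _ in range(n_workers)]  # x_j^0 = x^0 for every worker j
    for t in range(T):
        eta = step_size(t)
        # Each worker takes its own step x_j^t - eta_t * g_j^t.
        stepped = []
        for j in range(n_workers):
            g = stoch_grad(j, xs[j])
            stepped.append([xk - eta * gk for xk, gk in zip(xs[j], g)])
        if t + 1 in comm_set:
            # Communication round: every worker receives the average of the stepped iterates.
            avg = [sum(coords) / n_workers for coords in zip(*stepped)]
            xs = [list(avg) for _ in range(n_workers)]
        else:
            xs = stepped
    return xs


# Hypothetical test problem (NOT the paper's F(x)):
# worker j holds f_j(x) = (mu/2) * ||x - b_j||^2, so F = (1/N) sum_j f_j is minimized at mean(b_j).
rng = random.Random(0)
N, T, d, mu, sigma = 20, 1000, 3, 1.0, 1.0
targets = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]
x_star = [sum(coords) / N for coords in zip(*targets)]


def noisy_grad(j: int, x: Vector) -> Vector:
    # Exact gradient of f_j plus zero-mean Gaussian noise with standard deviation sigma.
    return [mu * (xk - bk) + rng.gauss(0.0, sigma) for xk, bk in zip(x, targets[j])]


def error(x: Vector) -> float:
    # Euclidean distance to the minimizer of F.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_star)))


strategies = {
    "one-shot": {T},                     # I = {T}: a single communication round
    "every step": set(range(1, T + 1)),  # I = [T]: T communication rounds
}
results = {}
for name, comm_set in strategies.items():
    final = local_sgd([0.0] * d, N, T, lambda t: 3.0 / (mu * (t + 1)), comm_set, noisy_grad)
    results[name] = error(final[0])
    print(f"{name}: {len(comm_set)} rounds, error {results[name]:.4f}")
```

Note that the communication branch averages the stepped iterates $x_i^t - \eta_t \hat{g}_i^t$, matching line 6 of Algorithm 1, rather than the iterates before the step. Because $T \in \mathcal{I}$ in both strategies, all workers hold the same final point.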
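The two step-size sequences quoted in the Experiment Setup row can be written as small functions. This is a sketch; the function names are hypothetical, and $\mu$ for the first setup is left as a parameter because its value is not given in the quoted text. For the capped sequence, $2/(\mu(t+1)) \le 1/L$ exactly when $t + 1 \ge 2L/\mu$, so with $\mu = 1$ and $L = 2$ the cap $1/L = 0.5$ is active for $t \le 3$ and the $2/(\mu(t+1))$ decay takes over from $t = 4$.

```python
def harmonic_step(t: int, mu: float) -> float:
    """eta_t = 3 / (mu * (t + 1)), as quoted for the first setup."""
    return 3.0 / (mu * (t + 1))


def capped_step(t: int, mu: float = 1.0, L: float = 2.0) -> float:
    """eta_t = min{1/L, 2 / (mu * (t + 1))}, as quoted for the second setup."""
    return min(1.0 / L, 2.0 / (mu * (t + 1)))


# With mu = 1 and L = 2 the cap 1/L = 0.5 binds while 2/(t + 1) >= 0.5, i.e. for t <= 3.
for t in (0, 3, 4, 999):
    print(t, capped_step(t))  # prints 0.5, 0.5, 0.4, 0.002 in turn
```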