Communication-efficient SGD: From Local SGD to One-Shot Averaging
Authors: Artin Spiridonoff, Alex Olshevsky, Yannis Paschalidis
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify our findings and compare different communication strategies in Local SGD, we performed the following numerical experiments, using an Nvidia GTX-1060 GPU and Intel Core i7-7700k processor. We use Local SGD to minimize F(x) using different communication strategies... Figures (a) and (b) show the error over iteration and communication rounds, respectively. |
| Researcher Affiliation | Academia | Artin Spiridonoff Division of Systems Engineering Boston University Boston, MA 02215 artin@bu.edu Alex Olshevsky Division of Systems Engineering Boston University Boston, MA 02215 alexols@bu.edu Ioannis Ch. Paschalidis Division of Systems Engineering Boston University Boston, MA 02215 yannisp@bu.edu |
| Pseudocode | Yes | Algorithm 1 (Local SGD). 1: Input: $x_i^0 = x^0$ for all $i \in [n]$, total number of iterations $T$, the step-size sequence $\{\eta_t\}_{t=0}^{T-1}$, and $\mathcal{I} \subseteq [T]$. 2: for $t = 0, \ldots, T-1$ do 3: for $j = 1, \ldots, N$ do 4: evaluate a stochastic gradient $\hat{g}_j^t$ 5: if $t + 1 \in \mathcal{I}$ then 6: $x_j^{t+1} = \frac{1}{N} \sum_{i=1}^{N} (x_i^t - \eta_t \hat{g}_i^t)$ 7: else 8: $x_j^{t+1} = x_j^t - \eta_t \hat{g}_j^t$ 9: end if 10: end for 11: end for |
| Open Source Code | No | The paper states in the reproducibility checklist (Question 3a) that code is included, but does not provide a direct link or explicit mention of its location (e.g., supplementary material) in the main text. |
| Open Datasets | Yes | (ii) the a9a dataset from LIBSVM (Chang, Lin, 2011) which includes 32561 data points with d = 124 features. ... Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide details on training, validation, or test splits (e.g., percentages, sample counts, or methodology) in the provided text. |
| Hardware Specification | Yes | To verify our findings and compare different communication strategies in Local SGD, we performed the following numerical experiments, using an Nvidia GTX-1060 GPU and Intel Core i7-7700k processor. |
| Software Dependencies | No | The paper does not explicitly list software dependencies with version numbers. |
| Experiment Setup | Yes | We used N = 20 workers, T = 1000 iterations, c1 = 1.0 and c2 = 10 10 with d = 3 and step-size sequence ηt = 3/(µ(t + 1)). ... We use the step-size sequence ηt = min{1/L, 2/(µ(t + 1))} with µ = 1, L = 2, and σ = 8, T = 1000. |
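
The Local SGD pseudocode and the second step-size schedule quoted in the table can be illustrated with a minimal sketch. This is not the authors' code: the objective (a simple quadratic with additive Gaussian gradient noise), the noise level, and all function names (`step_size`, `noisy_gradient`, `local_sgd`) are assumptions made for illustration. Only the update rule, the schedule ηt = min{1/L, 2/(µ(t + 1))}, and the values N = 20, T = 1000, d = 3, µ = 1, L = 2 come from the table.

```python
import random


def step_size(t, mu=1.0, L=2.0):
    """Schedule quoted in the table: eta_t = min{1/L, 2/(mu*(t+1))}."""
    return min(1.0 / L, 2.0 / (mu * (t + 1)))


def noisy_gradient(x, rng, mu=1.0, sigma=1.0):
    """Hypothetical oracle: gradient of (mu/2)*||x||^2 plus Gaussian noise.

    This objective is an assumption; it is not the paper's test problem.
    """
    return [mu * xi + rng.gauss(0.0, sigma) for xi in x]


def local_sgd(x0, num_workers, T, comm_times, grad_fn, eta_fn):
    """Sketch of Algorithm 1: local steps, averaging when t+1 is in comm_times."""
    d = len(x0)
    xs = [list(x0) for _ in range(num_workers)]  # every worker starts at x0
    for t in range(T):
        eta = eta_fn(t)
        # Each worker takes one stochastic gradient step from its own iterate.
        stepped = []
        for x in xs:
            g = grad_fn(x)
            stepped.append([xi - eta * gi for xi, gi in zip(x, g)])
        if (t + 1) in comm_times:
            # Communication round: replace every iterate with the average.
            avg = [sum(s[k] for s in stepped) / num_workers for k in range(d)]
            xs = [list(avg) for _ in range(num_workers)]
        else:
            xs = stepped
    return xs


# Usage: one-shot averaging means communicating only at the final iteration.
rng = random.Random(0)
T = 1000
one_shot = {T}
final_iterates = local_sgd(
    [1.0, 1.0, 1.0], 20, T, one_shot,
    lambda x: noisy_gradient(x, rng), step_size,
)
```

Passing a different set as `comm_times` (for example every tenth iteration) changes only the communication strategy, which mirrors the role of the set I in the pseudocode.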