Local SGD Converges Fast and Communicates Little

Authors: Sebastian U. Stich

ICLR 2019

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | In this section we show some numerical experiments to illustrate the results of Theorem 2.2. ... Experimental. We examine the practical speedup on a logistic regression problem, f(x) = (1/n) ∑_{i=1}^{n} log(1 + exp(−b_i a_i^⊤ x)) + (λ/2)‖x‖², where a_i ∈ ℝ^d and b_i ∈ {−1, +1} are the data samples. The regularization parameter is set to λ = 1/n. We consider the w8a dataset (Platt, 1999) (d = 300, n = 49749). We initialize all runs with x_0 = 0_d and measure the number of iterations to reach the target accuracy ϵ. ... We depict the results in Figure 3, again under the assumption ρ = 25. (A NumPy sketch of this objective follows the table.)
Researcher Affiliation | Academia | Sebastian U. Stich, EPFL, Switzerland, sebastian.stich@epfl.ch
Pseudocode | Yes | Algorithm 1 LOCAL SGD ... Algorithm 2 ASYNCHRONOUS LOCAL SGD (SCHEMATIC) (a simulation sketch of Local SGD follows the table)
Open Source Code | No | The paper cites open-source frameworks used in distributed deep learning, but gives no statement or link releasing code for its own method.
Open Datasets | Yes | We consider the w8a dataset (Platt, 1999) (d = 300, n = 49749). (a loading sketch follows the table)
Dataset Splits | No | The paper specifies no training/validation/test splits; it only reports reaching a target accuracy ϵ as the stopping criterion.
Hardware Specification | Yes | For completeness, we report that all experiments were run on an Ubuntu 16.04 machine with a 24-core Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz.
Software Dependencies | No | The paper names the operating system (Ubuntu 16.04) but no programming languages, libraries, or frameworks with version numbers.
Experiment Setup | Yes | We initialize all runs with x_0 = 0_d and measure the number of iterations to reach the target accuracy ϵ. ... By extensive grid search we determine for each configuration (H, K, B) the best stepsize from the set {min(32, cn/(t+1)), 32c}, where c can take the values c = 2^i for i ∈ ℤ. ... The regularization parameter is set to λ = 1/n. (a stepsize-grid sketch follows the table)
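
The w8a dataset quoted above is distributed in LIBSVM format; a minimal loading sketch, assuming a local LIBSVM-format copy of the file (the filename is ours):

```python
from sklearn.datasets import load_svmlight_file

# Assumed: the LIBSVM-format copy of w8a saved locally as "w8a".
A, b = load_svmlight_file("w8a")  # A: sparse (n, d) features; b: labels in {-1, +1}
n, d = A.shape                    # the paper reports d = 300, n = 49749
lam = 1.0 / n                     # regularization parameter λ = 1/n, per the paper
```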
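The objective in the Research Type row is ℓ2-regularized logistic regression; here is a minimal NumPy sketch of it (function and variable names are ours, not the paper's):

```python
import numpy as np

def objective(x, A, b, lam):
    """f(x) = (1/n) · Σ_i log(1 + exp(−b_i · a_iᵀx)) + (λ/2)·‖x‖²."""
    margins = -b * (A @ x)                            # −b_i · a_iᵀx, shape (n,)
    data_term = np.mean(np.logaddexp(0.0, margins))   # stable log(1 + exp(·))
    return data_term + 0.5 * lam * np.dot(x, x)
```

Per the excerpts above, the paper evaluates this from x_0 = 0_d and counts iterations until the target accuracy ϵ is reached.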
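Algorithm 1 (LOCAL SGD) is only named in the Pseudocode row; below is a serial simulation sketch of the scheme the paper analyzes: K workers take H local SGD steps between synchronizations, and the iterates are averaged at each synchronization point. The helpers stoch_grad and stepsize are our assumptions, not the paper's code.

```python
import numpy as np

def local_sgd(x0, stoch_grad, stepsize, K, H, T, rng):
    """Serial simulation of Local SGD: K parallel iterate sequences,
    H local steps between synchronizations, T total steps per worker.
    stoch_grad(x, rng) -> unbiased stochastic gradient (assumed helper);
    stepsize(t)        -> learning rate at step t."""
    workers = [x0.copy() for _ in range(K)]
    t = 0
    while t < T:
        for _ in range(min(H, T - t)):   # local phase: no communication
            for k in range(K):
                workers[k] = workers[k] - stepsize(t) * stoch_grad(workers[k], rng)
            t += 1
        avg = np.mean(workers, axis=0)   # synchronize: average the K iterates
        workers = [avg.copy() for _ in range(K)]
    return workers[0]
```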
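The stepsize set in the Experiment Setup row is extraction-garbled; our reading of it is {min(32, cn/(t+1)), 32c} with c = 2^i, i ∈ ℤ. A sketch of enumerating such a grid, with the finite range of i as our assumption:

```python
def stepsize_grid(n, i_values=range(-10, 11)):
    """Yield candidate schedules for the grid search: for each c = 2^i,
    a decaying schedule t -> min(32, c·n/(t+1)) and a constant one t -> 32·c."""
    for i in i_values:
        c = 2.0 ** i
        yield ("decaying", lambda t, c=c: min(32.0, c * n / (t + 1)))
        yield ("constant", lambda t, c=c: 32.0 * c)
```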