Local SGD Converges Fast and Communicates Little
Authors: Sebastian U. Stich
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we show some numerical experiments to illustrate the results of Theorem 2.2. ... Experimental. We examine the practical speedup on a logistic regression problem, f(x) = (1/n) Σᵢ₌₁ⁿ log(1 + exp(−bᵢ aᵢᵀx)) + (λ/2)‖x‖², where aᵢ ∈ Rᵈ and bᵢ ∈ {−1, +1} are the data samples. The regularization parameter is set to λ = 1/n. We consider the w8a dataset (Platt, 1999) (d = 300, n = 49749). We initialize all runs with x₀ = 0_d and measure the number of iterations to reach the target accuracy ϵ. ... We depict the results in Figure 3, again under the assumption ρ = 25. |
| Researcher Affiliation | Academia | Sebastian U. Stich EPFL, Switzerland sebastian.stich@epfl.ch |
| Pseudocode | Yes | Algorithm 1 LOCAL SGD ... Algorithm 2 ASYNCHRONOUS LOCAL SGD (SCHEMATIC) |
| Open Source Code | No | The paper mentions other open-source frameworks used in distributed deep learning but does not provide any explicit statement or link for the open-source code of its own proposed methodology. |
| Open Datasets | Yes | We consider the w8a dataset (Platt, 1999) (d = 300, n = 49749). |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test splits for the dataset used. It mentions reaching a 'target accuracy' as a stopping criterion, but no detailed split information. |
| Hardware Specification | Yes | For completeness, we report that all experiments were run on an Ubuntu 16.04 machine with a 24-core Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz. |
| Software Dependencies | No | The paper mentions the operating system ('Ubuntu 16.04') but does not specify any programming languages, libraries, or frameworks with their version numbers that were used for implementing the experiments. |
| Experiment Setup | Yes | We initialize all runs with x₀ = 0_d and measure the number of iterations to reach the target accuracy ϵ. ... By extensive grid search we determine for each configuration (H, K, B) the best stepsize from the set {min(32, cn/(t+1)), 32c}, where c can take the values c = 2ⁱ for i ∈ Z. ... The regularization parameter is set to λ = 1/n. |
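The regularized logistic-regression objective quoted in the Research Type row can be sketched as follows. This is a minimal illustration with synthetic data; the function and variable names are ours, not the paper's, and only the formula f(x) = (1/n) Σᵢ log(1 + exp(−bᵢ aᵢᵀx)) + (λ/2)‖x‖² with λ = 1/n and x₀ = 0_d comes from the paper.

```python
import numpy as np

def objective(x, A, b, lam):
    """Regularized logistic loss:
    f(x) = (1/n) * sum_i log(1 + exp(-b_i * a_i^T x)) + (lam/2) * ||x||^2
    """
    margins = -b * (A @ x)                      # -b_i * a_i^T x per sample
    loss = np.mean(np.log1p(np.exp(margins)))   # log(1 + exp(.)) via log1p
    reg = 0.5 * lam * np.dot(x, x)
    return loss + reg

# Tiny synthetic stand-in for the w8a data (the paper uses d = 300, n = 49749).
rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.choice([-1.0, 1.0], size=n)
x0 = np.zeros(d)
print(objective(x0, A, b, lam=1.0 / n))  # log(2) at x0 = 0, since all margins are 0
```

At the paper's initialization x₀ = 0_d every margin is zero, so the loss is exactly log 2 regardless of the data, which is a convenient sanity check for an implementation.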
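The Pseudocode row refers to the paper's Algorithm 1 (LOCAL SGD): K workers each take H local SGD steps, then synchronize by averaging their iterates. A schematic, runnable sketch under our own naming conventions (`grad(x, k)` standing in for worker k's stochastic gradient oracle, `stepsize(t)` for a schedule such as the paper's grid entries min(32, cn/(t+1)) or 32c):

```python
import numpy as np

def local_sgd(grad, x0, K, H, rounds, stepsize):
    """Schematic Local SGD (after the paper's Algorithm 1, our sketch):
    K workers each run H local SGD steps between synchronizations;
    synchronization averages all worker iterates."""
    workers = [x0.copy() for _ in range(K)]
    for r in range(rounds):
        for k in range(K):
            x = workers[k]
            for h in range(H):
                t = r * H + h                    # global local-step counter
                x = x - stepsize(t) * grad(x, k)  # one local SGD step
            workers[k] = x
        avg = np.mean(workers, axis=0)           # communication round: average
        workers = [avg.copy() for _ in range(K)]
    return avg

# Sanity check on a toy quadratic f(x) = 0.5*||x||^2, whose gradient is x.
out = local_sgd(grad=lambda x, k: x, x0=np.ones(3),
                K=4, H=5, rounds=50, stepsize=lambda t: 0.1)
print(np.linalg.norm(out))  # shrinks toward 0 as (1 - 0.1)^(rounds * H)
```

With H = 1 this reduces to fully synchronous mini-batch SGD; the paper's point is that communication happens only once per H steps rather than every step.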