Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks
Authors: Timothy Castiglia, Anirban Das, Stacy Patterson
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the effectiveness of our algorithm in a multi-level network with slow workers via simulation-based experiments. From Section 6 (Experiments): In this section, we show the performance of MLL-SGD compared to algorithms that do not account for hierarchy and heterogeneous worker rates. |
| Researcher Affiliation | Academia | T. Castiglia, A. Das, and S. Patterson are with the Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, castit@rpi.edu, dasa2@rpi.edu, sep@cs.rpi.edu. |
| Pseudocode | Yes | Algorithm 1 Multi-Level Local SGD (a hedged sketch of this multi-level structure appears after the table) |
| Open Source Code | Yes | A CODE REPOSITORY The code used in our experiments can be found at: https://github.com/rpi-nsl/MLL-SGD. This code simulates a multi-level network with heterogeneous workers, and trains a model using MLL-SGD. |
| Open Datasets | Yes | We use the EMNIST (Cohen et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009) datasets. We rerun our first experiment from Figure 1 with logistic regression trained on the MNIST dataset (Bottou et al., 1994). (See the dataset-loading sketch after the table.) |
| Dataset Splits | No | The paper mentions 'training loss and test accuracy' and discusses parameters like 'step size' but does not explicitly describe the use of a validation set or provide details on training/validation/test splits. |
| Hardware Specification | No | The paper states, 'We conduct experiments using Pytorch 1.4.0 and Python 3.' but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for these experiments. |
| Software Dependencies | Yes | We conduct experiments using Pytorch 1.4.0 and Python 3. |
| Experiment Setup | Yes | We train the CNN with a step size of 0.01. For ResNet, we use a standard approach of changing the step size from 0.1 to 0.01 to 0.001 over the course of training (He et al., 2016). We let qτ = 32 for all HL-SGD and MLL-SGD variations to be comparable with Local SGD. (A hedged step-size-schedule sketch appears after the table.) |
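The pseudocode row points to Algorithm 1 (Multi-Level Local SGD). The snippet below is a minimal, hypothetical sketch of the multi-level pattern that algorithm describes: workers run a few local SGD steps, sub-hubs average their workers' models, and the hub averages the sub-hub models. The toy least-squares objective, the two-sub-hub / three-worker topology, uniform averaging weights, and all hyperparameters are illustrative assumptions, not the paper's configuration or its Algorithm 1.

```python
# Hypothetical sketch (not the authors' Algorithm 1): one way to simulate a
# two-level local-SGD round on a toy least-squares problem. Workers run
# q_local_steps SGD steps, sub-hubs average their workers, the hub averages sub-hubs.
import torch

torch.manual_seed(0)
dim, q_local_steps, lr = 10, 4, 0.05

def make_shard(n=64):
    """Toy data shard for one worker: linear-regression features and targets."""
    X = torch.randn(n, dim)
    w_true = torch.arange(1.0, dim + 1)
    y = X @ w_true + 0.1 * torch.randn(n)
    return X, y

# Assumed topology for illustration: two sub-hubs, each with three workers.
groups = [[make_shard() for _ in range(3)] for _ in range(2)]
global_w = torch.zeros(dim)

def local_sgd(w, shard, steps, lr):
    """Run `steps` plain mini-batch SGD steps on one worker's shard, starting from w."""
    w = w.clone()
    X, y = shard
    for _ in range(steps):
        idx = torch.randint(0, X.shape[0], (8,))            # mini-batch indices
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / 8      # squared-loss gradient
        w -= lr * grad
    return w

for _round in range(20):
    subhub_models = []
    for workers in groups:
        # Each worker starts from the current global model and runs local SGD.
        worker_models = [local_sgd(global_w, shard, q_local_steps, lr) for shard in workers]
        # Sub-hub averages its workers (uniform weights assumed here).
        subhub_models.append(torch.stack(worker_models).mean(dim=0))
    # Hub averages the sub-hub models to form the next global model.
    global_w = torch.stack(subhub_models).mean(dim=0)

print("recovered weights:", global_w)
```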
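The datasets quoted in the Open Datasets row (EMNIST, CIFAR-10, MNIST) are all distributed with torchvision, which matches the PyTorch toolchain reported above. The EMNIST split, the transform, and the data path in this sketch are assumptions; the linked repository is the authoritative source for how the data is actually prepared and partitioned across workers.

```python
# Hedged sketch: loading the cited public datasets with torchvision.
# The 'digits' split and ToTensor transform are assumptions for illustration.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

emnist_train = datasets.EMNIST(root="./data", split="digits", train=True,
                               download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10(root="./data", train=True,
                               download=True, transform=to_tensor)
```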
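The Experiment Setup row reports a fixed step size of 0.01 for the CNN and a ResNet schedule that drops from 0.1 to 0.01 to 0.001 over training. A minimal sketch of that schedule using PyTorch's MultiStepLR is shown below; the milestone epochs, epoch count, and stand-in model are assumptions for illustration only, since the quoted setup does not state when the drops occur.

```python
# Minimal sketch of the reported ResNet step-size schedule (0.1 -> 0.01 -> 0.001).
# Milestones [80, 120] and the stand-in linear model are assumptions, not the paper's values.
# The CNN is reported to use a fixed step size of 0.01 (no schedule).
import torch
import torch.nn as nn

model = nn.Linear(32 * 32 * 3, 10)                         # stand-in for the actual ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1)             # 0.1 -> 0.01 -> 0.001

for epoch in range(150):
    # ... training loop over mini-batches would go here ...
    optimizer.step()        # placeholder step so the scheduler follows a real optimizer step
    scheduler.step()
```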