Communication-efficient Distributed Learning for Large Batch Optimization

Authors: Rui Liu, Barzan Mozafari

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate the effectiveness and efficiency of our method. Due to space limits, we focus on JOINTSPAR-LARS because optimizers with layerwise adaptive learning rates are more effective in the large-batch setting (You et al., 2017; 2019). More experiments on JOINTSPAR and other SGD-based compression methods can be found in the appendix.
Researcher Affiliation | Academia | Computer Science and Engineering, University of Michigan, Ann Arbor. Correspondence to: Rui Liu <ruixliu@umich.edu>, Barzan Mozafari <mozafari@umich.edu>.
Pseudocode | Yes | Algorithm 1: Distribution update for p_t^m; Algorithm 2: Our distributed learning method JOINTSPAR (for each worker m). (An illustrative sketch of the per-worker step appears after the table.)
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets | Yes | We use several benchmark datasets in our experiments: MNIST, Fashion-MNIST, SVHN, CIFAR10, CIFAR100, and ImageNet.
Dataset Splits | No | The paper mentions using well-known datasets and training for a certain number of epochs, but it does not explicitly provide the specific training, validation, or test dataset splits (e.g., percentages, counts, or references to predefined splits).
Hardware Specification | Yes | All experiments are run on a computer cluster with up to 16 nodes. Each node has 20 physical CPU cores with clock speeds up to 4 GHz and 4 NVIDIA P100 GPUs. Nodes are connected via a 100 Gb/s InfiniBand fabric.
Software Dependencies | No | We use PyTorch (Paszke et al., 2019) to implement models and learning methods, and use mpi4py (Dalcin et al., 2011) as the communication framework in the distributed setting. The paper names these packages and cites them, but does not specify their version numbers. (See the minimal mpi4py sketch after the table.)
Experiment Setup | Yes | We set the local batch size to 1024 for each machine, and use the same tricks (i.e., linear scaling and warmup) as suggested in (Goyal et al., 2017). Other experimental settings are kept the same as in the previous subsection. We train the model on each dataset for 90 epochs, with the first 5 epochs as the warmup stage, as suggested in (Goyal et al., 2017). For the learning rate schedule, we set the initial learning rate to 0.1 and shrink the learning rate by 0.1 at epochs 30, 50, and 70. (See the schedule sketch after the table.)
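
The Pseudocode row refers to the paper's Algorithm 1 (distribution update for p_t^m) and Algorithm 2 (the per-worker JOINTSPAR loop), neither of which is reproduced on this page. Below is a minimal, illustrative PyTorch sketch of a JOINTSPAR-style worker step, assuming only the general idea of sampling a subset of layers from a probability vector p, rescaling the kept gradients by 1/p for unbiasedness, and updating (and, in a real run, communicating) only the sampled layers. The function name jointspar_step, the Bernoulli sampling, the plain-SGD update, and the omission of the Algorithm 1 distribution update are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

def jointspar_step(model, loss_fn, batch, p, lr=0.1):
    """Illustrative JOINTSPAR-style step: sample active layers from p,
    rescale their gradients by 1/p, and update only those layers.

    NOTE: simplified sketch, not the paper's Algorithm 2; the distribution
    update for p (the paper's Algorithm 1) is intentionally omitted.
    """
    x, y = batch
    loss = loss_fn(model(x), y)
    loss.backward()

    params = list(model.parameters())
    # Layer d is kept ("active") with probability p[d].
    active = torch.bernoulli(p).bool()

    with torch.no_grad():
        for d, w in enumerate(params):
            if active[d] and w.grad is not None:
                w.grad.div_(p[d])   # 1/p rescaling keeps the estimate unbiased
                # In the distributed setting, only these sparse gradients
                # would be exchanged across workers (e.g., via mpi4py).
                w -= lr * w.grad
            w.grad = None           # inactive layers: neither sent nor updated
    return loss.item(), active

# Toy usage on a small model with a keep-probability of 0.8 per layer.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
p = torch.full((len(list(model.parameters())),), 0.8)
batch = (torch.randn(16, 10), torch.randint(0, 2, (16,)))
loss, active = jointspar_step(model, nn.CrossEntropyLoss(), batch, p)
```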
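
The Software Dependencies row names PyTorch and mpi4py without versions. The following is a minimal sketch, not the authors' code, of how mpi4py can serve as the communication layer for averaging PyTorch gradients across workers; it assumes reasonably recent PyTorch and mpi4py releases and contiguous CPU gradient buffers.

```python
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

def allreduce_mean_(tensor: torch.Tensor) -> torch.Tensor:
    """Average a contiguous CPU tensor across all MPI ranks, in place."""
    buf = tensor.numpy()  # zero-copy NumPy view of the CPU tensor
    comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)
    tensor.div_(world_size)
    return tensor

# Typical use after a local backward pass (model not defined in this snippet):
# for param in model.parameters():
#     if param.grad is not None:
#         g = param.grad.detach().cpu()
#         allreduce_mean_(g)
#         param.grad.copy_(g)
```

A script like this would be launched with an MPI launcher, e.g. mpirun -np 4 python train.py.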
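
The Experiment Setup row describes a 90-epoch schedule with a 5-epoch warmup, an initial learning rate of 0.1, a 10x decay at epochs 30, 50, and 70, and the linear-scaling rule of Goyal et al. (2017). Below is a hedged sketch of that schedule; the worker count, the reference batch size of 256, and the choice to warm up from the base rate 0.1 to the linearly scaled peak are illustrative assumptions, not details stated in the excerpt.

```python
def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5,
                milestones=(30, 50, 70), gamma=0.1,
                local_batch=1024, num_workers=16, reference_batch=256):
    """Learning rate for a 0-indexed epoch: linear warmup to the scaled
    peak over the first warmup_epochs epochs, then step decay by gamma
    at each milestone (num_workers and reference_batch are assumed values).
    """
    # Linear scaling rule: peak LR grows with the global batch size.
    peak_lr = base_lr * (local_batch * num_workers) / reference_batch
    if epoch < warmup_epochs:
        return base_lr + (peak_lr - base_lr) * (epoch + 1) / warmup_epochs
    lr = peak_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # shrink by 0.1 at epochs 30, 50, and 70
    return lr

schedule = [lr_at_epoch(e) for e in range(90)]  # full 90-epoch schedule
```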