Communication-efficient Distributed Learning for Large Batch Optimization
Authors: Rui Liu, Barzan Mozafari
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to evaluate the effectiveness and efficiency of our method. Due to the space limit, we focus on JOINTSPAR-LARS because optimizers with layerwise adaptive learning rates are more effective in the large-batch setting (You et al., 2017; 2019). More experiments on JOINTSPAR and other SGD-based compression methods can be found in the appendix. |
| Researcher Affiliation | Academia | 1Computer Science and Engineering, University of Michigan, Ann Arbor. Correspondence to: Rui Liu <ruixliu@umich.edu>, Barzan Mozafari <mozafari@umich.edu>. |
| Pseudocode | Yes | Algorithm 1: Distribution update for p^m_t; Algorithm 2: Our distributed learning method JOINTSPAR (for each worker m) |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | We use several benchmark datasets in our experiments: MNIST, Fashion-MNIST, SVHN, CIFAR10, CIFAR100 and ImageNet. |
| Dataset Splits | No | The paper mentions using well-known datasets and training for a certain number of epochs, but it does not explicitly provide the specific training, validation, or test dataset splits (e.g., percentages, counts, or references to predefined splits). |
| Hardware Specification | Yes | All experiments are run on a computer cluster with up to 16 nodes. Each node has 20 physical CPU cores with clock speed up to 4 GHz, and 4 NVIDIA P100 GPUs. Nodes are connected via a 100Gb/s InfiniBand fabric. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019) to implement models and learning methods, and use mpi4py (Dalcin et al., 2011) as the communication framework in the distributed setting. The paper names this software and cites it, but does not specify version numbers. (A hedged mpi4py sketch follows the table.) |
| Experiment Setup | Yes | We set the local batch size to 1024 for each machine, and use the same tricks (i.e., linear scaling and warmup) as suggested in (Goyal et al., 2017). Other experimental settings are kept the same as in the previous subsection. We train the model on each dataset for 90 epochs, with the first 5 epochs as the warmup stage, as suggested in (Goyal et al., 2017). For the learning rate schedule, we set the initial learning rate to 0.1 and shrink the learning rate by 0.1 at epochs 30, 50, and 70. (See the schedule sketch after the table.) |
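
The setup row pins down a concrete learning-rate schedule: an initial rate of 0.1, a 5-epoch warmup, and step decay by 0.1 at epochs 30, 50, and 70 over 90 epochs. The sketch below is a minimal PyTorch illustration of that schedule only, not the authors' (unreleased) code: plain SGD stands in for JOINTSPAR-LARS, and the placeholder model, momentum value, and the decision not to apply the linear-scaling multiplier (the paper does not report the post-scaling value) are assumptions.

```python
import torch

# Minimal sketch of the reported schedule; not the authors' implementation.
# Plain SGD stands in for JOINTSPAR-LARS, and the model is a placeholder.
model = torch.nn.Linear(512, 10)

base_lr = 0.1  # stated initial rate; linear scaling with the global batch size
               # (Goyal et al., 2017) would further multiply this, but the
               # scaled value is not reported, so it is left unapplied here
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)  # momentum assumed

warmup_epochs, milestones = 5, (30, 50, 70)

def lr_factor(epoch):
    # Linear warmup over the first 5 epochs, then multiply the rate by 0.1
    # at each of epochs 30, 50, and 70.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 0.1 ** sum(epoch >= m for m in milestones)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(90):
    # ... per-batch forward/backward passes and optimizer.step() go here ...
    scheduler.step()
```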
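
On the communication side, the dependencies row only names mpi4py, so the following is a generic sketch of dense gradient averaging over MPI under an assumed helper name (`average_gradients`). It is a baseline illustration only; JOINTSPAR itself reduces communication by sending gradients for a sampled subset of layers, which this sketch does not implement.

```python
import numpy as np
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

def average_gradients(model: torch.nn.Module) -> None:
    # Dense gradient averaging via MPI allreduce, called after backward().
    # This is only a generic baseline: JOINTSPAR communicates a sampled
    # subset of layers rather than every gradient tensor.
    for param in model.parameters():
        if param.grad is None:
            continue
        local = param.grad.detach().cpu().numpy()
        summed = np.zeros_like(local)
        comm.Allreduce(local, summed, op=MPI.SUM)
        param.grad.copy_(torch.from_numpy(summed / world_size).to(param.grad.device))
```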