ATOMO: Communication-efficient Learning via Atomic Sparsification
Authors: Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, Stephen Wright
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an empirical study of Spectral-ATOMO and compare it to the recently proposed QSGD [14] and TernGrad [16], on different neural network models and data sets, under real distributed environments. |
| Researcher Affiliation | Academia | Department of Computer Sciences and Department of Electrical and Computer Engineering, University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1: ATOMO probabilities (a hedged sketch of this probability assignment appears below the table) |
| Open Source Code | Yes | Code available at: https://github.com/hwang595/ATOMO |
| Open Datasets | Yes | We conducted our experiments on various neural network models, datasets, and learning tasks as detailed in Table 2 (datasets: CIFAR-10, CIFAR-100, SVHN). |
| Dataset Splits | No | The paper mentions training data and mini-batch SGD but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | Our entire experimental pipeline is implemented in PyTorch [50] with mpi4py [49], and deployed on g2.2xlarge, m5.2xlarge, and m5.4xlarge instances in Amazon AWS EC2. |
| Software Dependencies | No | Our entire experimental pipeline is implemented in PyTorch [50] with mpi4py [49]. The paper names the software used but does not provide specific version numbers for PyTorch or mpi4py (an illustrative mpi4py exchange pattern is sketched below the table). |
| Experiment Setup | Yes | In our experiments, we use data augmentation (random crops and flips), and tuned the step size for every setup as shown in Table 5 in Appendix D. Momentum and regularization terms are switched off to make the hyperparameter search tractable and the results more legible. ... We ran ResNet-34 on CIFAR-10 using mini-batch SGD with batch size 512 split among compute nodes. (An illustrative PyTorch configuration is sketched below the table.) |
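
The pseudocode row above points to Algorithm 1 (ATOMO probabilities). As a rough, hedged illustration of that probability assignment, the Python sketch below gives each atomic coefficient a keep-probability proportional to its magnitude under a sparsity budget `s`, clipping at 1 and redistributing the leftover budget among the remaining coefficients. The function name `atomo_probabilities` and the exact redistribution loop are our assumptions; the paper's Algorithm 1 and the released code remain the authoritative versions.

```python
import numpy as np

def atomo_probabilities(lam, s):
    """Sketch of an ATOMO-style probability assignment (not the authors' exact code).

    lam : 1-D array of atomic coefficients (e.g., gradient entries or singular values).
    s   : sparsity budget, i.e., the expected number of coefficients kept.
    Returns probabilities p with 0 <= p_i <= 1 and sum(p) close to min(s, len(lam)).
    """
    lam = np.abs(np.asarray(lam, dtype=float))
    p = np.zeros_like(lam)
    active = np.ones(len(lam), dtype=bool)   # coefficients not yet clipped to 1
    budget = float(s)
    while budget > 0 and active.any():
        total = lam[active].sum()
        if total == 0:
            break
        # Tentatively assign probability proportional to |lambda_i|.
        cand = budget * lam / total
        over = active & (cand >= 1.0)
        if over.any():
            # Clip large coefficients to probability 1 and redistribute
            # the remaining budget over the rest.
            p[over] = 1.0
            budget -= over.sum()
            active &= ~over
        else:
            p[active] = cand[active]
            budget = 0.0
    return p

# Unbiased sparsification: keep lambda_i / p_i with probability p_i, else drop it.
rng = np.random.default_rng(0)
lam = np.array([5.0, 1.0, 0.5, 0.1])
p = atomo_probabilities(lam, s=2)
keep = rng.random(len(lam)) < p
sparsified = np.where(keep, lam / np.maximum(p, 1e-12), 0.0)
```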
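The software row notes that the pipeline is PyTorch with mpi4py but gives no versions. Purely as an illustration of how workers could exchange gradients with mpi4py (this is not the authors' communication code, and the averaging scheme is an assumption), a minimal sketch:

```python
# Hypothetical worker-side gradient exchange with mpi4py; consult the
# ATOMO repository for the actual communication code used in the paper.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

def exchange_and_average(local_grad: np.ndarray) -> np.ndarray:
    """Sum a (possibly sparsified) gradient across all workers and average it."""
    averaged = np.empty_like(local_grad)
    comm.Allreduce(local_grad, averaged, op=MPI.SUM)
    return averaged / world_size
```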
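The experiment setup row describes random-crop/flip augmentation, momentum and regularization switched off, and mini-batch SGD with a total batch of 512 split among compute nodes. Below is a minimal PyTorch sketch of such a configuration; the step size and node count are placeholders, not the tuned values from Table 5.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Illustrative CIFAR-10 setup mirroring the described configuration.
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random crops
    transforms.RandomHorizontalFlip(),      # random flips
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)

num_nodes = 8                      # hypothetical number of compute nodes
per_node_batch = 512 // num_nodes  # total batch size 512 split among nodes
loader = torch.utils.data.DataLoader(
    train_set, batch_size=per_node_batch, shuffle=True)

model = torchvision.models.resnet34(num_classes=10)
# Momentum and weight decay switched off, as described in the setup;
# lr=0.1 is a placeholder, not the tuned step size from Table 5.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
```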