ATOMO: Communication-efficient Learning via Atomic Sparsification

Authors: Hongyi Wang, Scott Sievert, Shengchao Liu, Zachary Charles, Dimitris Papailiopoulos, Stephen Wright

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an empirical study of Spectral-ATOMO and compare it to the recently proposed QSGD [14] and TernGrad [16], on different neural network models and data sets, under real distributed environments.
Researcher Affiliation | Academia | Department of Computer Sciences and Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Pseudocode | Yes | Algorithm 1: ATOMO probabilities (see the probability-computation sketch after the table)
Open Source Code | Yes | Code available at: https://github.com/hwang595/ATOMO
Open Datasets | Yes | We conducted our experiments on various models, datasets, and learning tasks, as detailed in Table 2. Datasets: CIFAR-10, CIFAR-100, SVHN
Dataset Splits | No | The paper mentions training data and mini-batch SGD but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | Our entire experimental pipeline is implemented in PyTorch [50] with mpi4py [49], and deployed on g2.2xlarge, m5.2xlarge, and m5.4xlarge instances in Amazon AWS EC2. We conducted our experiments on various models, datasets, and learning tasks, as detailed in Table 2.
Software Dependencies | No | Our entire experimental pipeline is implemented in PyTorch [50] with mpi4py [49]. The paper names the software used but does not provide specific version numbers for PyTorch or mpi4py. (A minimal mpi4py communication sketch follows the table.)
Experiment Setup | Yes | In our experiments, we use data augmentation (random crops and flips), and tuned the step-size for every different setup as shown in Table 5 in Appendix D. Momentum and regularization terms are switched off to make the hyperparameter search tractable and the results more legible. ... We ran ResNet-34 on CIFAR-10 using mini-batch SGD with batch size 512 split among compute nodes. (A PyTorch configuration sketch follows the table.)
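
The "Pseudocode" row refers to Algorithm 1 ("ATOMO probabilities") in the paper. The sketch below is our reading of that procedure, not the authors' released code: given the coefficients of an atomic decomposition and a sparsity budget s, it repeatedly sets p_i = min(1, budget * |lambda_i| / remaining mass) over the entries not yet clipped at 1, and the resulting probabilities drive an unbiased keep-and-rescale sparsifier. Function names and the NumPy implementation are ours.

```python
import numpy as np

def atomo_probabilities(lam, s):
    """Sketch of ATOMO's Algorithm 1 (our reading of the paper, not the
    released code): iterate p_i = min(1, budget * |lam_i| / remaining mass)
    until no new probability is clipped at 1."""
    mag = np.abs(np.asarray(lam, dtype=float))
    p = np.ones_like(mag)
    fixed = np.zeros(mag.shape, dtype=bool)  # entries whose p_i is pinned at 1
    while True:
        budget = s - fixed.sum()
        mass = mag[~fixed].sum()
        if budget <= 0 or mass == 0.0:
            p[~fixed] = 0.0
            break
        p[~fixed] = np.minimum(1.0, budget * mag[~fixed] / mass)
        newly_fixed = (~fixed) & (p >= 1.0)
        if not newly_fixed.any():
            break
        fixed |= newly_fixed
    return p

def sparsify(lam, p, rng=None):
    """Unbiased sparsification: keep atom i with probability p_i, rescale by 1/p_i."""
    rng = rng or np.random.default_rng()
    lam = np.asarray(lam, dtype=float)
    keep = rng.random(lam.shape) < p
    out = np.zeros_like(lam)
    out[keep] = lam[keep] / p[keep]
    return out
```

With probabilities of this form the sparsified vector is unbiased in expectation, and the expected number of retained atoms is controlled by the budget s.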
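The "Hardware Specification" and "Software Dependencies" rows quote a PyTorch + mpi4py pipeline on EC2. Purely for illustration, and not the authors' communication code, the following shows a common mpi4py pattern for averaging worker gradients (plain NumPy arrays with a placeholder size); ATOMO would transmit the sparsified representation instead of the dense vector.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Placeholder: each worker's (already sparsified and re-densified) gradient.
local_grad = np.random.randn(1024)

# Gather all workers' gradients at rank 0, average, then broadcast the result.
gathered = comm.gather(local_grad, root=0)
averaged = np.mean(gathered, axis=0) if rank == 0 else None
averaged = comm.bcast(averaged, root=0)
```

A script like this would typically be launched with something along the lines of `mpirun -n 4 python script.py` (command shown only as an example).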
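The "Experiment Setup" row describes random crop/flip augmentation, a per-setup tuned step size, momentum and regularization switched off, and ResNet-34 on CIFAR-10 with a total batch size of 512 split across workers. Below is a minimal single-node PyTorch sketch of that configuration; the learning rate is a placeholder (the paper tunes it per setup, Table 5 in Appendix D), and the stock torchvision ResNet-34 stands in for whatever CIFAR-specific variant the authors used.

```python
import torch
import torchvision
import torchvision.transforms as T

# Random crops and flips, as quoted in the Experiment Setup row.
transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
# Total batch size 512; in the distributed runs this is split among compute nodes.
loader = torch.utils.data.DataLoader(train_set, batch_size=512, shuffle=True)

model = torchvision.models.resnet34(num_classes=10)
# Plain mini-batch SGD: momentum and weight decay (regularization) switched off.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,  # lr is a placeholder
                            momentum=0.0, weight_decay=0.0)
criterion = torch.nn.CrossEntropyLoss()

for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    # In ATOMO, gradients would be sparsified and exchanged between workers
    # before the update; this sketch simply applies them locally.
    optimizer.step()
```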