AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Authors: Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed a suite of experiments using the AdaComp algorithm. In this paper, we evaluate convergence and compression (items 3 and 4) but do not report the impact on runtime (items 1 and 2). Table 2: CNN, MLP, and LSTM results. Figure 2: Model convergence results for different networks, datasets and learner numbers. |
| Researcher Affiliation | Industry | Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan IBM Research AI 1101 Kitchawan Rd. Yorktown Heights, New York 10598 {cchen, choij, danbrand, ankuragr, weiz, kailash}@us.ibm.com |
| Pseudocode | Yes | The following pseudocode describes two algorithms. Algorithm 1 shows the gradient-weight communication scheme we used to test AdaComp, and Algorithm 2 is the AdaComp algorithm we propose. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for their methodology or a link to a code repository. |
| Open Datasets | Yes | We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch-size etc.). Table 1 records the details of the datasets and neural network models we use in this paper. |
| Dataset Splits | No | The paper discusses training and testing, and reports test errors, but does not specify explicit percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | Yes | Experiments were done using IBM SoftLayer cloud servers where each server node is equipped with two Intel Xeon E5-2690-V3 processors and two NVIDIA Tesla K80 cards. Each Xeon processor has 12 cores running at 2.66GHz and each Tesla K80 card contains two K40 GPUs each with 12GB of GDDR5 memory. |
| Software Dependencies | No | The paper mentions "The software platform is an in-house distributed deep learning framework ((Gupta, Zhang, and Wang 2016), (Nair and Gupta 2017)). The exchange of gradients is done in a peer-to-peer fashion using MPI." However, no specific version numbers are given for these or other software dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | In all these experiments we used the same hyper-parameters as the baseline (i.e., no compression). The selection of LT is empirical and is a balance between communication time and model accuracy; the same values are used across all models: LT is set to 50 for convolutional layers and to 500 for FC and LSTM layers. Table 2 provides "Mini-Batch size" and "Epochs" for various models. |
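
To make the compression scheme referenced in the Pseudocode row concrete, below is a minimal sketch of an AdaComp-style adaptive residual compression step. It is not the paper's exact Algorithm 2: the bin-wise local-maximum selection and the residue handling follow the paper's description, but the function name, the NumPy implementation, and the tie-breaking at the threshold are assumptions; the bin sizes (LT = 50 for convolutional layers, 500 for FC and LSTM layers) are the values quoted in the Experiment Setup row.

```python
import numpy as np

def adacomp_compress(grad, residue, bin_size=50):
    """Sketch of one AdaComp-style compression step for a flattened layer gradient.

    grad     : latest local gradient for the layer (1-D array)
    residue  : locally accumulated, not-yet-sent gradient (same shape as grad)
    bin_size : LT from the paper (50 for conv layers, 500 for FC/LSTM layers)

    Returns (indices, values, new_residue), where (indices, values) is the sparse
    payload to exchange and new_residue holds the unsent remainder.
    """
    g = residue + grad   # accumulated gradient
    h = g + grad         # over-weight the newest gradient (the "self-adjusting" part)

    n = g.size
    send_mask = np.zeros(n, dtype=bool)
    for start in range(0, n, bin_size):
        end = min(start + bin_size, n)
        local_max = np.abs(g[start:end]).max()
        # send every component whose boosted magnitude reaches the bin's local maximum
        send_mask[start:end] = np.abs(h[start:end]) >= local_max

    indices = np.nonzero(send_mask)[0]
    values = g[indices]
    new_residue = np.where(send_mask, 0.0, g)  # keep what was not sent for next iteration
    return indices, values, new_residue
```

In use, each learner would call such a routine per layer on the flattened gradient, exchange the sparse (indices, values) pairs, and carry new_residue forward to the next mini-batch.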
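
The Software Dependencies row notes only that gradients are exchanged peer-to-peer with MPI through an in-house framework. Purely as an illustration, here is a minimal sketch of how a sparse payload from a compression step like the one above could be exchanged and accumulated; the choice of mpi4py, the pickle-based allgather, the dense accumulation, and the averaging are all assumptions, not details from the paper.

```python
from mpi4py import MPI
import numpy as np

def exchange_sparse_gradients(indices, values, num_params):
    """Sketch: allgather each learner's sparse (indices, values) payload and
    accumulate the contributions into a dense gradient buffer."""
    comm = MPI.COMM_WORLD

    # Object-based allgather of small Python tuples (fine for a sketch;
    # a production system would use buffer-based collectives).
    payloads = comm.allgather((indices, values))

    aggregated = np.zeros(num_params, dtype=np.float64)
    for idx, val in payloads:
        np.add.at(aggregated, idx, val)   # scatter-add each learner's sparse update
    return aggregated / comm.Get_size()   # average across learners
```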