On Distributed Adaptive Optimization with Gradient Compression

Authors: Xiaoyun Li, Belhal Karimi, Ping Li

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are conducted to justify the theoretical findings, and demonstrate that the proposed method can achieve the same test accuracy as the full-gradient AMSGrad with substantial communication savings. With its simplicity and efficiency, COMP-AMS can serve as a useful distributed training framework for adaptive gradient methods.
Researcher Affiliation | Industry | Cognitive Computing Lab, Baidu Research, 10900 NE 8th St., Bellevue, WA 98004, USA. {xiaoyunli,belhalkarimi,liping11}@baidu.com
Pseudocode | Yes | Algorithm 1: AMSGrad (Reddi et al., 2018); Algorithm 2: Distributed COMP-AMS with error feedback (EF). A hedged sketch of the worker-side EF compression step appears below the table.
Open Source Code | No | The paper states: 'Our method has been implemented in the PaddlePaddle platform (www.paddlepaddle.org.cn).' This indicates implementation on a public platform but does not explicitly state that the specific code developed for this paper is open-source or provide a link to it.
Open Datasets | Yes | MNIST (LeCun et al., 1998) contains 60000 training samples of 28x28 gray-scale hand-written digits from 10 classes, and 10000 test samples. The CIFAR-10 dataset (Krizhevsky & Hinton, 2009) consists of 50000 32x32 RGB natural images from 10 classes for training and 10000 images for testing, which is trained with LeNet-5 (LeCun et al., 1998). The IMDB movie review dataset (Maas et al., 2011) is a popular binary classification dataset for sentiment analysis.
Dataset Splits | No | The paper gives sizes for the training and test sets but does not explicitly provide details about a validation split (e.g., specific percentages or sample counts for validation data).
Hardware Specification | Yes | Our experiments are performed on a GPU cluster with NVIDIA Tesla V100 cards.
Software Dependencies | No | The paper mentions the 'PaddlePaddle platform' but does not specify a version number for it or any other software dependencies.
Experiment Setup | Yes | For MNIST and CIFAR-10, the local batch size on each worker is set to 32. For IMDB, the local batch size is 16. The hyper-parameters in COMP-AMS are set to the defaults β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸, which are also used for QAdam and 1Bit Adam. For 1Bit Adam, the number of warm-up training epochs is set to 1/20 of the total epochs. For all methods, the initial learning rate is tuned over a fine grid (see Appendix A). A sketch of an optimizer update using these defaults follows the table.
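
The Pseudocode row above references Algorithm 2, distributed COMP-AMS with error feedback (EF). The following is a minimal NumPy sketch of what one worker-side step with error feedback typically looks like, assuming a Top-k compressor; the function names and the choice of compressor are illustrative and not taken from the paper's code.

```python
import numpy as np

def top_k_compress(grad, k):
    """Keep the k largest-magnitude entries of the gradient.
    Top-k is one common compressor choice; others can be substituted."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    compressed = np.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.reshape(grad.shape)

def worker_step(grad, error, k):
    """One worker-side compression step with error feedback:
    compress the error-corrected gradient and keep the residual locally."""
    corrected = grad + error               # add accumulated compression error
    compressed = top_k_compress(corrected, k)
    new_error = corrected - compressed     # residual carried to the next round
    return compressed, new_error

# Example (hypothetical shapes): a worker keeps 10% of a 100-entry gradient.
g = np.random.randn(100)
e = np.zeros_like(g)
c, e = worker_step(g, e, k=10)
```

Only the compressed tensor is communicated to the server, while the residual stays on the worker, which is where the communication savings come from.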
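The Experiment Setup row lists the adaptive-optimizer defaults β1 = 0.9, β2 = 0.999 and ϵ = 10⁻⁸. Below is a minimal sketch of a server-side AMSGrad update (Algorithm 1) using those values; variable names and the omission of bias correction follow the common AMSGrad formulation rather than the paper's exact pseudocode.

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, lr,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update on the aggregated gradient.
    Defaults mirror the hyper-parameters reported in the Experiment Setup row."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    v_hat = np.maximum(v_hat, v)              # AMSGrad's running max of v
    param = param - lr * m / (np.sqrt(v_hat) + eps)
    return param, m, v, v_hat
```

In COMP-AMS this update would be applied to the average of the workers' compressed gradients, and the learning rate lr is the quantity tuned over a fine grid in the paper.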