ErrorCompensatedX: error compensation for variance reduced algorithms

Authors: Hanlin Tang, Yao Li, Ji Liu, Ming Yan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we train ResNet-50 (He et al., 2016) on CIFAR10, which consists of 50000 training images and 10000 testing images, each has 10 labels. We run the experiments on eight workers, each having a 1080Ti GPU. The batch size on each worker is 16 and the total batch size is 128. ... Figure 2: Epoch-wise convergence comparison on ResNet-50 for Momentum SGD (left column), STORM (middle column), and IGT (right column) with different communication implementations.
Researcher Affiliation | Collaboration | Hanlin Tang, Department of Computer Science, University of Rochester (tanghl1994@gmail.com); Yao Li, Department of Mathematics, Michigan State University (liyao6@msu.edu); Ji Liu, Kuaishou Technology (ji.liu.uwisc@gmail.com); Ming Yan, Department of Computational Mathematics, Science and Technology and Department of Mathematics, Michigan State University (myan@msu.edu)
Pseudocode | Yes | Algorithm 1: ErrorCompensatedX for general A(x; ξ) (see the illustrative sketch below the table)
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | In this section, we train ResNet-50 (He et al., 2016) on CIFAR10, which consists of 50000 training images and 10000 testing images, each has 10 labels.
Dataset Splits | No | The paper states '50000 training images and 10000 testing images' for CIFAR-10 but does not specify a validation set split.
Hardware Specification | Yes | We run the experiments on eight workers, each having a 1080Ti GPU.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies used in the experiments.
Experiment Setup | Yes | The batch size on each worker is 16 and the total batch size is 128. ... We use the 1-bit compression in Tang et al. (2019), which leads to an overall 96% of communication volume reduction. ... We grid search the best learning rate from {0.5, 0.1, 0.001} and c0 from {0.1, 0.05, 0.001}, and find that the best learning rate is 0.01 with c0 = 0.05 for both original STORM and IGT. ... We set β = 0.3 for the low-pass filter in all cases.
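
The Pseudocode and Experiment Setup rows above refer to error compensation applied on top of 1-bit gradient compression. As a rough, self-contained illustration only (not the authors' exact Algorithm 1, which is stated for a general update A(x; ξ), nor the exact 1-bit scheme of Tang et al. (2019)), the sketch below shows the basic error-compensated compression step a worker might apply before communicating; the function and variable names, and the l1-mean scaling of the sign compressor, are assumptions made for this example.

    import numpy as np

    def one_bit_compress(v):
        """Sign ("1-bit") compression: keep only the sign of each entry,
        scaled by the mean absolute value so the message preserves the
        overall magnitude of the input (one common choice; an assumption
        here, not necessarily the paper's exact compressor)."""
        scale = np.mean(np.abs(v))
        return scale * np.sign(v)

    def error_compensated_step(update, error, compress=one_bit_compress):
        """One error-compensated communication round.

        The worker compresses (update + carried-over error) instead of the
        raw update, sends the compressed message, and keeps the new
        compression residual locally to fold into the next round.
        """
        corrected = update + error          # add back last round's residual
        message = compress(corrected)       # what is actually communicated
        new_error = corrected - message     # residual stored on the worker
        return message, new_error

    # Toy usage: compress a stream of pseudo-gradients with error feedback.
    rng = np.random.default_rng(0)
    error = np.zeros(10)
    for step in range(3):
        grad = rng.normal(size=10)          # stand-in for a local (variance-reduced) update
        msg, error = error_compensated_step(grad, error)
        print(step, np.linalg.norm(grad - msg))

The point of the error-feedback pattern is that the residual left behind by compression is re-injected into the next round's message rather than discarded, so compression error does not silently accumulate across iterations; ErrorCompensatedX applies this idea to general (including variance-reduced) update rules.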