CSER: Communication-efficient SGD with Error Reset

Authors: Cong Xie, Shuai Zheng, Sanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that when combined with highly aggressive compressors, the CSER algorithms accelerate the distributed training by nearly 10× for CIFAR-100 and by 4.5× for ImageNet.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of Illinois Urbana-Champaign; 2 Amazon Web Services
Pseudocode | Yes | Algorithm 1 QSparse-local-SGD; Algorithm 2 CSER; Algorithm 3 Partial Synchronization (PSync); Algorithm 4 Distributed Momentum SGD with Error-Reset (M-CSER, implementation I). (An illustrative sketch of the error-reset step appears after this table.)
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code-release statement) for the methodology described.
Open Datasets | Yes | We conduct experiments on two image classification benchmarks: CIFAR-100 [10], and the ImageNet dataset [16]
Dataset Splits | No | The paper mentions using CIFAR-100 and ImageNet for experiments and discusses 'test accuracy', but it does not specify explicit train/validation/test dataset splits (e.g., percentages, sample counts) or mention the use of a distinct validation set for hyperparameter tuning or model selection.
Hardware Specification | Yes | in a cluster of 8 machines where each machine has 1 NVIDIA V100 GPU and up to 10 Gb/s networking bandwidth.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup | Yes | For CIFAR-100, we use the wide residual network (Wide-ResNet-40-8 [32]). We set weight decay to 0.0005, momentum to 0.9, and minibatch size to 16 per worker. We decay the learning rates by 0.2 at 60, 120, and 160 epochs, and train for 200 epochs. The initial learning rate is varied in {0.05, 0.1, 0.5, 1.0}. (A minimal PyTorch rendering of these settings appears after this table.)
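
The algorithms listed in the Pseudocode row are described only by name in this report. As a rough, single-worker illustration of the error-reset idea behind Algorithm 2, the sketch below decomposes a worker's model (a 1-D parameter vector) into a synchronized part plus a local residual. The compressor choice (`topk_compress`) and all variable names are assumptions made here for illustration; the paper's actual Algorithm 2 additionally uses partial synchronization of gradients (PSync) and averages the communicated updates across workers.

```python
import numpy as np

def topk_compress(x, k):
    """Generic top-k sparsifier, used as a stand-in for an aggressive
    compressor (a hypothetical choice for illustration)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest-magnitude entries
    out[idx] = x[idx]
    return out

def error_reset_step(local_model, sync_model, residual, k):
    """Illustrative single-worker error-reset step (names are assumptions).

    The local model is viewed as a synchronized part plus a local residual.
    Only the compressed drift is communicated; the leftover compression error
    is reset into the local model immediately, instead of being accumulated
    for later messages as classic error feedback would do.
    """
    drift = local_model - sync_model + residual   # what the worker would like to synchronize
    v = topk_compress(drift, k)                   # communicated part (all-reduced in practice)
    residual = drift - v                          # leftover compression error stays local
    sync_model = sync_model + v                   # distributed case: v is averaged over workers first
    local_model = sync_model + residual           # "error reset": residual folded back into the local model
    return local_model, sync_model, residual

# Minimal usage example on random parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
local, synced, resid = error_reset_step(x.copy(), np.zeros(100), np.zeros(100), k=10)
```

This is a sketch of the error-reset mechanism only, not of the full CSER method or its momentum variant (M-CSER).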
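
For reference, the CIFAR-100 settings quoted in the Experiment Setup row map onto a standard SGD configuration. The sketch below is a minimal PyTorch rendering of those hyperparameters; the trivial placeholder model is an assumption standing in for Wide-ResNet-40-8, since the paper releases no code.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for Wide-ResNet-40-8 [32]; the actual model
# implementation is not reproduced in this report.
model = nn.Linear(3 * 32 * 32, 100)

# Per-worker hyperparameters quoted above: weight decay 0.0005, momentum 0.9,
# minibatch size 16; the initial learning rate is searched over {0.05, 0.1, 0.5, 1.0}.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # one value from the grid {0.05, 0.1, 0.5, 1.0}
    momentum=0.9,
    weight_decay=5e-4,
)

# Decay the learning rate by a factor of 0.2 at epochs 60, 120, and 160;
# training runs for 200 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2
)
```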