CSER: Communication-efficient SGD with Error Reset

Authors: Cong Xie, Shuai Zheng, Sanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that when combined with highly aggressive compressors, the CSER algorithms accelerate the distributed training by nearly 10× for CIFAR-100 and by 4.5× for ImageNet.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of Illinois Urbana-Champaign; 2 Amazon Web Services
Pseudocode | Yes | Algorithm 1 QSparse-local-SGD; Algorithm 2 CSER; Algorithm 3 Partial Synchronization (PSync); Algorithm 4 Distributed Momentum SGD with Error-Reset (M-CSER, implementation I). (An illustrative sketch of the error-reset step appears after this table.)
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code-release statement) for the methodology described.
Open Datasets | Yes | We conduct experiments on two image classification benchmarks: CIFAR-100 [10], and the ImageNet dataset [16]
Dataset Splits | No | The paper mentions using CIFAR-100 and ImageNet for experiments and discusses 'test accuracy', but it does not specify explicit train/validation/test dataset splits (e.g., percentages, sample counts) or mention the use of a distinct validation set for hyperparameter tuning or model selection.
Hardware Specification | Yes | in a cluster of 8 machines where each machine has 1 NVIDIA V100 GPU and up to 10 Gb/s networking bandwidth.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup | Yes | For CIFAR-100, we use the wide residual network (Wide-ResNet-40-8 [32]). We set weight decay to 0.0005, momentum to 0.9, and minibatch size to 16 per worker. We decay the learning rates by 0.2 at 60, 120, and 160 epochs, and train for 200 epochs. The initial learning rate is varied in {0.05, 0.1, 0.5, 1.0}. (A minimal PyTorch rendering of these settings appears after this table.)
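
The algorithms listed in the Pseudocode row are described only by name in this report. As a rough, single-worker illustration of the error-reset idea behind Algorithm 2, the sketch below decomposes a worker's model (a 1-D parameter vector) into a synchronized part plus a local residual. The compressor choice (`topk_compress`) and all variable names are assumptions made here for illustration; the paper's actual Algorithm 2 additionally uses partial synchronization of gradients (PSync) and averages the communicated updates across workers.

```python
import numpy as np

def topk_compress(x, k):
    """Generic top-k sparsifier, used as a stand-in for an aggressive
    compressor (a hypothetical choice for illustration)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k largest-magnitude entries
    out[idx] = x[idx]
    return out

def error_reset_step(local_model, sync_model, residual, k):
    """Illustrative single-worker error-reset step (names are assumptions).

    The local model is viewed as a synchronized part plus a local residual.
    Only the compressed drift is communicated; the leftover compression error
    is reset into the local model immediately, instead of being accumulated
    for later messages as classic error feedback would do.
    """
    drift = local_model - sync_model + residual   # what the worker would like to synchronize
    v = topk_compress(drift, k)                   # communicated part (all-reduced in practice)
    residual = drift - v                          # leftover compression error stays local
    sync_model = sync_model + v                   # distributed case: v is averaged over workers first
    local_model = sync_model + residual           # "error reset": residual folded back into the local model
    return local_model, sync_model, residual

# Minimal usage example on random parameters.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
local, synced, resid = error_reset_step(x.copy(), np.zeros(100), np.zeros(100), k=10)
```

This is a sketch of the error-reset mechanism only, not of the full CSER method or its momentum variant (M-CSER).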
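
For reference, the CIFAR-100 settings quoted in the Experiment Setup row map onto a standard SGD configuration. The sketch below is a minimal PyTorch rendering of those hyperparameters; the trivial placeholder model is an assumption standing in for Wide-ResNet-40-8, since the paper releases no code.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for Wide-ResNet-40-8 [32]; the actual model
# implementation is not reproduced in this report.
model = nn.Linear(3 * 32 * 32, 100)

# Per-worker hyperparameters quoted above: weight decay 0.0005, momentum 0.9,
# minibatch size 16; the initial learning rate is searched over {0.05, 0.1, 0.5, 1.0}.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,               # one value from the grid {0.05, 0.1, 0.5, 1.0}
    momentum=0.9,
    weight_decay=5e-4,
)

# Decay the learning rate by a factor of 0.2 at epochs 60, 120, and 160;
# training runs for 200 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2
)
```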