Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Authors: Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section will validate our theoretical results by empirically comparing different decentralized algorithms DSGD [39], D2 [67], DSGT [82], DeTAG [44] and MG-DSGD in deep learning.
Researcher Affiliation | Collaboration | 1 DAMO Academy, Alibaba Group; 2 University of Pennsylvania; 3 Peking University; 4 Meta Carbon
Pseudocode | Yes | Algorithm 1: Decentralized SGD with multiple gossip steps (MG-DSGD); Algorithm 2: x_i = Fast Gossip Average({ϕ_i}_{i=1}^n, W, R) (an illustrative sketch of the multi-gossip update appears below the table)
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | A series of experiments are carried out with CIFAR-10 [34] and ImageNet [16] to compare the aforementioned methods.
Dataset Splits | Yes | The CIFAR-10 dataset consists of 50,000 training images and 10,000 validation images in 10 classes; the ImageNet dataset consists of 1,281,167 training images and 50,000 validation images in 1,000 classes.
Hardware Specification | Yes | All the models and training scripts in this section run on servers with 8 NVIDIA V100 GPUs, with each GPU treated as one node.
Software Dependencies | Yes | We implement all decentralized algorithms with PyTorch [53] 1.6.0, using NCCL 2.8.3 (CUDA 10.1) as the communication backend. (A minimal backend-initialization sketch appears below the table.)
Experiment Setup | Yes | We train for a total of 300 epochs; the learning rate is warmed up over the first 5 epochs and decayed by a factor of 10 at the 150-th and 250-th epochs. For the learning rate, we tuned a strong baseline in the PSGD setting (5e-3 for a single node) and used the same setting for all decentralized methods. The batch size is set to 128 on each node. (See the schedule sketch below the table.)
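
To make the pseudocode row concrete, below is a minimal NumPy simulation of the multi-gossip pattern behind MG-DSGD: each node takes a local stochastic-gradient step, and the nodes then run R gossip rounds with a doubly stochastic mixing matrix W. The ring topology, the quadratic local losses, the hyperparameters, and the use of plain gossip in place of the paper's accelerated Fast Gossip Average are all illustrative assumptions, not the authors' Algorithms 1-2.

    # Minimal simulation of decentralized SGD with multiple gossip steps per
    # iteration (the MG-DSGD pattern): a local stochastic-gradient step on
    # every node, followed by R averaging rounds with a mixing matrix W.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, R, lr, steps = 8, 10, 3, 0.05, 200   # nodes, dim, gossip rounds, step size, iterations

    # Ring-topology mixing matrix: each node averages with its two neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25

    # Heterogeneous local least-squares problems: node i holds (A_i, b_i).
    A = rng.standard_normal((n, 50, d))
    b = rng.standard_normal((n, 50))

    x = np.zeros((n, d))                       # one model copy per node
    for _ in range(steps):
        # Local stochastic-gradient step on a random minibatch per node.
        for i in range(n):
            idx = rng.choice(50, size=8, replace=False)
            grad = A[i, idx].T @ (A[i, idx] @ x[i] - b[i, idx]) / len(idx)
            x[i] = x[i] - lr * grad
        # R gossip (communication) rounds drive the copies toward consensus.
        for _ in range(R):
            x = W @ x

    print("consensus gap:", np.linalg.norm(x - x.mean(axis=0)))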
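
The software-dependency row names PyTorch with NCCL as the communication backend. A typical way to initialize this when each of the 8 GPUs is treated as one node is sketched below; the environment-variable based launch and the all_reduce helper are illustrative assumptions, not the authors' implementation (their decentralized communication is a gossip step rather than an exact all-reduce).

    # One process per GPU, with RANK, WORLD_SIZE, MASTER_ADDR/PORT and
    # LOCAL_RANK assumed to be provided by a launcher such as
    # torch.distributed.launch / torchrun.
    import os
    import torch
    import torch.distributed as dist

    def init_worker():
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        torch.cuda.set_device(local_rank)          # pin this process to one V100
        dist.init_process_group(backend="nccl",    # NCCL backend, as stated in the row
                                init_method="env://")
        return local_rank

    # For reference: an all_reduce computes the exact network-wide average,
    # which R gossip rounds only approximate in the decentralized setting.
    def exact_average_(tensor):
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        tensor /= dist.get_world_size()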
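
The training protocol in the experiment-setup row (5 warmup epochs, decay by 10 at epochs 150 and 250, 300 epochs total, base learning rate 5e-3 per node) can be written as a standard PyTorch scheduler. The linear warmup shape and the placeholder model below are assumptions for illustration, not details taken from the paper.

    # Warm up over the first 5 epochs, then divide the learning rate by 10
    # at epochs 150 and 250 of a 300-epoch run.
    import torch

    def lr_multiplier(epoch, warmup_epochs=5, milestones=(150, 250)):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs                  # assumed linear warmup
        return 0.1 ** sum(epoch >= m for m in milestones)       # step decay by 10x

    model = torch.nn.Linear(10, 10)                              # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=5e-3)     # 5e-3 per node, as quoted
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_multiplier)

    for epoch in range(300):
        # ... one epoch of training with per-node batch size 128 goes here ...
        scheduler.step()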