Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization
Authors: Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section will validate our theoretical results by empirically comparing different decentralized algorithms DSGD [39], D2 [67], DSGT [82], DeTAG [44] and MG-DSGD in deep learning. |
| Researcher Affiliation | Collaboration | ¹DAMO Academy, Alibaba Group; ²University of Pennsylvania; ³Peking University; ⁴Meta Carbon |
| Pseudocode | Yes | Algorithm 1: Decentralized SGD with multiple gossip steps (MG-DSGD); Algorithm 2: x_i = FastGossipAverage({ϕ_i}_{i=1}^n, W, R) (see the MG-DSGD sketch below the table) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | A series of experiments are carried out with CIFAR-10 [34] and ImageNet [16] to compare the aforementioned methods. |
| Dataset Splits | Yes | The CIFAR-10 dataset consists of 50,000 training images and 10,000 validation images in 10 classes; the ImageNet dataset consists of 1,281,167 training images and 50,000 validation images in 1,000 classes. |
| Hardware Specification | Yes | All the models and training scripts in this section run on servers with 8 NVIDIA V100 GPUs with each GPU treated as one node. |
| Software Dependencies | Yes | We implement all decentralized algorithms with PyTorch [53] 1.6.0 using NCCL 2.8.3 (CUDA 10.1) as the communication backend (see the initialization sketch below the table) |
| Experiment Setup | Yes | For the training protocol, we train for a total of 300 epochs; the learning rate is warmed up in the first 5 epochs and decayed by a factor of 10 at the 150-th and 250-th epochs. For the learning rate, we tuned a strong baseline in the PSGD setting (5e-3 for a single node) and used the same setting in all decentralized methods. The batch size is set to 128 on each node (see the learning-rate schedule sketch below the table). |
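
The Pseudocode row above refers to Algorithm 1 (MG-DSGD) and Algorithm 2 (FastGossipAverage). Below is a minimal, schematic Python sketch of the two routines; it assumes the mixing matrix W is given as a NumPy array, replaces the paper's accelerated gossip with plain repeated mixing for clarity, and uses illustrative names rather than the authors' (unreleased) implementation.

```python
import numpy as np

def fast_gossip_average(phi, W, R):
    """Mix node states phi (shape n x d) over R gossip rounds with mixing matrix W.

    Sketch only: Algorithm 2 in the paper uses accelerated gossip; plain repeated
    mixing x <- W x is shown here instead.
    """
    x = phi.copy()
    for _ in range(R):
        x = W @ x  # every node averages with its neighbors
    return x

def mg_dsgd(grad_fn, x0, W, R, lr, num_iters):
    """Decentralized SGD with multiple gossip steps (MG-DSGD), schematic version.

    grad_fn(x) is assumed to return an (n x d) array of stochastic gradients,
    one row per node; x0 holds the initial model copy on every node.
    """
    x = x0.copy()
    for _ in range(num_iters):
        g = grad_fn(x)                          # local stochastic gradients
        phi = x - lr * g                        # local SGD step on each node
        x = fast_gossip_average(phi, W, R)      # R rounds of gossip communication
    return x
```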
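
The Software Dependencies row states that the decentralized algorithms were implemented in PyTorch with NCCL as the communication backend. Since no training scripts were released, the following is only a hedged sketch of how one process per GPU is typically initialized for such a run; the environment variables and helper name are assumptions, not the authors' code.

```python
import os
import torch
import torch.distributed as dist

def init_decentralized_node():
    """Initialize one process per GPU, with NCCL as the communication backend.

    Assumes the standard torch.distributed launcher environment variables
    (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set; illustrative only.
    """
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return rank, world_size
```

Under this kind of setup, each of the 8 V100 GPUs listed in the Hardware Specification row acts as one node of the decentralized topology.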
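
The Experiment Setup row fully determines an epoch-wise learning-rate schedule: a warmup over the first 5 epochs, then a 10x decay at epochs 150 and 250 of the 300-epoch run, starting from the tuned base rate of 5e-3 per node. The sketch below reproduces that schedule; the linear shape of the warmup is an assumption, since the paper only says the rate is "warmed up".

```python
def learning_rate(epoch, base_lr=5e-3, warmup_epochs=5, milestones=(150, 250), decay=0.1):
    """Epoch-wise learning rate following the protocol quoted in the table.

    Linear warmup over the first 5 epochs (assumed shape), then a step decay
    by a factor of 10 at the 150-th and 250-th epochs.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= decay
    return lr

# A few representative epochs of the 300-epoch run
for e in (0, 4, 5, 149, 150, 249, 250, 299):
    print(f"epoch {e:3d}: lr = {learning_rate(e):.6f}")
```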