Exponential Graph is Provably Efficient for Decentralized Deep Training
Authors: Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive industry-level experiments across different tasks and models with various decentralized methods, graphs, and network sizes to validate our theoretical results. |
| Researcher Affiliation | Collaboration | Bicheng Ying1,3, Kun Yuan2, Yiming Chen2, Hanbin Hu4, Pan Pan2, Wotao Yin2; 1 University of California, Los Angeles; 2 DAMO Academy, Alibaba Group; 3 Google Inc.; 4 University of California, Santa Barbara. ybc@ucla.edu, {kun.yuan, charles.cym}@alibaba-inc.com, hanbinhu@ucsb.edu, {panpan.pp, wotao.yin}@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 DmSGD |
| Open Source Code | Yes | Our code is implemented through BlueFog and available at https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph. |
| Open Datasets | Yes | We conduct a series of image classification experiments with ImageNet-1K [16], which consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Dataset Splits | Yes | We conduct a series of image classification experiments with ImageNet-1K [16], which consists of 1,281,167 training images and 50,000 validation images in 1000 classes. |
| Hardware Specification | Yes | Each server contains 8 V100 GPUs in our cluster and is treated as one node. |
| Software Dependencies | Yes | We implement all decentralized algorithms with PyTorch [46] 1.8.0 using NCCL 2.8.3 (CUDA 10.1) as the communication backend. For the implementation of decentralized methods, we utilize BlueFog [63]. |
| Experiment Setup | Yes | The training protocol in [21] is used. In detail, we train for a total of 90 epochs. The learning rate is warmed up over the first 5 epochs and is decayed by a factor of 10 at the 30th, 60th, and 80th epochs. The momentum SGD optimizer is used with linear learning rate scaling by default. Experiments are trained in mixed precision using the PyTorch native AMP module (see the sketch below the table). |
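
To make the quoted training protocol concrete, the following is a minimal sketch of the schedule it describes (5-epoch linear warmup, 10x decay at epochs 30/60/80, momentum SGD, native AMP), assuming PyTorch 1.8 or later. The model, synthetic data loader, and base learning rate are placeholders for illustration, not values taken from the paper, and the decentralized BlueFog communication is omitted.

```python
# Sketch of the quoted protocol: warmup + step decay, momentum SGD, native AMP.
# Placeholders: the model, data, and base_lr below are NOT the paper's values.
import torch
import torch.nn.functional as F

def lr_factor(epoch, warmup_epochs=5, milestones=(30, 60, 80)):
    """Linear warmup over the first 5 epochs, then 10x decay at epochs 30/60/80."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 0.1 ** sum(epoch >= m for m in milestones)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(32, 10).to(device)      # placeholder model
base_lr = 0.1                                    # placeholder; the paper scales it linearly with worker count
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # native AMP

# Synthetic stand-in for the ImageNet-1K data loader.
loader = [(torch.randn(16, 32, device=device),
           torch.randint(0, 10, (16,), device=device)) for _ in range(4)]

for epoch in range(90):                          # 90 epochs total, as quoted
    for x, y in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == "cuda")):
            loss = F.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()                             # advance the per-epoch LR schedule
```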