Exponential Graph is Provably Efficient for Decentralized Deep Training

Authors: Bicheng Ying, Kun Yuan, Yiming Chen, Hanbin Hu, Pan Pan, Wotao Yin

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive industry-level experiments across different tasks and models with various decentralized methods, graphs, and network sizes to validate our theoretical results.
Researcher Affiliation | Collaboration | Bicheng Ying (1,3), Kun Yuan (2), Yiming Chen (2), Hanbin Hu (4), Pan Pan (2), Wotao Yin (2); (1) University of California, Los Angeles; (2) DAMO Academy, Alibaba Group; (3) Google Inc.; (4) University of California, Santa Barbara. Emails: ybc@ucla.edu, {kun.yuan, charles.cym}@alibaba-inc.com, hanbinhu@ucsb.edu, {panpan.pp, wotao.yin}@alibaba-inc.com
Pseudocode | Yes | Algorithm 1 DmSGD (a minimal sketch of this update appears after the table)
Open Source Code | Yes | Our code is implemented through BlueFog and available at https://github.com/Bluefog-Lib/NeurIPS2021-Exponential-Graph.
Open Datasets | Yes | We conduct a series of image classification experiments with ImageNet-1K [16], which consists of 1,281,167 training images and 50,000 validation images in 1000 classes.
Dataset Splits | Yes | We conduct a series of image classification experiments with ImageNet-1K [16], which consists of 1,281,167 training images and 50,000 validation images in 1000 classes.
Hardware Specification | Yes | Each server contains 8 V100 GPUs in our cluster and is treated as one node.
Software Dependencies | Yes | We implement all decentralized algorithms with PyTorch [46] 1.8.0 using NCCL 2.8.3 (CUDA 10.1) as the communication backend. For the implementation of decentralized methods, we utilize BlueFog [63].
Experiment Setup | Yes | The training protocol in [21] is used. In detail, we train for a total of 90 epochs. The learning rate is warmed up over the first 5 epochs and is decayed by a factor of 10 at the 30th, 60th, and 80th epochs. The momentum SGD optimizer is used with linear learning rate scaling by default. Experiments are trained in mixed precision using the PyTorch native AMP module. (A sketch of this learning-rate schedule follows the table.)
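The pseudocode row above points to Algorithm 1 (DmSGD), decentralized momentum SGD run over an exponential graph. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it builds the static exponential-graph mixing matrix (node i averages with neighbors (i + 2^k) mod n) and applies one neighbor-averaging momentum step. The function names, learning rate, momentum value, and toy dimensions are illustrative assumptions.

```python
# Hedged sketch: static exponential-graph mixing matrix and one DmSGD-style
# step in NumPy. Names and hyperparameters are illustrative, not the paper's.
import numpy as np

def exponential_graph_weights(n_nodes):
    """Mixing matrix W for the static exponential graph: node i averages with
    neighbors (i + 2^k) mod n for k = 0..ceil(log2 n)-1, plus itself."""
    tau = int(np.ceil(np.log2(n_nodes)))
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, i] = 1.0 / (tau + 1)
        for k in range(tau):
            j = (i + 2 ** k) % n_nodes
            W[i, j] += 1.0 / (tau + 1)   # accumulate in case of collisions
    return W

def dmsgd_step(x, m, grads, W, lr=0.1, beta=0.9):
    """One decentralized momentum-SGD-style update (sketch, not Algorithm 1
    verbatim): local momentum update, then neighbor averaging through W."""
    m = beta * m + grads                 # per-node momentum buffers
    x = W @ (x - lr * m)                 # combine neighbors' updated models
    return x, m

n, d = 8, 4                              # 8 nodes, 4 parameters each (toy sizes)
W = exponential_graph_weights(n)
x = np.random.randn(n, d)
m = np.zeros((n, d))
grads = np.random.randn(n, d)            # placeholder local gradients
x, m = dmsgd_step(x, m, grads, W)
```

Each row of W sums to one, so the update keeps every local model a convex combination of its neighbors; the one-peer variant discussed in the paper would instead cycle through a single neighbor per iteration.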
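The experiment-setup row describes a 90-epoch run with a 5-epoch warm-up and 10x decays at epochs 30, 60, and 80. Below is a hedged sketch of that schedule in plain PyTorch; the placeholder model, the base learning rate, and the linear warm-up shape are assumptions, not values taken from the paper's code.

```python
# Hedged sketch of the described schedule: 5-epoch warm-up, then step decay
# by 10x at epochs 30, 60, and 80 over 90 total epochs.
import torch

model = torch.nn.Linear(10, 10)          # placeholder model
base_lr = 0.1                            # assumed; the paper scales it linearly with node count
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

def lr_factor(epoch):
    """Multiplicative factor on base_lr for a 0-based epoch index."""
    if epoch < 5:
        return (epoch + 1) / 5.0         # warm-up over the first 5 epochs (linear here, by assumption)
    if epoch < 30:
        return 1.0
    if epoch < 60:
        return 0.1                       # decayed by 10 at epoch 30
    if epoch < 80:
        return 0.01                      # decayed again at epoch 60
    return 0.001                         # final decay at epoch 80

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(90):
    # ... run one training epoch with the decentralized optimizer here ...
    scheduler.step()
```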