Optimal Complexity in Decentralized Training

Authors: Yucheng Lu, Christopher De Sa

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.
Researcher Affiliation | Academia | Department of Computer Science, Cornell University, Ithaca, New York, United States.
Pseudocode | Yes | Algorithm 1 Decentralized Stochastic Gradient Descent with Factorized Consensus Matrices (DeFacto); Algorithm 2 Decentralized Stochastic Gradient Tracking with By-Phase Accelerated Gossip (DeTAG); Algorithm 3 Accelerated Gossip (AG). A generic sketch of the gossip-SGD template these algorithms build on appears after the table.
Open Source Code | No | The paper does not contain any statements or links indicating that its source code is open or publicly available.
Open Datasets | Yes | We train LeNet on CIFAR10 using 8 workers... to train ResNet20 on CIFAR100
Dataset Splits | No | The paper mentions training on CIFAR10 and CIFAR100 and shows training loss, but does not specify a validation dataset split or the use of a validation set.
Hardware Specification | Yes | All the models and training scripts in this section are implemented in PyTorch and run on an Ubuntu 16.04 LTS cluster using a SLURM workload manager running CUDA 9.2, configured with 8 NVIDIA GTX 2080Ti GPUs.
Software Dependencies | Yes | All the models and training scripts in this section are implemented in PyTorch and run on an Ubuntu 16.04 LTS cluster using a SLURM workload manager running CUDA 9.2
Experiment Setup | No | Hyperparameters can be found in the supplementary material.
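
The paper's pseudocode (DeFacto, DeTAG, AG) is not reproduced in this report. For orientation only, below is a minimal sketch of the generic gossip-based decentralized SGD template that this family of algorithms builds on: each worker averages neighbors' models through a doubly stochastic mixing matrix W and then takes a local stochastic gradient step. All names here (`gossip_sgd`, `grad_fn`, the ring mixing matrix) are illustrative assumptions, not the paper's implementation; DeTAG additionally uses gradient tracking and by-phase accelerated gossip, which are omitted.

```python
import numpy as np

def gossip_sgd(grad_fn, x0, W, lr=0.1, steps=100):
    """Generic gossip-SGD sketch: mix local models via the doubly stochastic
    matrix W, then take a local stochastic gradient step on each worker."""
    n = W.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one local model per worker
    for _ in range(steps):
        G = np.stack([grad_fn(i, X[i]) for i in range(n)])  # local stochastic gradients
        X = W @ X - lr * G  # gossip-average neighbors, then descend locally
    return X.mean(axis=0)   # average of local models as the final estimate

# Toy usage: 4 workers on a ring, each with a noisy quadratic objective.
if __name__ == "__main__":
    n, d = 4, 3
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])  # doubly stochastic ring mixing matrix
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(n, d))
    grad = lambda i, x: (x - targets[i]) + 0.01 * rng.normal(size=d)  # noisy local gradient
    print(gossip_sgd(grad, np.zeros(d), W, lr=0.1, steps=200))  # approaches mean of targets
```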