Optimal Complexity in Decentralized Training
Authors: Yucheng Lu, Christopher De Sa
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Cornell University, Ithaca, New York, United States. |
| Pseudocode | Yes | Algorithm 1 Decentralized Stochastic Gradient Descent with Factorized Consensus Matrices (DeFacto), Algorithm 2 Decentralized Stochastic Gradient Tracking with By-Phase Accelerated Gossip (DeTAG), Algorithm 3 Accelerated Gossip (AG) (a simplified gossip-averaging sketch follows this table) |
| Open Source Code | No | The paper does not contain any statements or links indicating that its source code is open or publicly available. |
| Open Datasets | Yes | We train LeNet on CIFAR10 using 8 workers... to train Resnet20 on CIFAR100 |
| Dataset Splits | No | The paper mentions training on CIFAR10 and CIFAR100 and shows training loss, but does not specify a validation dataset split or the use of a validation set. |
| Hardware Specification | Yes | All the models and training scripts in this section are implemented in PyTorch and run on an Ubuntu 16.04 LTS cluster using a SLURM workload manager running CUDA 9.2, configured with 8 NVIDIA GTX 2080Ti GPUs. |
| Software Dependencies | Yes | All the models and training scripts in this section are implemented in PyTorch and run on an Ubuntu 16.04 LTS cluster using a SLURM workload manager running CUDA 9.2 |
| Experiment Setup | No | Hyperparameters can be found in the supplementary material. |
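The Pseudocode row above names DeFacto (decentralized SGD with factorized consensus matrices), DeTAG (gradient tracking with by-phase accelerated gossip), and Accelerated Gossip. As a rough illustration of the gossip-averaging pattern these algorithms build on, the following is a minimal NumPy sketch, not the authors' implementation: the ring topology, the toy quadratic objectives, the step size, and the heavy-ball acceleration parameter are all assumptions made for demonstration, and the paper's exact update rules (e.g., gradient tracking and the by-phase schedule) are not reproduced here.

```python
# Minimal sketch of decentralized SGD with an accelerated (heavy-ball style)
# gossip phase. Illustrative only; NOT the DeTAG/DeFacto algorithms from the paper.
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n workers (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def accelerated_gossip(X, W, rounds, eta):
    """Heavy-ball gossip: x_{k+1} = (1 + eta) * W @ x_k - eta * x_{k-1}."""
    prev, cur = X, X
    for _ in range(rounds):
        prev, cur = cur, (1 + eta) * (W @ cur) - eta * prev
    return cur

def decentralized_sgd(n=8, d=10, steps=200, lr=0.1, gossip_rounds=3, seed=0):
    rng = np.random.default_rng(seed)
    # Toy local objectives f_i(x) = 0.5 * ||x - b_i||^2 (assumption for illustration).
    B = rng.normal(size=(n, d))
    X = rng.normal(size=(n, d))  # one parameter copy per worker
    W = ring_mixing_matrix(n)
    # Acceleration parameter from the second-largest eigenvalue magnitude of W
    # (one common choice; the paper's parameterization may differ).
    lam = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]
    eta = (1 - np.sqrt(1 - lam ** 2)) / (1 + np.sqrt(1 - lam ** 2))
    for _ in range(steps):
        grads = X - B + 0.1 * rng.normal(size=(n, d))     # noisy local gradients
        X = X - lr * grads                                # local SGD step
        X = accelerated_gossip(X, W, gossip_rounds, eta)  # communication phase
    consensus_error = np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
    return X, consensus_error

if __name__ == "__main__":
    X, err = decentralized_sgd()
    print(f"mean consensus error after training: {err:.3e}")
```

Running the sketch shows the workers' parameter copies staying close to their average, which is the role the gossip (and accelerated gossip) phase plays in the decentralized algorithms listed above.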