Communication Compression for Decentralized Training
Authors: Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we evaluate two decentralized algorithms by comparing them with an allreduce implementation of centralized SGD. We run experiments under diverse network conditions and show that decentralized algorithms with low precision can speed up training without hurting convergence. |
| Researcher Affiliation | Collaboration | Hanlin Tang1, Shaoduo Gan2, Ce Zhang2, Tong Zhang3, and Ji Liu3,1 1Department of Computer Science, University of Rochester 2Department of Computer Science, ETH Zurich 3Tencent AI Lab |
| Pseudocode | Yes | Algorithm 1 DCD-PSGD and Algorithm 2 ECD-PSGD (see the DCD-PSGD sketch after this table) |
| Open Source Code | No | The paper states 'Two proposed algorithms are implemented in Microsoft CNTK', but does not provide concrete access (e.g., a repository link or explicit release statement) to the source code for their specific implementations of DCD-PSGD or ECD-PSGD. |
| Open Datasets | Yes | We train ResNet20 [He et al., 2016] on the CIFAR-10 dataset, which has 50,000 images for training and 10,000 images for testing. |
| Dataset Splits | No | The paper states '50,000 images for training and 10,000 images for testing' for the CIFAR-10 dataset, but does not explicitly provide details about a validation set split. |
| Hardware Specification | Yes | We run all experiments on 8 Amazon p2.xlarge EC2 instances, each of which has one Nvidia K80 GPU. |
| Software Dependencies | No | The paper states 'implemented in Microsoft CNTK' but does not specify a version number for CNTK or any other key software dependencies. |
| Experiment Setup | Yes | We choose the image classification task as a benchmark to evaluate our theory. We train ResNet20 [He et al., 2016] on the CIFAR-10 dataset, which has 50,000 images for training and 10,000 images for testing. The two proposed algorithms are implemented in Microsoft CNTK... The batch size for each node is the same as the default configuration in CNTK. We also tune the learning rate for each variant. |
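
As a quick orientation for the pseudocode row, below is a minimal single-process NumPy sketch of one DCD-PSGD iteration (Algorithm 1): each node takes a neighborhood-averaged gradient step and then applies a compressed version of the resulting model difference, which is also what its neighbors receive, so all copies of a model stay consistent. The sign-and-mean compressor, the fully connected mixing matrix `W`, the toy quadratic objective, and the dense simulation of all nodes in one process are illustrative assumptions for this sketch, not the authors' CNTK implementation.

```python
import numpy as np

def compress(z):
    # Low-precision operator (assumption): keep the sign of each coordinate,
    # scaled by the mean magnitude. The paper allows other compression schemes.
    return np.sign(z) * np.mean(np.abs(z))

def dcd_psgd_step(x, W, grads, lr):
    """One synchronous DCD-PSGD iteration over all n nodes.

    x     : (n, d) current local models
    W     : (n, n) doubly stochastic mixing matrix of the communication graph
    grads : (n, d) stochastic gradients evaluated at the local models
    """
    # Uncompressed local update: neighborhood-weighted average of the current
    # models followed by a stochastic gradient step.
    x_half = W @ x - lr * grads
    # Compress the model difference; the node and its neighbors all apply the
    # same compressed difference, so every copy of the model stays consistent.
    z = np.stack([compress(x_half[i] - x[i]) for i in range(x.shape[0])])
    return x + z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 10                   # 8 nodes, mirroring the 8-instance EC2 setup; d is a toy dimension
    W = np.full((n, n), 1.0 / n)   # fully connected mixing matrix (assumption)
    x = rng.normal(size=(n, d))
    for _ in range(100):
        grads = x                  # gradient of the toy objective 0.5 * ||x_i||^2 on each node
        x = dcd_psgd_step(x, W, grads, lr=0.1)
    print(np.linalg.norm(x))       # the models shrink toward the common optimum at 0
```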