$D^2$: Decentralized Training over Decentralized Data
Authors: Hanlin Tang, Xiangru Lian, Ming Yan, Ce Zhang, Ji Liu
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluated D$^2$ on image classification tasks, where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD. |
| Researcher Affiliation | Collaboration | 1. Department of Computer Science, University of Rochester; 2. Department of Computational Mathematics, Science and Engineering, Michigan State University; 3. Department of Mathematics, Michigan State University; 4. Department of Computer Science, ETH Zurich; 5. Tencent AI Lab. |
| Pseudocode | Yes | Algorithm 1: The D$^2$ algorithm (see the sketch after the table). |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | We empirically evaluated D$^2$ on image classification tasks...In our experiment, we select the first 16 classes of ImageNet...We train a LeNet on the CIFAR10 dataset. |
| Dataset Splits | No | The paper mentions training on CIFAR10 and ImageNet subsets but does not specify exact train/validation/test splits, percentages, or absolute sample counts, nor does it cite standard splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions deep learning frameworks like CNTK, MXNet, and TensorFlow in related work, and models like Inception V4 and LeNet in experiment settings, but does not provide specific version numbers for any software dependencies used in its own experiments. |
| Experiment Setup | Yes | For TRANSFERLEARNING, we use constant learning rates and tune them from {0.01, 0.025, 0.05, 0.075, 0.1}. For LENET, we use a constant learning rate of 0.05, tuned from {0.5, 0.1, 0.05, 0.01} for centralized algorithms, and a batch size of 128 on each worker. |
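
The Pseudocode and Experiment Setup rows above are the only algorithmic details surfaced from the paper. As a rough reproduction aid, below is a minimal NumPy simulation sketch of a D$^2$-style round as we understand Algorithm 1: each worker extrapolates its model with its previous iterate, corrects with the difference of its two most recent local gradients, and then averages with its neighbors through a doubly stochastic mixing matrix. The ring topology, toy quadratic objective, worker count, and helper names (`ring_mixing_matrix`, `d2_round`) are illustrative assumptions, not details from the paper; only the constant learning rate 0.05 matches the quoted LENET setting.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric doubly stochastic mixing matrix for a ring of n workers:
    each worker keeps weight 1/2 for itself and 1/4 for each ring neighbor.
    (The topology is an illustrative assumption, not taken from the paper.)"""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def d2_round(X_t, X_prev, G_t, G_prev, W, lr):
    """One synchronous round of a D^2-style update for all n workers.

    X_t, X_prev : (n, d) current and previous local models
    G_t, G_prev : (n, d) local gradients evaluated at X_t and X_prev
    W           : (n, n) symmetric doubly stochastic mixing matrix
    lr          : constant learning rate (0.05 matches the quoted LeNet setting)
    """
    # Local half step: extrapolate the iterate and correct with the gradient difference,
    # x_{i,t+1/2} = 2*x_{i,t} - x_{i,t-1} - lr * (g_{i,t} - g_{i,t-1})
    half = 2.0 * X_t - X_prev - lr * (G_t - G_prev)
    # Neighborhood averaging: x_{i,t+1} = sum_j W_ij * x_{j,t+1/2}
    return W @ half


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, lr, steps = 8, 4, 0.05, 300
    # Heterogeneous per-worker data: each worker i minimizes ||x - c_i||^2 / 2,
    # mimicking "decentralized data" where local distributions differ.
    C = rng.normal(size=(n, d))
    grad = lambda X: X - C

    W = ring_mixing_matrix(n)
    X_prev = np.zeros((n, d))
    G_prev = grad(X_prev)
    # The first iteration is a plain D-PSGD step: x_{i,1} = sum_j W_ij (x_{j,0} - lr * g_{j,0})
    X_t = W @ (X_prev - lr * G_prev)

    for _ in range(steps):
        G_t = grad(X_t)
        X_next = d2_round(X_t, X_prev, G_t, G_prev, W, lr)
        X_prev, X_t, G_prev = X_t, X_next, G_t

    # All workers should agree and sit near the minimizer of the global average loss, mean(C)
    print("max distance to global optimum:", np.abs(X_t - C.mean(axis=0)).max())
```

In this toy setup each worker holds a different target c_i, so no single local objective shares the global minimizer; the point of the sketch is that the workers still reach consensus near the minimizer of the average loss, which is the heterogeneous-data regime the paper evaluates.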