The Non-IID Data Quagmire of Decentralized Machine Learning

Authors: Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip Gibbons

Venue: ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations.
Researcher Affiliation | Collaboration | Microsoft Research, Carnegie Mellon University, ETH Zürich.
Pseudocode | Yes | Gaia (Hsieh et al., 2017)... (Algorithm 1 in Appendix A). Federated Averaging (McMahan et al., 2017)... (Algorithm 2 in Appendix A). Deep Gradient Compression (Lin et al., 2018)... (Algorithm 3 in Appendix A). (A minimal Federated Averaging sketch follows the table.)
Open Source Code | Yes | All source code and settings are available at https://github.com/kevinhsieh/non_iid_dml.
Open Datasets | Yes | We use two datasets, CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015)... To facilitate further study on skewed label partitions, we release a real-world, geo-tagged dataset of common mammals on Flickr (Flickr), which is openly available at https://doi.org/10.5281/zenodo.3676081 (§2.2).
Dataset Splits | Yes | We use the default validation set of each of the two datasets to quantify the validation accuracy as our model quality metric... We control the skewness by controlling the fraction of data that are non-IID. For example, 20% non-IID indicates 20% of the dataset is partitioned by labels, while the remaining 80% is partitioned uniformly at random. (A minimal sketch of this partitioning scheme follows the table.)
Hardware Specification | No | The paper mentions running experiments on a "GPU parameter server system" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using "Caffe" but does not specify a version number or list other software dependencies with version information.
Experiment Setup | Yes | For all applications, we tune the training parameters (e.g., learning rate, minibatch size, number of epochs, etc.)... Appendix C lists all major training parameters in our study.
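
The Pseudocode row above points to Federated Averaging (McMahan et al., 2017), which the paper reproduces as Algorithm 2 in Appendix A. The sketch below is a minimal, framework-agnostic rendering of the standard FedAvg round, not the paper's implementation; the client-sampling fraction, local epoch count, and the `num_samples`/`local_sgd` client interface are illustrative assumptions.

```python
import random

def fedavg_round(global_weights, clients, client_fraction=0.1, local_epochs=5):
    """One round of Federated Averaging (McMahan et al., 2017), sketched.

    `clients` is assumed to be a list of objects exposing `num_samples` and a
    `local_sgd(weights, epochs)` method that runs local training and returns
    updated weights -- an illustrative interface, not the paper's code.
    """
    # Sample a fraction of clients to participate in this round.
    m = max(1, int(client_fraction * len(clients)))
    selected = random.sample(clients, m)

    # Each selected client trains locally, starting from the global weights.
    updates = [(c.num_samples, c.local_sgd(global_weights, local_epochs))
               for c in selected]

    # Aggregate: average client weights, weighted by local dataset size.
    total = sum(n for n, _ in updates)
    new_weights = {}
    for name in global_weights:
        new_weights[name] = sum(n * w[name] for n, w in updates) / total
    return new_weights
```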
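
The Dataset Splits row describes how the study controls label skew: a chosen fraction of the data is grouped by label and assigned wholesale to partitions, while the rest is spread uniformly at random. Below is a minimal sketch of that scheme, assuming a round-robin label-to-partition assignment; the function name and that assignment rule are illustrative, not taken from the released code.

```python
import random
from collections import defaultdict

def skewed_label_partition(labels, num_partitions, non_iid_fraction, seed=0):
    """Split sample indices into `num_partitions` shards with controllable label skew.

    `non_iid_fraction` of the samples are grouped by label, and each label's
    group goes to a single partition (round-robin over labels, an assumed rule);
    the remaining samples are dealt out uniformly at random. For example,
    non_iid_fraction=0.2 corresponds to the paper's "20% non-IID" setting.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    split = int(non_iid_fraction * len(indices))
    skewed_idx, iid_idx = indices[:split], indices[split:]

    partitions = [[] for _ in range(num_partitions)]

    # Skewed part: all samples of a given label land on one partition.
    by_label = defaultdict(list)
    for i in skewed_idx:
        by_label[labels[i]].append(i)
    for k, label in enumerate(sorted(by_label)):
        partitions[k % num_partitions].extend(by_label[label])

    # IID part: remaining samples are spread uniformly across partitions.
    for j, i in enumerate(iid_idx):
        partitions[j % num_partitions].append(i)

    return partitions
```

As a usage note, with CIFAR-10's ten labels and five partitions under this sketch, each partition would receive the skewed samples of two labels plus a uniform share of the remaining 80% of the data.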