Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show how adding noise improves the practical behavior of the algorithms. Since SIGNSGD is better studied empirically while MEDIANSGD has so far been mainly of theoretical interest, we use SIGNSGD to demonstrate the benefit of injecting noise. We conduct experiments on the MNIST and CIFAR-10 datasets. |
| Researcher Affiliation | Academia | Xiangyi Chen (University of Minnesota, chen5719@umn.edu); Tiancong Chen (University of Minnesota, chen6271@umn.edu); Haoran Sun (University of Minnesota, sun00111@umn.edu); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu); Mingyi Hong (University of Minnesota, mhong@umn.edu) |
| Pseudocode | Yes | Algorithm 1 SIGNSGD (with M nodes), Algorithm 2 MEDIANSGD (with M nodes), Algorithm 3 Noisy SIGNSGD, Algorithm 4 Noisy MEDIANSGD (a minimal sketch of the sign-based variants appears after the table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We conduct experiments on the MNIST and CIFAR-10 datasets. For both datasets, the data distribution on each node is heterogeneous: each node contains exclusive data for one or two of the ten categories (see the partition and decay sketch after the table). More details about the experiment configuration can be found in Appendix I. For the noisy algorithms we use b = 0.001. Sudden changes in performance are caused by learning-rate decay, which happens at the 1000/3000/5000 iteration marks. |
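
The pseudocode row lists four algorithms. The following is a minimal sketch of majority-vote SIGNSGD (cf. Algorithm 1) and its noisy variant (cf. Algorithm 3), under stated assumptions: the gradient oracle is abstracted away, and the uniform noise on [-b, b] is an illustrative choice rather than the paper's exact noise distribution.

```python
import numpy as np

def signsgd_step(x, node_grads, lr):
    """Majority-vote SIGNSGD (cf. Algorithm 1): each of the M nodes sends
    sign(g_m); the server updates with the sign of the aggregated votes."""
    vote = np.sum([np.sign(g) for g in node_grads], axis=0)
    return x - lr * np.sign(vote)

def noisy_signsgd_step(x, node_grads, lr, b=0.001, rng=None):
    """Noisy SIGNSGD (cf. Algorithm 3): each node perturbs its gradient
    before taking the sign. Uniform noise on [-b, b] is an assumption for
    illustration; b = 0.001 matches the value quoted in the setup row."""
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = [g + rng.uniform(-b, b, size=np.shape(g)) for g in node_grads]
    vote = np.sum([np.sign(g) for g in noisy], axis=0)
    return x - lr * np.sign(vote)
```

MEDIANSGD (Algorithms 2 and 4) follows the same pattern but the server aggregates with a coordinate-wise median of the (optionally noise-perturbed) node gradients instead of the sign of the summed signs.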
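
The setup row describes a label-skewed partition (each node holds one or two of the ten classes exclusively) and learning-rate decay at 1000/3000/5000 iterations. Below is a hypothetical sketch of both; the node count and the decay factor `gamma` are assumptions not given in the section.

```python
import numpy as np

def partition_by_class(labels, num_nodes):
    """Label-skewed split: the 10 classes are divided among the nodes so
    that each node holds its classes exclusively (e.g. 5 nodes -> 2 classes
    each, 10 nodes -> 1 class each). Returns per-node index arrays."""
    shards = np.array_split(np.arange(10), num_nodes)
    return [np.where(np.isin(labels, shard))[0] for shard in shards]

def learning_rate(base_lr, iteration, milestones=(1000, 3000, 5000), gamma=0.1):
    """Step decay at the quoted milestones; the factor gamma = 0.1 is an
    assumed value, not stated in the section."""
    return base_lr * gamma ** sum(iteration >= m for m in milestones)
```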