Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we show how adding noise helps the practical behavior of the algorithms. Since SIGNSGD is better studied empirically and MEDIANSGD is more of theoretical interest so far, we use SIGNSGD to demonstrate the benefit of injecting noise. We conduct experiments on MNIST and CIFAR-10 datasets.
Researcher Affiliation | Academia | Xiangyi Chen, University of Minnesota, chen5719@umn.edu; Tiancong Chen, University of Minnesota, chen6271@umn.edu; Haoran Sun, University of Minnesota, sun00111@umn.edu; Zhiwei Steven Wu, Carnegie Mellon University, zstevenwu@cmu.edu; Mingyi Hong, University of Minnesota, mhong@umn.edu
Pseudocode | Yes | Algorithm 1 SIGNSGD (with M nodes), Algorithm 2 MEDIANSGD (with M nodes), Algorithm 3 Noisy SIGNSGD, Algorithm 4 Noisy MEDIANSGD (a sketch of these update rules follows the table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We conduct experiments on MNIST and CIFAR-10 datasets.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. For both datasets, the data distribution on each node is heterogeneous; more specifically, each node contains some exclusive data for one or two out of ten categories. More details about the experiment configuration can be found in Appendix I. For the noisy algorithms we use b = 0.001. The sudden change of performance is caused by learning rate decay, which happens at 1000/3000/5000 iterations. (A sketch of such a heterogeneous partition also follows the table.)
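For orientation, below is a minimal NumPy sketch of the server-side update rules behind the four algorithms named in the Pseudocode row. It assumes each of the M nodes contributes one stochastic gradient per step; the function names are illustrative (not from the paper), and the Gaussian noise of scale b in the noisy variants is an assumption, since the exact noise distribution is specified only in the paper's Algorithms 3-4.

```python
import numpy as np

def signsgd_step(x, node_grads, lr):
    """SIGNSGD with M nodes (majority vote): each node sends sign(g_m);
    the server moves along the sign of the summed votes."""
    votes = np.sum([np.sign(g) for g in node_grads], axis=0)
    return x - lr * np.sign(votes)

def mediansgd_step(x, node_grads, lr):
    """MEDIANSGD with M nodes: the server moves along the
    coordinate-wise median of the node gradients."""
    return x - lr * np.median(np.stack(node_grads), axis=0)

def noisy_signsgd_step(x, node_grads, lr, b=0.001, rng=None):
    """Noisy SIGNSGD: each node perturbs its gradient with noise of scale b
    before taking the sign (Gaussian noise is an assumption here; the paper's
    Algorithm 3 fixes the actual distribution)."""
    rng = np.random.default_rng() if rng is None else rng
    votes = np.sum([np.sign(g + b * rng.standard_normal(g.shape))
                    for g in node_grads], axis=0)
    return x - lr * np.sign(votes)

def noisy_mediansgd_step(x, node_grads, lr, b=0.001, rng=None):
    """Noisy MEDIANSGD: noise of scale b is added before the coordinate-wise
    median is taken (same caveat about the noise distribution as above)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.stack([g + b * rng.standard_normal(g.shape) for g in node_grads])
    return x - lr * np.median(noisy, axis=0)
```

The only difference between the mean- and median-based families is the aggregation step at the server; the noisy variants inject per-node noise before that aggregation, which is the mechanism the experiments with b = 0.001 evaluate.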
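The Experiment Setup row states that each node holds exclusive data for one or two of the ten classes. The following is a minimal sketch of such a label-based partition, assuming ten nodes and one class per node; the helper name heterogeneous_split and these defaults are illustrative, since the actual configuration is given only in the paper's Appendix I.

```python
import numpy as np

def heterogeneous_split(labels, num_nodes=10, classes_per_node=1):
    """Assign sample indices to nodes so each node holds data from only one
    (or two) of the ten classes, mimicking the heterogeneous setting above."""
    classes = np.unique(labels)
    node_indices = []
    for m in range(num_nodes):
        # Classes owned by node m; wrap around if num_nodes * classes_per_node
        # exceeds the number of classes.
        owned = [classes[(m * classes_per_node + k) % len(classes)]
                 for k in range(classes_per_node)]
        node_indices.append(np.where(np.isin(labels, owned))[0])
    return node_indices

# Hypothetical usage: split MNIST labels across 10 nodes, one class each.
# labels = mnist_train.targets.numpy()
# parts = heterogeneous_split(labels, num_nodes=10, classes_per_node=1)
```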