Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks

Authors: Jy-yong Sohn, Dong-Jun Han, Beongjun Choi, Jaekyun Moon

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on real datasets confirm that the suggested codes provide substantial improvement in Byzantine tolerance of distributed learning systems employing SignSGD. We implement the suggested coded distributed learning algorithms in PyTorch, and deploy them on Amazon EC2 using Python with the MPI4py package. We trained ResNet-18 on the CIFAR-10 dataset as well as a logistic regression model on the Amazon Employee Access dataset.
Researcher Affiliation | Academia | Jy-yong Sohn (jysohn1108@kaist.ac.kr), Dong-Jun Han (djhan93@kaist.ac.kr), Beongjun Choi (bbzang10@kaist.ac.kr), Jaekyun Moon (jmoon@kaist.edu), School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)
Pseudocode | Yes | Algorithm 1: Data allocation matrix G satisfying perfect b-Byzantine tolerance (0 < b < n/2)
Open Source Code | No | The paper states: "We implement the suggested coded distributed learning algorithms in PyTorch, and deploy them on Amazon EC2 using Python with the MPI4py package." However, it does not provide a link or state that the code developed for this paper is open-source or publicly available.
Open Datasets | Yes | We trained ResNet-18 on the CIFAR-10 dataset as well as a logistic regression model on the Amazon Employee Access dataset.
Dataset Splits | No | The paper mentions "ntrain = 50000 and ntest = 10000" for CIFAR-10 and "number of training data q = 26325" for Amazon Employee Access, but it does not specify validation splits or split percentages for any dataset.
Hardware Specification | Yes | Our experiments are simulated on g4dn.xlarge instances (having a GPU) for both workers and the master. We used c4.large instances for the n workers that compute batch gradients, and a single c4.2xlarge instance for the master that aggregates the gradients from workers and determines the model updating rule.
Software Dependencies | No | We implement the suggested coded distributed learning algorithms in PyTorch, and deploy them on Amazon EC2 using Python with the MPI4py package. The paper names this software but does not provide specific version numbers for PyTorch, Python, or MPI4py.
Experiment Setup | Yes | Similar to the simulation settings in the previous works [5, 6], we used the momentum counterpart SIGNUM instead of SIGNSGD for fast convergence, with a learning rate of γ = 0.0001 and a momentum term of η = 0.9. We used stochastic mini-batch gradient descent with batch size B.
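The Signum-with-majority-vote procedure referenced in the setup above can be sketched as follows. This is a minimal illustration assuming the standard Signum update (momentum-smoothed gradient, sign-only communication, coordinate-wise majority vote at the master) with the paper's reported hyperparameters; the function and variable names are illustrative, not taken from the authors' code, and no Byzantine coding is included.

```python
import numpy as np

GAMMA = 1e-4  # learning rate gamma = 0.0001, as reported in the paper
ETA = 0.9     # momentum term eta = 0.9, as reported in the paper

def worker_sign(grad, momentum_buf):
    """Worker side: smooth the stochastic gradient with momentum and
    send only the element-wise sign to the master (Signum)."""
    momentum_buf = ETA * momentum_buf + (1.0 - ETA) * grad
    return np.sign(momentum_buf), momentum_buf

def master_update(params, worker_signs):
    """Master side: aggregate the workers' sign vectors by
    coordinate-wise majority vote, then take a sign-based step."""
    vote = np.sign(np.sum(worker_signs, axis=0))
    return params - GAMMA * vote
```

With n workers, the master receives n sign vectors per iteration; the majority vote makes the update depend only on how many workers agree per coordinate, which is the property the paper's election coding is designed to protect against Byzantine sign flips.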