Byzantine Stochastic Gradient Descent

Authors: Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and may behave adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε-approximate minimizers of convex functions in T = Õ(1/(ε²m) + α²/ε²) iterations. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sample complexity and time complexity.
Researcher Affiliation | Collaboration | Dan Alistarh (IST Austria, dan.alistarh@ist.ac.at); Zeyuan Allen-Zhu (Microsoft Research AI, zeyuan@csail.mit.edu); Jerry Li (Simons Institute, jerryzli@berkeley.edu)
Pseudocode | Yes | Algorithm 1 ByzantineSGD( , x1, D, T, TA, TB); a hedged implementation sketch is given after this table.
Open Source Code | No | The paper does not provide concrete access to source code or explicitly state its release.
Open Datasets | No | The paper is theoretical and does not use or mention any specific datasets.
Dataset Splits | No | The paper is theoretical and does not mention any dataset splits for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details used for running experiments, as it is a theoretical work.
Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers.
Experiment Setup | No | The paper does not describe specific experimental setup details such as hyperparameters or training configurations, as it is a theoretical work.
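
Since the paper is theoretical and releases no code, the following is a minimal sketch of the setting described in the abstract: m machines return stochastic gradients each iteration, an unknown α-fraction of them may be Byzantine, and the server must aggregate robustly before taking a gradient step. The aggregation shown (a coordinate-wise median) is a generic robust stand-in, not the paper's Algorithm 1, which relies on the additional inputs TA and TB shown in the signature above as part of its filtering rule. All function and parameter names here (coordinatewise_median, byzantine_tolerant_sgd, grad_oracles, eta) are illustrative, written in Python with NumPy.

```python
import numpy as np


def coordinatewise_median(grads):
    """Aggregate worker gradients by taking the median of each coordinate.

    A generic robust aggregator used for illustration only; the paper's
    Algorithm 1 instead filters machines using accumulated statistics
    compared against thresholds (the TA and TB in its signature).
    """
    return np.median(np.stack(grads), axis=0)


def byzantine_tolerant_sgd(grad_oracles, x1, T, eta):
    """Sketch of a Byzantine-tolerant SGD loop (illustrative, not Algorithm 1).

    grad_oracles: list of m callables, one per machine; each maps the current
                  iterate to a stochastic gradient. An unknown alpha-fraction
                  of machines may return arbitrary vectors.
    x1:  initial iterate (array-like).
    T:   number of iterations.
    eta: step size.
    """
    x = np.asarray(x1, dtype=float)
    iterates = [x.copy()]
    for _ in range(T):
        grads = [oracle(x) for oracle in grad_oracles]  # one gradient per machine
        g = coordinatewise_median(grads)                # robust aggregation step
        x = x - eta * g                                 # standard SGD update
        iterates.append(x.copy())
    # For convex objectives, returning the averaged iterate is conventional.
    return np.mean(np.stack(iterates), axis=0)
```

To exercise the sketch, one could build grad_oracles from closures over disjoint data shards and deliberately replace a small fraction with adversarial callables to observe how the aggregation rule behaves. Note that the rate Õ(1/(ε²m) + α²/ε²) quoted in the abstract is proved for the paper's specific filtering rule; a plain coordinate-wise median is not known to match it.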