Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent

Authors: El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess the effectiveness of our method over 736 different training configurations, comprising the 2 state-of-the-art attacks and 6 defenses. For confidence and reproducibility purposes, each configuration is run 5 times with specified seeds (1 to 5), totalling 3680 runs. (A sketch of this run grid appears after this table.)
Researcher Affiliation | Academia | El-Mahdi El-Mhamdi, École Polytechnique, France, el-mahdi.el-mhamdi@polytechnique.edu; Rachid Guerraoui, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, rachid.guerraoui@epfl.ch; Sébastien Rouault, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, sebastien.rouault@epfl.ch
Pseudocode | No | The paper describes the methods and formulations but does not include any explicit pseudocode blocks or figures labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | We provide our code along with a script reproducing all of our results, both the experiments and the graphs, in one command. Details, including software and hardware dependencies, are available in Section C. Our contributed code is available at https://github.com/LPD-EPFL/ByzantineMomentum, or as a ZIP archive from OpenReview (https://openreview.net/forum?id=H8UHdhWG6A3).
Open Datasets | Yes | Datasets: MNIST and Fashion MNIST (83 samples/gradient); CIFAR-10 and CIFAR-100 (50 samples/gradient). ... Datasets are pre-processed before training. MNIST receives the same pre-processing as in Baruch et al. (2019): an input image normalization with mean 0.1307 and standard deviation 0.3081. Fashion MNIST, CIFAR-10 and CIFAR-100 are all expanded with horizontally flipped images. For both CIFAR-10 and CIFAR-100, a per-channel normalization with means 0.4914, 0.4822, 0.4465 and standard deviations 0.2023, 0.1994, 0.2010 (Liu, 2019) has been applied. (A torchvision sketch of this pre-processing appears after this table.)
Dataset Splits | No | The paper evaluates on the test set ('top-1 cross-accuracy over the whole test set') but does not specify training, validation, and test splits (e.g., percentages or exact counts) for the datasets used in its experiments. It references existing works for the datasets but does not define how splits were made for its specific experimental setup beyond implying a test set.
Hardware Specification | Yes | Hardware dependencies. We list below the hardware components used: 1× Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz; 2× Nvidia GeForce GTX 1080 Ti; 64 GB of RAM.
Software Dependencies | Yes | Software dependencies. Python 3.7.3 has been used, over several GNU/Linux distributions (Debian 10, Ubuntu 18). Besides the standard libraries associated with Python 3.7.3, our scripts also depend on: numpy 1.19.1, torch 1.6.0, torchvision 0.7.0, pandas 1.1.0, matplotlib 3.0.2, PIL 7.2.0, requests 2.21.0, urllib3 1.24.1, chardet 3.0.4, certifi 2018.08.24, idna 2.6, six 1.15.0, pytz 2020.1, dateutil 2.8.1, pyparsing 2.2.0, cycler 0.10.0, kiwisolver 1.0.1, cffi 1.13.2.
Experiment Setup | Yes | Our experiments cover 2 models, 4 datasets, the 6 studied defenses under each of the 2 state-of-the-art attacks, different fractions of Byzantine workers (either half or a quarter), using Nesterov instead of classical momentum, plus unattacked settings where each worker is honest and the GAR is mere averaging. ... For model training, we use the negative log-likelihood loss and respectively 10⁻⁴ and 10⁻² ℓ2-regularization for the fully connected and convolutional models. We also clip gradients, ensuring their norms remain respectively below 2 and 5 for the fully connected and convolutional models. (A PyTorch sketch of these settings appears after this table.)
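
The run count under Research Type follows from the grid of 736 configurations, each repeated with seeds 1 to 5. A minimal sketch, where the configuration tuples are placeholders rather than the paper's actual (model, dataset, attack, defense, ...) grid:

```python
# Enumerate the run grid: 736 configurations, each run with seeds 1..5.
from itertools import product

configurations = range(736)  # placeholder for the paper's actual configuration tuples
seeds = range(1, 6)          # specified seeds 1 to 5

runs = list(product(configurations, seeds))
assert len(runs) == 3680     # 736 configurations x 5 seeds = 3680 runs
```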
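
The pre-processing quoted under Open Datasets maps directly onto standard torchvision transforms. A minimal sketch using the constants from the paper; the paper describes expanding the datasets with horizontally flipped copies, and a random horizontal flip is used below as a common stand-in for that expansion:

```python
# Pre-processing described in the paper, expressed with torchvision transforms.
import torchvision.transforms as T

# MNIST: same pre-processing as in Baruch et al. (2019).
mnist_transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.1307,), (0.3081,)),      # input image mean / standard deviation
])

# CIFAR-10 / CIFAR-100: horizontal-flip expansion plus per-channel normalization.
cifar_transform = T.Compose([
    T.RandomHorizontalFlip(),               # stand-in for the flipped-image expansion
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465),   # per-channel means (Liu, 2019)
                (0.2023, 0.1994, 0.2010)),  # per-channel standard deviations
])
```

Per the excerpt, Fashion MNIST also gets the flip expansion, but only CIFAR-10 and CIFAR-100 receive the per-channel normalization.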
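
Likewise, the Experiment Setup row corresponds to standard PyTorch calls. A minimal sketch, assuming a placeholder model and an arbitrary learning rate and momentum value (neither is given in this excerpt); only the loss, the per-model ℓ2-regularization (expressed as weight decay), and the gradient-clipping norms come from the paper:

```python
# Training settings reported in the paper, sketched for the convolutional case.
import torch

convolutional = True  # switch between the convolutional and fully connected settings
weight_decay = 1e-2 if convolutional else 1e-4  # reported l2-regularization
max_norm = 5.0 if convolutional else 2.0        # reported gradient-clipping norms

model = torch.nn.Sequential(                    # placeholder model, not the paper's
    torch.nn.Flatten(),
    torch.nn.Linear(32 * 32 * 3, 10),
    torch.nn.LogSoftmax(dim=1),                 # NLLLoss expects log-probabilities
)
criterion = torch.nn.NLLLoss()                  # negative log-likelihood loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                            nesterov=True,      # the paper also tests Nesterov momentum
                            weight_decay=weight_decay)

def training_step(inputs, targets):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # Clip gradient norms to stay below the reported per-model threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```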