The Hidden Vulnerability of Distributed Learning in Byzantium

Authors: El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model. We implemented the three (α, f)-Byzantine-resilient gradient aggregation rules presented in Section 2.3, along with the attack introduced in Section 3. We report in this section on the actual impact this attack can have, on the MNIST and CIFAR-10 problems, despite the use of such aggregation rules.
Researcher Affiliation | Academia | 1EPFL, Lausanne, Switzerland.
Pseudocode | No | The paper describes the Bulyan algorithm in Section 4 using numbered steps in prose, but it does not present it in a formally structured pseudocode block or algorithm box (see the sketch after this table).
Open Source Code | Yes | The code used to carry our experiments out (including additional ones asked by reviewers) is available at https://github.com/LPD-EPFL/bulyan.
Open Datasets | Yes | We report in this section on the actual impact this attack can have, on the MNIST and CIFAR-10 problems.
Dataset Splits | No | The paper states that 'The accuracy is always measured on the testing set' but does not explicitly describe train/validation/test dataset splits or mention the use of a validation set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor speeds, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries).
Experiment Setup | Yes | L2 regularization of value 10^-4 is used for both models, and both use the Xavier weight initialization algorithm. We use a fading learning rate η(epoch) = η0 · rη / (epoch + rη). The initial learning rate η0, the fading rate rη, and the mini-batch size depend on each experiment. On MNIST, we use η0 = 1, rη = 10000, a batch size of 83 images (256 for Brute). (See the learning-rate helper after this table.)
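The Pseudocode row notes that Section 4 presents Bulyan only as numbered prose steps. As a reading aid, here is a minimal NumPy sketch of that recursion, assuming Krum as the underlying (α, f)-Byzantine-resilient rule; the function names and the handling of the shrinking candidate set are our own choices, not the authors' implementation (which is in the linked repository).

```python
import numpy as np

def krum_select(gradients, f):
    """Krum (Blanchard et al., 2017): pick the gradient whose summed squared
    distance to its n - f - 2 nearest neighbours is smallest."""
    n = len(gradients)
    k = max(n - f - 2, 1)  # number of neighbours scored (guarded for small sets)
    scores = []
    for i, g_i in enumerate(gradients):
        dists = sorted(float(np.sum((g_i - g_j) ** 2))
                       for j, g_j in enumerate(gradients) if j != i)
        scores.append(sum(dists[:k]))
    return int(np.argmin(scores))

def bulyan(gradients, f):
    """Bulyan sketch: select theta = n - 2f gradients by applying Krum
    recursively, then average, coordinate-wise, the beta = theta - 2f
    values closest to the median of the selection."""
    grads = [np.asarray(g, dtype=float) for g in gradients]
    n = len(grads)
    assert n >= 4 * f + 3, "Bulyan requires n >= 4f + 3 workers"
    theta, beta = n - 2 * f, n - 4 * f
    selected = []
    for _ in range(theta):                    # step 1: recursive selection
        idx = krum_select(grads, f)
        selected.append(grads.pop(idx))
    S = np.stack(selected)                    # shape (theta, d)
    median = np.median(S, axis=0)
    order = np.argsort(np.abs(S - median), axis=0)[:beta]     # per coordinate
    return np.take_along_axis(S, order, axis=0).mean(axis=0)  # step 2
```

For example, with n = 7 workers and f = 1 Byzantine worker, this selects θ = 5 gradients and averages the β = 3 coordinate-wise values closest to the median, matching the bounds stated in the paper.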
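The Experiment Setup row quotes a fading learning rate η(epoch) = η0 · rη / (epoch + rη). A minimal helper encoding that schedule with the MNIST values quoted above (η0 = 1, rη = 10000); the function name is ours, not the paper's.

```python
def fading_lr(epoch, eta0=1.0, r_eta=10000.0):
    """Fading learning rate quoted in the table: eta0 * r_eta / (epoch + r_eta)."""
    return eta0 * r_eta / (epoch + r_eta)

# With the quoted MNIST settings the rate starts at eta0 and halves after r_eta epochs.
assert abs(fading_lr(0) - 1.0) < 1e-12
assert abs(fading_lr(10000) - 0.5) < 1e-12
```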