The Hidden Vulnerability of Distributed Learning in Byzantium
Authors: El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model. We implemented the three (α, f)-Byzantine-resilient gradient aggregation rules presented in Section 2.3, along with the attack introduced in Section 3. We report in this section on the actual impact this attack can have, on the MNIST and CIFAR-10 problems, despite the use of such aggregation rules. |
| Researcher Affiliation | Academia | EPFL, Lausanne, Switzerland. |
| Pseudocode | No | The paper describes the Bulyan algorithm in Section 4 using numbered steps in prose, but it does not present it in a formally structured pseudocode block or algorithm box (a hedged sketch of those steps appears after this table). |
| Open Source Code | Yes | The code used to carry our experiments out (including additional ones asked by reviewers) is available at https://github.com/LPD-EPFL/bulyan. |
| Open Datasets | Yes | We report in this section on the actual impact this attack can have, on the MNIST and CIFAR-10 problems |
| Dataset Splits | No | The paper states that 'The accuracy is always measured on the testing set' but does not explicitly describe train/validation/test dataset splits or mention the use of a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor speeds, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | L2 regularization of value 10^-4 is used for both models, and both use the Xavier weight initialization algorithm. We use a fading learning rate η(epoch) = η0 · rη / (epoch + rη). The initial learning rate η0, the fading rate rη, and the mini-batch size depend on each experiment. On MNIST, we use η0 = 1, rη = 10000, a batch size of 83 images (256 for Brute). |
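
Since the paper presents Bulyan only as numbered prose steps rather than pseudocode, the following is a minimal NumPy sketch of how those steps could be implemented, assuming Krum as the underlying selection rule. The `krum_select` helper, function names, and all implementation details are illustrative reconstructions, not the authors' code (which is available at the linked repository).

```python
import numpy as np

def krum_select(grads, f):
    """Index of the gradient picked by Krum: the one whose summed squared
    distance to its n - f - 2 nearest neighbours is smallest."""
    n = len(grads)
    dists = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = []
    for i in range(n):
        nearest = np.sort(dists[i])[1:n - f - 1]  # drop the zero self-distance
        scores.append(nearest.sum())
    return int(np.argmin(scores))

def bulyan(grads, f, select=krum_select):
    """Two-step Bulyan aggregation over a list of flattened gradient vectors."""
    grads = [np.asarray(g, dtype=float) for g in grads]
    n = len(grads)
    assert n >= 4 * f + 3, "Bulyan is defined for n >= 4f + 3"
    theta = n - 2 * f          # size of the selection set
    beta = theta - 2 * f       # values kept per coordinate
    # Step 1: iteratively apply the base rule, moving its pick into the set.
    remaining, selected = list(grads), []
    for _ in range(theta):
        selected.append(remaining.pop(select(remaining, f)))
    S = np.stack(selected)     # shape (theta, d)
    # Step 2: per coordinate, average the beta values closest to the median.
    median = np.median(S, axis=0)
    closest = np.argsort(np.abs(S - median), axis=0)[:beta]
    return np.take_along_axis(S, closest, axis=0).mean(axis=0)
```

The trimmed mean around the coordinate-wise median in step 2 is what gives Bulyan its per-coordinate guarantee on top of the base rule; note that this toy Krum degenerates when the remaining set becomes very small, which a production implementation would need to handle.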
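The learning-rate formula quoted above is garbled in extraction; reading it as the fraction η(epoch) = η0 · rη / (epoch + rη) is an assumption. Under that assumption, the MNIST schedule quoted in the table (η0 = 1, rη = 10000) could be written as:

```python
def fading_lr(epoch, eta0=1.0, r_eta=10000):
    """Assumed reconstruction of the fading schedule: eta0 * r_eta / (epoch + r_eta)."""
    return eta0 * r_eta / (epoch + r_eta)

# With the quoted MNIST values: fading_lr(0) == 1.0, fading_lr(10000) == 0.5
```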