Asynchronous Byzantine Machine Learning (the case of SGD)

Authors: Georgios Damaskinos, El Mahdi El Mhamdi, Rachid Guerraoui, Rhicheek Patra, Mahsa Taziki

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than f/n, where f is the number of Byzantine failures tolerated and n is the total number of workers. Interestingly, we also empirically observe that the dampening component is interesting in its own right, for it enables building an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers." (See the dampening sketch after the table.)
Researcher Affiliation | Academia | "EPFL, Lausanne, Switzerland."
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code to reproduce our experiments as well as a few additional results (varying f) will be found at https://github.com/LPD-EPFL/kardam."
Open Datasets | Yes | "We evaluate Kardam on the CIFAR-100 and EMNIST datasets and measure its overhead with respect to non-Byzantine-resilient solutions."
Dataset Splits | No | The paper mentions using the CIFAR-100 and EMNIST datasets but does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper only mentions running experiments 'in a distributed setting' but does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or PyTorch 1.9) needed to replicate the experiments.
Experiment Setup | Yes | "We employ the convolutional neural network (CNN) described in Table 2 for image classification on CIFAR-100. The chosen base learning rate is 15 × 10^-4 and the minibatch size is 100 examples (Neyshabur et al., 2015). If not stated otherwise, we employ a setup with no actual Byzantine behavior and deploy Kardam with f = 3 on 10 workers." (See the configuration sketch after the table.)
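
The abstract's two quantitative claims can be made concrete with a short sketch. This is a minimal illustration, not the paper's algorithm: the inverse-staleness dampening function 1/(1 + tau) and all names below are assumptions chosen for illustration, and Kardam's Byzantine filtering component is omitted entirely.

```python
# Minimal sketch of a staleness-dampened asynchronous SGD update.
# ASSUMPTIONS: the dampening function 1/(1 + tau) and the names below
# are illustrative only; the paper defines its own dampening component
# and a Byzantine filter that this sketch omits.
import numpy as np

def dampening(staleness: int) -> float:
    """Hypothetical inverse-staleness dampening factor."""
    return 1.0 / (1.0 + staleness)

def apply_update(params: np.ndarray, gradient: np.ndarray,
                 base_lr: float, staleness: int) -> np.ndarray:
    """Apply a possibly stale gradient, scaled down by its staleness."""
    return params - base_lr * dampening(staleness) * gradient

# The paper's slowdown bound: the cost of tolerating f Byzantine
# workers among n is less than f/n (3/10 = 0.3 in the default setup).
f, n = 3, 10
print(f"slowdown bound f/n = {f / n:.2f}")

# Example update with the quoted base learning rate and a staleness of 2.
params = apply_update(np.zeros(4), np.ones(4), base_lr=15e-4, staleness=2)
print(params)  # each coordinate: -15e-4 * (1/3) = -0.0005
```

The point of the sketch is only that a stale gradient is attenuated rather than discarded, which is consistent with the abstract's observation that the dampening component is useful even with honest workers.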
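
The quoted setup values can likewise be collected into one configuration block. Only the numbers stated in the paper (base learning rate 15 × 10^-4, minibatch size 100, f = 3, n = 10 workers, the CNN of the paper's Table 2) are grounded; the dictionary layout and field names are hypothetical.

```python
# Hypothetical configuration mirroring the quoted experiment setup.
# Field names are invented; the values come from the paper's text.
config = {
    "dataset": "CIFAR-100",            # EMNIST is also evaluated
    "model": "CNN from the paper's Table 2",
    "base_learning_rate": 15e-4,       # i.e., 0.0015
    "minibatch_size": 100,
    "num_workers": 10,                 # n
    "byzantine_tolerance": 3,          # f (default unless stated otherwise)
}
```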