Learning from History for Byzantine Robust Optimization
Authors: Sai Praneeth Karimireddy, Lie He, Martin Jaggi
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate the effectiveness of CC and SGDM for Byzantine-robust learning. We refer to the baseline robust aggregation rules as RFA (Pillutla et al., 2019), coordinate-wise median (CM), trimmed mean (TM) (Yin et al., 2018), and Krum (Blanchard et al., 2017). The inner iteration (T) of RFA is fixed to 3 as suggested in (Pillutla et al., 2019). Throughout the section, we consider the distributed training for two image classification tasks, namely MNIST (LeCun & Cortes, 2010) on 16 nodes and CIFAR-10 (Krizhevsky et al., 2009) on 25 nodes. (A hedged sketch of the CM and TM baselines appears after this table.) |
| Researcher Affiliation | Academia | EPFL, Switzerland. Correspondence to: Sai Praneeth Karimireddy <sai.karimireddy@epfl.ch>. |
| Pseudocode | Yes | Algorithm 1: AGG Centered Clipping (a hedged sketch of this aggregator appears after this table). |
| Open Source Code | Yes | Our code is open sourced at this link: https://github.com/epfml/byzantine-robust-optimizer |
| Open Datasets | Yes | Throughout the section, we consider the distributed training for two image classification tasks, namely MNIST (Le Cun & Cortes, 2010) on 16 nodes and CIFAR-10 (Krizhevsky et al., 2009) on 25 nodes. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets and their respective nodes, but it does not provide specific details on how the dataset was split into training, validation, and test sets, such as percentages or sample counts for each split. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. It only mentions the number of nodes in the distributed training setup (16 or 25 nodes). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | The batch size per worker is set to 32 and the learning rate is 0.1 before the 75th epoch and 0.01 afterwards. (A hedged configuration sketch appears after this table.) |
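
The Research Type row quotes the paper's baseline robust aggregation rules, including coordinate-wise median (CM) and trimmed mean (TM) from Yin et al. (2018). The sketch below is an illustrative NumPy reimplementation of those two baselines, not the authors' released code; the function names and the trimming parameter `b` are our assumptions.

```python
# Hypothetical sketch (not the authors' code): coordinate-wise median (CM) and
# trimmed mean (TM) aggregators applied to a stack of worker gradient vectors.
import numpy as np

def coordinate_wise_median(grads: np.ndarray) -> np.ndarray:
    """grads: (n_workers, dim). Returns the per-coordinate median."""
    return np.median(grads, axis=0)

def trimmed_mean(grads: np.ndarray, b: int) -> np.ndarray:
    """Drop the b largest and b smallest values per coordinate, then average.

    b is the assumed number of entries to trim on each side (e.g. the number
    of suspected Byzantine workers).
    """
    sorted_grads = np.sort(grads, axis=0)          # sort each coordinate independently
    kept = sorted_grads[b:grads.shape[0] - b]      # discard b extremes on both ends
    return kept.mean(axis=0)

# Example: 16 workers, 10-dimensional gradients, trim 2 from each side.
grads = np.random.randn(16, 10)
agg_cm = coordinate_wise_median(grads)
agg_tm = trimmed_mean(grads, b=2)
```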
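The Pseudocode row points to Algorithm 1 (AGG: Centered Clipping). A minimal sketch of that update is given below, assuming worker momenta stacked in `ms`, the previous aggregate `v` used as the clipping center, a clipping radius `tau`, and `L` inner iterations; the variable names, defaults, and NumPy implementation are our choices, not the paper's.

```python
# A minimal sketch of the Centered Clipping (CC) aggregation rule: each iteration
# moves v toward the average of the worker vectors, with every deviation (m_i - v)
# clipped to radius tau before averaging.
import numpy as np

def centered_clipping(ms: np.ndarray, v: np.ndarray, tau: float, L: int = 1) -> np.ndarray:
    """ms: (n_workers, dim) worker momenta; v: (dim,) previous aggregate."""
    for _ in range(L):
        deviations = ms - v                                        # (n_workers, dim)
        norms = np.linalg.norm(deviations, axis=1, keepdims=True)  # per-worker norms
        scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))    # clip factor in [0, 1]
        v = v + (scale * deviations).mean(axis=0)
    return v

# Example: aggregate 16 worker momenta of dimension 10 around the previous aggregate.
ms = np.random.randn(16, 10)
v_prev = np.zeros(10)
v_new = centered_clipping(ms, v_prev, tau=1.0)
```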
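The Experiment Setup row reports a per-worker batch size of 32 and a learning rate of 0.1 dropped to 0.01 at epoch 75. The sketch below shows one way a single worker's training loop could be configured to match those numbers; the model architecture, the momentum value of 0.9, the MultiStepLR scheduler, the total epoch count, and the use of MNIST here are illustrative assumptions, not details taken from the paper.

```python
# Hedged per-worker training configuration sketch: only the batch size (32) and
# the 0.1 -> 0.01 learning-rate drop at epoch 75 come from the table above.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=32, shuffle=True)   # batch size 32 per worker

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # worker momentum (SGDM)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[75], gamma=0.1)

for epoch in range(100):          # total number of epochs is an assumption
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()              # lr: 0.1 for epochs < 75, then 0.01
```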