On the Effect of Batch Size in Byzantine-Robust Distributed Learning
Authors: Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that when under Byzantine attacks, using a relatively large batch size can significantly increase the model accuracy, which is consistent with our theoretical results. Moreover, ByzSGDnm can achieve higher model accuracy than existing BRDL methods when under deliberately crafted attacks. In addition, we empirically show that increasing batch size has the bonus of training acceleration. |
| Researcher Affiliation | Academia | Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li; National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China; {yangyr, shicw}@smail.nju.edu.cn, liwujun@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1: Byzantine-Robust SGD with Normalized Momentum (ByzSGDnm); a hedged sketch of this style of update appears after the table. |
| Open Source Code | Yes | The core code for our experiments can be found in the supplementary material. |
| Open Datasets | Yes | train a ResNet-20 (He et al., 2016) deep learning model on CIFAR-10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper mentions that 'The training instances are randomly and equally distributed to the workers.' and 'C = 160 × 50000 × (1 − δ) since we train the model for 160 epochs with 50000 training instances.' but does not explicitly provide percentages or counts for training, validation, and test splits. |
| Hardware Specification | Yes | All the experiments presented in this work are conducted on a distributed platform with 9 dockers. Each docker is bound to an NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies such as Python, PyTorch, or other libraries. It only implies the use of deep learning frameworks. |
| Experiment Setup | Yes | Experimental settings. In existing works (Allouah et al., 2023; Karimireddy et al., 2021; 2022) on BRDL, the batch size is typically set to 32 or 50 on the CIFAR-10 dataset. Therefore, we set ByzSGDm (Karimireddy et al., 2021) with batch size 32 as the baseline, and compare the performance of ByzSGDm with different batch sizes (ranging from 64 to 1024) to the baseline under the ALIE attack (Baruch et al., 2019). In our experiments, we use four widely used robust aggregators for ByzSGDm: Krum (KR) (Blanchard et al., 2017), geometric median (GM) (Chen et al., 2017), coordinate-wise median (CM) (Yin et al., 2018), and centered clipping (CC) (Karimireddy et al., 2021). Moreover, we set the clipping radius to 0.1 for CC. We train the model for 160 epochs with cosine annealing learning rates (Loshchilov & Hutter, 2017). Specifically, the learning rate at the i-th epoch is η_i = (η_0/2)(1 + cos(iπ/160)) for i = 0, 1, ..., 159. The initial learning rate η_0 is selected from {0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0}, and the best final top-1 test accuracy is used as the final metric. The momentum hyper-parameter β is set to 0.9. |
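The Pseudocode row above only names Algorithm 1 (ByzSGDnm); the listing itself is not reproduced in this report. Below is a minimal sketch of one training step that combines worker-side momentum, a robust aggregator, and a normalized update, assuming the normalization is applied to the robustly aggregated momentum on the server. The function names (`geometric_median`, `byz_sgdnm_step`) and the choice of geometric median as the aggregator are illustrative, not the paper's exact Algorithm 1.

```python
import numpy as np

def geometric_median(points, iters=50, eps=1e-8):
    """Approximate geometric median via Weiszfeld iterations.
    GM is one of the four robust aggregators listed in the setup row;
    it is used here purely for illustration."""
    z = points.mean(axis=0)
    for _ in range(iters):
        dist = np.linalg.norm(points - z, axis=1)
        weights = 1.0 / np.maximum(dist, eps)
        z = np.average(points, axis=0, weights=weights)
    return z

def byz_sgdnm_step(w, momenta, grads, lr, beta=0.9, eps=1e-12):
    """One hypothetical ByzSGDnm step.
    Each worker refreshes its local momentum from its stochastic gradient,
    the server robustly aggregates the (possibly Byzantine) momenta, and the
    model takes a step along the normalized aggregated momentum. Where the
    normalization is applied is an assumption, not taken from the paper."""
    momenta = [beta * m + (1.0 - beta) * g for m, g in zip(momenta, grads)]
    agg = geometric_median(np.stack(momenta))
    w = w - lr * agg / (np.linalg.norm(agg) + eps)
    return w, momenta
```

ByzSGDm, which serves as the baseline with batch size 32 in the experiments, would correspond to the same step without the final normalization.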
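The Experiment Setup row specifies a cosine annealing schedule, η_i = (η_0/2)(1 + cos(iπ/160)). The short sketch below simply evaluates this formula; the function name `cosine_annealed_lr` is ours, and only the 160-epoch horizon and the grid of initial rates come from the setup description.

```python
import math

def cosine_annealed_lr(epoch, eta0, total_epochs=160):
    """Cosine annealing (Loshchilov & Hutter, 2017):
    eta_i = (eta_0 / 2) * (1 + cos(i * pi / total_epochs))."""
    return 0.5 * eta0 * (1.0 + math.cos(epoch * math.pi / total_epochs))

# The initial rate eta_0 is searched over {0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0}.
eta0 = 0.1
print(cosine_annealed_lr(0, eta0))    # 0.1 at epoch 0 (schedule starts at eta_0)
print(cosine_annealed_lr(159, eta0))  # ~1e-05, i.e. close to 0 at the last epoch
```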