On the Effect of Batch Size in Byzantine-Robust Distributed Learning

Authors: Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that when under Byzantine attacks, using a relatively large batch size can significantly increase the model accuracy, which is consistent with our theoretical results. Moreover, ByzSGDnm can achieve higher model accuracy than existing BRDL methods when under deliberately crafted attacks. In addition, we empirically show that increasing batch size has the bonus of training acceleration.
Researcher Affiliation | Academia | Yi-Rui Yang, Chang-Wei Shi, Wu-Jun Li; National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China; {yangyr, shicw}@smail.nju.edu.cn, liwujun@nju.edu.cn
Pseudocode | Yes | Algorithm 1: Byzantine-Robust SGD with Normalized Momentum (ByzSGDnm). (A minimal code sketch of one ByzSGDnm round is given after the table.)
Open Source Code | Yes | The core code for our experiments can be found in the supplementary material.
Open Datasets | Yes | "train a ResNet-20 (He et al., 2016) deep learning model on the CIFAR-10 dataset (Krizhevsky et al., 2009)"
Dataset Splits | No | The paper mentions that "The training instances are randomly and equally distributed to the workers." and "C = 160 × 50000 × (1 − δ) since we train the model for 160 epochs with 50000 training instances.", but it does not explicitly provide percentages or counts for training, validation, and test splits.
Hardware Specification | Yes | All the experiments presented in this work are conducted on a distributed platform with 9 dockers. Each docker is bound to an NVIDIA TITAN Xp GPU.
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies such as Python, PyTorch, or other libraries. It only implies the use of deep learning frameworks.
Experiment Setup | Yes | Experimental settings. In existing works (Allouah et al., 2023; Karimireddy et al., 2021; 2022) on BRDL, the batch size is typically set to 32 or 50 on the CIFAR-10 dataset. Therefore, we set ByzSGDm (Karimireddy et al., 2021) with batch size 32 as the baseline, and compare the performance of ByzSGDm with different batch sizes (ranging from 64 to 1024) to the baseline under the ALIE attack (Baruch et al., 2019). In our experiments, we use four widely-used robust aggregators for ByzSGDm: Krum (KR) (Blanchard et al., 2017), geometric median (GM) (Chen et al., 2017), coordinate-wise median (CM) (Yin et al., 2018), and centered clipping (CC) (Karimireddy et al., 2021). Moreover, we set the clipping radius to 0.1 for CC. We train the model for 160 epochs with cosine annealing learning rates (Loshchilov & Hutter, 2017). Specifically, the learning rate at the i-th epoch will be η_i = (η_0 / 2)(1 + cos(iπ/160)) for i = 0, 1, ..., 159. The initial learning rate η_0 is selected from {0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0}, and the best final top-1 test accuracy is used as the final metric. The momentum hyper-parameter β is set to 0.9. (A short code transcription of this learning-rate schedule is given after the table.)
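
For orientation, below is a minimal NumPy sketch of one round of the algorithm referenced in the Pseudocode row (Algorithm 1, ByzSGDnm). It is not the authors' code: it assumes worker-side momentum accumulation and server-side normalization of the robustly aggregated momentum, and the names byz_sgd_nm_step, robust_agg, and coordinate_wise_median are illustrative placeholders. Any of the four aggregators listed in the Experiment Setup row could be plugged in as robust_agg.

    import numpy as np

    def coordinate_wise_median(vectors):
        # Coordinate-wise median (Yin et al., 2018), one of the aggregators used in the paper.
        return np.median(np.stack(vectors), axis=0)

    def byz_sgd_nm_step(w, worker_grads, momenta, robust_agg, lr, beta=0.9):
        # Each regular worker updates its local momentum with its mini-batch gradient.
        # (Byzantine workers may instead send arbitrary vectors to the server.)
        for k, g in enumerate(worker_grads):
            momenta[k] = beta * momenta[k] + (1.0 - beta) * g
        # The server robustly aggregates the received momenta ...
        agg = robust_agg(momenta)
        # ... normalizes the aggregated momentum, and takes a step of size lr.
        norm = np.linalg.norm(agg)
        direction = agg / norm if norm > 0 else agg
        return w - lr * direction

    # Toy usage with random stand-ins for model parameters and gradients.
    dim, num_workers = 10, 8
    rng = np.random.default_rng(0)
    w = rng.normal(size=dim)
    momenta = [np.zeros(dim) for _ in range(num_workers)]
    grads = [rng.normal(size=dim) for _ in range(num_workers)]
    w = byz_sgd_nm_step(w, grads, momenta, coordinate_wise_median, lr=0.1)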
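
The cosine-annealing schedule quoted in the Experiment Setup row, η_i = (η_0 / 2)(1 + cos(iπ/160)), can be transcribed directly. The snippet below is a sketch for illustration, not the paper's code; cosine_annealed_lr is a hypothetical helper name.

    import math

    def cosine_annealed_lr(eta0, epoch, total_epochs=160):
        # eta_i = (eta_0 / 2) * (1 + cos(i * pi / total_epochs)) for i = 0, ..., total_epochs - 1
        return 0.5 * eta0 * (1.0 + math.cos(epoch * math.pi / total_epochs))

    # The paper selects eta_0 from this grid and reports the best final top-1 test accuracy.
    for eta0 in [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]:
        schedule = [cosine_annealed_lr(eta0, i) for i in range(160)]
        # schedule[0] equals eta0; schedule[-1] is close to (but not exactly) zero.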