Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Authors: Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the convergence of Safeguard SGD to examine its practical performance against prior works. We perform the non-convex task of training a residual network ResNet-20 (He et al., 2016) on the CIFAR-10/100 datasets (Krizhevsky et al., 2014). More details are given in Appendix C.
Researcher Affiliation | Collaboration | Microsoft Research Redmond, zeyuan@csail.mit.edu; University of Waterloo, faezeeb75@gmail.com; Microsoft Research Redmond, jerrl@microsoft.com; IST Austria, dan.alistarh@ist.ac.at
Pseudocode | Yes | Algorithm 1 (Safeguard SGD: perturbed SGD with double safeguard). Input: point x0 ∈ R^d, rate η > 0, window lengths T ≥ T1 ≥ T0 ≥ 1, thresholds 𝔗1 > 𝔗0 > 0;
1: good_0 ← [m];
2: for t ← 0 to T−1 do
3:   last1 ← max{t1 ∈ [t] : t1 is a multiple of T1};
4:   last0 ← max{t0 ∈ [t] : t0 is a multiple of T0};
5:   for each i ∈ good_t do
6:     receive ∇_{t,i} ∈ R^d from machine i;
7:     A_i ← Σ_{k=last1}^{t} ∇_{k,i}/|good_k| and B_i ← Σ_{k=last0}^{t} ∇_{k,i}/|good_k|;
8:   A_med ← A_i for any i ∈ good_t such that |{j ∈ good_t : ‖A_j − A_i‖ ≤ 𝔗1}| > m/2;
9:   B_med ← B_i for any i ∈ good_t such that |{j ∈ good_t : ‖B_j − B_i‖ ≤ 𝔗0}| > m/2;
10:  good_{t+1} ← {i ∈ good_t : ‖A_i − A_med‖ ≤ 2𝔗1 ∧ ‖B_i − B_med‖ ≤ 2𝔗0};
11:  x_{t+1} = x_t − η(ξ_t + (1/|good_t|) Σ_{i∈good_t} ∇_{t,i}), with Gaussian noise ξ_t ∼ N(0, ν²I).
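The quoted listing was garbled by PDF extraction; the following is a minimal NumPy sketch of one way to simulate Algorithm 1 on a single node, not the authors' implementation. The function name safeguard_sgd, the grad_fn oracle interface, and the default values of eta, nu, and seed are assumptions made for illustration.

import numpy as np

def safeguard_sgd(grad_fn, x0, m, T, T1, T0, thr1, thr0, eta=0.1, nu=1e-3, seed=0):
    """Single-node simulation of Safeguard SGD (hypothetical interface).
    grad_fn(t, i, x) returns the gradient reported by machine i at step t;
    a Byzantine machine may return an arbitrary vector."""
    rng = np.random.default_rng(seed)
    d = x0.size
    x = x0.copy()
    good = set(range(m))                          # good_0 = [m]
    A = {i: np.zeros(d) for i in good}            # long-window safeguard accumulators
    B = {i: np.zeros(d) for i in good}            # short-window safeguard accumulators
    for t in range(T):
        if t % T1 == 0:                           # last1 = t: reset the long window
            A = {i: np.zeros(d) for i in good}
        if t % T0 == 0:                           # last0 = t: reset the short window
            B = {i: np.zeros(d) for i in good}
        grads = {i: grad_fn(t, i, x) for i in good}
        for i in good:
            A[i] = A[i] + grads[i] / len(good)
            B[i] = B[i] + grads[i] / len(good)

        def near_median(acc, thr):
            # pick any accumulator within thr of a strict majority of machines
            for i in good:
                close = sum(np.linalg.norm(acc[j] - acc[i]) <= thr for j in good)
                if close > m / 2:
                    return acc[i]
            return acc[next(iter(good))]          # fallback; unreachable with an honest majority

        A_med = near_median(A, thr1)
        B_med = near_median(B, thr0)
        # line 10: drop machines whose accumulators stray too far from the "medians"
        new_good = {i for i in good
                    if np.linalg.norm(A[i] - A_med) <= 2 * thr1
                    and np.linalg.norm(B[i] - B_med) <= 2 * thr0}
        # line 11: perturbed update averaged over good_t (the pre-filter set)
        xi = rng.normal(0.0, nu, size=d)          # xi_t ~ N(0, nu^2 I)
        x = x - eta * (xi + sum(grads[i] for i in good) / len(good))
        good = new_good
    return x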
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released or provide a link to a repository. It discusses implementation but not availability.
Open Datasets | Yes | We conduct experiments on training a residual network ResNet-20 (He et al., 2016) on the CIFAR-10/100 image classification tasks (Krizhevsky et al., 2014).
Dataset Splits | No | The paper mentions training and testing but does not explicitly detail a validation split.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU, GPU models, or cloud instances).
Software Dependencies | No | The paper mentions implementing models and algorithms but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In all of our experiments, we use 10 workers and mini-batch size 10 per worker. Given any attacker and any defender algorithm, we run SGD three times for 140 epochs, each time with a different initial learning rate η ∈ {0.1, 0.2, 0.4}. We let the learning rate decrease by a factor of 10 on epochs 80 and 110, and present the best testing accuracies in the three runs (each corresponding to a different initial learning rate).
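A hedged sketch of the reported learning-rate schedule in PyTorch (an assumed framework; the paper does not name its software stack): 140 epochs with the rate cut by a factor of 10 at epochs 80 and 110. The resnet18 stand-in (the paper uses ResNet-20), optimizer settings beyond the initial learning rate, and the training-loop placeholder are assumptions.

import torch
from torchvision.models import resnet18  # stand-in architecture for illustration

model = resnet18(num_classes=10)                              # CIFAR-10 has 10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)       # initial lr in {0.1, 0.2, 0.4}
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 110], gamma=0.1)

for epoch in range(140):
    # ... one epoch of training over 10 workers, mini-batch size 10 per worker ...
    scheduler.step()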