Byzantine-Resilient Non-Convex Stochastic Gradient Descent
Authors: Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the convergence of Safeguard SGD to examine its practical performance against prior works. We perform the non-convex task of training a residual network ResNet-20 (He et al., 2016) on the CIFAR-10/100 datasets (Krizhevsky et al., 2014). More details are given in Appendix C. |
| Researcher Affiliation | Collaboration | Microsoft Research Redmond, zeyuan@csail.mit.edu; University of Waterloo, faezeeb75@gmail.com; Microsoft Research Redmond, jerrl@microsoft.com; IST Austria, dan.alistarh@ist.ac.at |
| Pseudocode | Yes | Algorithm 1 (Safeguard SGD: perturbed SGD with double safeguard). Input: point x₀ ∈ ℝ^d, rate η > 0, window lengths T ≥ T1 ≥ T0 ≥ 1, thresholds 𝔗1 > 𝔗0 > 0. 1: good₀ ← [m]; 2: for t ← 0 to T−1 do; 3: last1 ← max{t1 ∈ [t] : t1 is a multiple of T1}; 4: last0 ← max{t0 ∈ [t] : t0 is a multiple of T0}; 5: for each i ∈ good_t do; 6: receive ∇_{t,i} ∈ ℝ^d from machine i; 7: A_i ← Σ_{k=last1..t} ∇_{k,i}/|good_k| and B_i ← Σ_{k=last0..t} ∇_{k,i}/|good_k|; 8: A_med ← A_i for any machine i ∈ good_t such that |{j ∈ good_t : ‖A_j − A_i‖ ≤ 𝔗1}| > m/2; 9: B_med ← B_i for any machine i ∈ good_t such that |{j ∈ good_t : ‖B_j − B_i‖ ≤ 𝔗0}| > m/2; 10: good_{t+1} ← {i ∈ good_t : ‖A_i − A_med‖ ≤ 2𝔗1 ∧ ‖B_i − B_med‖ ≤ 2𝔗0}; 11: x_{t+1} ← x_t − η(ξ_t + (1/|good_t|) Σ_{i∈good_t} ∇_{t,i}), where ξ_t ∼ N(0, ν²I) is Gaussian noise independent of the stochastic gradients {∇_{t′,i}}_{t′≤t, i∈[m]}. (An illustrative Python sketch of the filtering and update steps appears below the table.) |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released or provide a link to a repository. It discusses implementation but not availability. |
| Open Datasets | Yes | We conduct experiments on training a residual network ResNet-20 He et al. (2016) on the CIFAR-10/100 image classification tasks Krizhevsky et al. (2014). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly detail a validation split. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., CPU, GPU models, or cloud instances). |
| Software Dependencies | No | The paper mentions implementing models and algorithms but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In all of our experiments, we use 10 workers and mini-batch size 10 per worker. Given any attacker and any defender algorithm, we run SGD three times for 140 epochs, each time with a different initial learning rate η ∈ {0.1, 0.2, 0.4}. We let the learning rate decrease by a factor of 10 on epochs 80 and 110, and present the best testing accuracies in the three runs (each corresponding to a different initial learning rate). (An illustrative sketch of this schedule appears below the table.) |
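
The paper does not release an implementation, so the following is a minimal NumPy sketch of the double-safeguard filtering and perturbed update steps reconstructed from Algorithm 1 above. The function names (`robust_center`, `safeguard_filter`, `perturbed_update`) and the data layout (dicts keyed by machine index) are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def robust_center(stats, good, m, threshold):
    # Pick A_med (or B_med): the statistic of any currently-good machine whose
    # threshold-ball contains a strict majority (> m/2) of the good statistics
    # (Algorithm 1, lines 8-9).
    for i in good:
        close = sum(1 for j in good if np.linalg.norm(stats[j] - stats[i]) <= threshold)
        if close > m / 2:
            return stats[i]
    raise RuntimeError("no majority ball found; assumes more than m/2 honest machines")

def safeguard_filter(A, B, good, m, thr1, thr0):
    # Keep only machines whose long-window (A) and short-window (B) accumulated
    # gradients stay within twice the threshold of the robust centers (line 10).
    A_med = robust_center(A, good, m, thr1)
    B_med = robust_center(B, good, m, thr0)
    return {i for i in good
            if np.linalg.norm(A[i] - A_med) <= 2 * thr1
            and np.linalg.norm(B[i] - B_med) <= 2 * thr0}

def perturbed_update(x, grads, good, eta, nu, rng=np.random.default_rng()):
    # SGD step with Gaussian perturbation (line 11): average the good machines'
    # reported gradients and add isotropic noise drawn from N(0, nu^2 I).
    avg = sum(grads[i] for i in good) / len(good)
    noise = rng.normal(0.0, nu, size=x.shape)
    return x - eta * (noise + avg)
```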
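
The experiment-setup row above describes a concrete training schedule; below is a hedged sketch, assuming a simple step-decay helper, of how that configuration could be expressed. The constant names and the `learning_rate` helper are hypothetical, not taken from any released code.

```python
# Reported setup: 10 workers, per-worker mini-batch size 10, 140 epochs,
# three runs with initial learning rates {0.1, 0.2, 0.4}, decayed 10x at
# epochs 80 and 110; the best test accuracy over the three runs is reported.
NUM_WORKERS = 10
BATCH_SIZE_PER_WORKER = 10
EPOCHS = 140
INITIAL_LRS = [0.1, 0.2, 0.4]
LR_DECAY_EPOCHS = [80, 110]

def learning_rate(initial_lr: float, epoch: int) -> float:
    # Step schedule: divide the learning rate by 10 at each decay milestone.
    lr = initial_lr
    for milestone in LR_DECAY_EPOCHS:
        if epoch >= milestone:
            lr /= 10.0
    return lr

# e.g. learning_rate(0.4, 0) -> 0.4; learning_rate(0.4, 85) -> 0.04;
#      learning_rate(0.4, 120) -> 0.004
```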