Self-Progressing Robust Training

Authors: Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with state-of-the-art adversarial training methods (PGD-L-inf and TRADES) under L-inf norm bounded attacks and various invariance tests, SPROUT consistently attains superior performance and is more scalable to large neural networks. We evaluate the multi-dimensional performance of different training methods on (wide) ResNet and VGG networks using CIFAR-10 and ImageNet datasets.
Researcher Affiliation | Collaboration | Minhao Cheng (1,2), Pin-Yu Chen (2), Sijia Liu (3), Shiyu Chang (2), Cho-Jui Hsieh (1), Payel Das (2); 1 Department of Computer Science, UCLA; 2 IBM Research; 3 Department of Computer Science and Engineering, Michigan State University
Pseudocode | Yes | Algorithm 1: SPROUT. Input: training dataset (X, Y), Mixup parameter λ, Gaussian augmentation variance σ², model learning rate γ_θ, Dirichlet label smoothing learning rate γ_β and parameter α, generalized cross entropy loss L. Initialize model θ (random initialization to train from scratch, or a pre-trained model checkpoint) and β (random initialization). For epoch = 1, ..., N: for each minibatch X_B ⊂ X, Y_B ⊂ Y: X_B ← N(X_B, σ²); (X_mix, Y_mix) ← Mixup(X_B, Y_B, λ); Ỹ_mix ← Dirichlet(α·Y_mix + (1 − α)·β); g_θ ← ∇_θ L(X_mix, Ỹ_mix, θ); g_β ← ∇_β L(X_mix, Ỹ_mix, θ); θ ← θ − γ_θ·g_θ; β ← β + γ_β·g_β. Return θ. (A hedged PyTorch sketch of this update step is given after the table.)
Open Source Code | Yes | Our implementation is publicly available. Code available at https://github.com/IBM/SPROUT
Open Datasets | Yes | We use CIFAR-10 and ImageNet (Deng et al. 2009) for performance evaluation.
Dataset Splits | No | The paper mentions training and test datasets but does not explicitly provide details about a separate validation set or specific train/validation/test splits by percentage or count needed for reproduction.
Hardware Specification | No | The paper mentions "On our machine" and discusses "computation resources" and "run-time" (Table 6), but it does not provide any specific hardware details such as CPU/GPU models, memory, or detailed cloud instance specifications.
Software Dependencies | No | The paper mentions using a "PyTorch implementation" but does not specify the version of PyTorch or any other software dependencies (e.g., Python, CUDA, other libraries) with their version numbers.
Experiment Setup | Yes | As suggested in Mixup (Zhang et al. 2018), we set the Beta distribution parameter a = 0.2 when sampling the mixing parameter λ. For Gaussian augmentation, we set σ = 0.1, which is within the suggested range in (Zantedeschi, Nicolae, and Rawat 2017). Also, we set the label smoothing parameter α = 0.01. A parameter sensitivity analysis on λ and α is given in the Appendix. Unless specified otherwise, for SPROUT we set the model initialization to be a natural model.
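For concreteness, below is a minimal PyTorch sketch of one SPROUT update step, combining the Gaussian augmentation (σ = 0.1), Mixup (λ sampled from Beta(0.2, 0.2)), Dirichlet label smoothing (α = 0.01), and the joint θ-descent / β-ascent update from Algorithm 1. The function name sprout_step, the per-class parameterization of β, the softplus positivity transform, the reparameterized Dirichlet sampling, the soft-label form of the generalized cross entropy (with q = 0.7), and the learning rates are assumptions made for illustration, not the authors' released implementation (see https://github.com/IBM/SPROUT for that).

import torch
import torch.nn.functional as F

def sprout_step(model, beta, x, y, num_classes,
                sigma=0.1, mixup_a=0.2, alpha=0.01,
                lr_theta=0.1, lr_beta=0.01, gce_q=0.7):
    """One SPROUT update on a minibatch (x, y) of images and integer labels."""
    # Gaussian data augmentation: perturb inputs with N(0, sigma^2) noise.
    x_aug = x + sigma * torch.randn_like(x)

    # Mixup with lambda ~ Beta(0.2, 0.2), mixing against a shuffled copy of the batch.
    lam = torch.distributions.Beta(mixup_a, mixup_a).sample().item()
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    x_mix = lam * x_aug + (1.0 - lam) * x_aug[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]

    # Dirichlet label smoothing: draw soft labels from
    # Dirichlet(alpha * y_mix + (1 - alpha) * beta). rsample() keeps the draw
    # differentiable w.r.t. beta (an assumption about how g_beta is obtained);
    # softplus keeps the concentration strictly positive (also an assumption).
    concentration = alpha * y_mix + (1.0 - alpha) * F.softplus(beta)
    y_soft = torch.distributions.Dirichlet(concentration).rsample()

    # Generalized cross entropy, extended to soft labels (assumed form):
    # L_q = (1 - sum_k y_k * p_k^q) / q, with q = 0.7 as an assumed default.
    probs = F.softmax(model(x_mix), dim=1)
    loss = ((1.0 - (y_soft * probs.pow(gce_q)).sum(dim=1)) / gce_q).mean()

    # theta takes a descent step on the loss, beta takes an ascent step,
    # matching the theta/beta updates in Algorithm 1.
    model.zero_grad()
    grad_beta, = torch.autograd.grad(loss, beta, retain_graph=True)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr_theta * p.grad
        beta += lr_beta * grad_beta
    return loss.item()

# Example setup: beta holds one trainable Dirichlet parameter per class.
# beta = torch.zeros(10, requires_grad=True)
# loss = sprout_step(model, beta, x, y, num_classes=10)

The design choice mirrored here is that β is updated by gradient ascent on the same loss that θ descends, so the label smoothing distribution adapts as training progresses; Algorithm 1 involves no attack generation step, which is consistent with the scalability claim in the Research Type row above.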