Self-Progressing Robust Training
Authors: Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das (pp. 7107-7115)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with state-of-the-art adversarial training methods (PGD-ℓ∞ and TRADES) under ℓ∞-norm bounded attacks and various invariance tests, SPROUT consistently attains superior performance and is more scalable to large neural networks. We evaluate the multi-dimensional performance of different training methods on (wide) ResNet and VGG networks using CIFAR-10 and ImageNet datasets. |
| Researcher Affiliation | Collaboration | Minhao Cheng (1,2), Pin-Yu Chen (2), Sijia Liu (3), Shiyu Chang (2), Cho-Jui Hsieh (1), Payel Das (2). (1) Department of Computer Science, UCLA; (2) IBM Research; (3) Department of Computer Science and Engineering, Michigan State University |
| Pseudocode | Yes | Algorithm 1 (SPROUT). Input: training dataset (X, Y), Mixup parameter λ, Gaussian augmentation variance σ², model learning rate γ_θ, Dirichlet label-smoothing learning rate γ_β and parameter α, generalized cross-entropy loss L. Initialize model θ (random initialization to train from scratch, or a pre-trained model checkpoint) and β (random initialization). For epoch = 1, ..., N: for each minibatch X_B ⊂ X, Y_B ⊂ Y: X_B ~ N(X_B, σ²); (X_mix, Y_mix) ← Mixup(X_B, Y_B, λ); Y_mix ~ Dirichlet(α·Y_mix + (1 - α)·β); g_θ ← ∇_θ L(X_mix, Y_mix, θ); g_β ← ∇_β L(X_mix, Y_mix, θ); θ ← θ - γ_θ·g_θ; β ← β + γ_β·g_β. Return θ. (A hedged PyTorch-style sketch of this loop follows the table.) |
| Open Source Code | Yes | Our implementation is publicly available. Code available at https://github.com/IBM/SPROUT |
| Open Datasets | Yes | We use CIFAR-10 and ImageNet (Deng et al. 2009) for performance evaluation. |
| Dataset Splits | No | The paper mentions training and test datasets but does not explicitly describe a separate validation set or specific train/validation/test splits (by percentage or count) needed for reproduction. |
| Hardware Specification | No | The paper mentions "On our machine" and discusses "computation resources" and "run-time" (Table 6), but it does not provide any specific hardware details such as CPU/GPU models, memory, or detailed cloud instance specifications. |
| Software Dependencies | No | The paper mentions using "Pytorch implementation" but does not specify the version of PyTorch or any other software dependencies (e.g., Python, CUDA, other libraries) with their version numbers. |
| Experiment Setup | Yes | As suggested in Mixup (Zhang et al. 2018), we set the Beta distribution parameter a = 0.2 when sampling the mixing parameter λ. For Gaussian augmentation, we set σ = 0.1, which is within the suggested range in (Zantedeschi, Nicolae, and Rawat 2017). Also, we set the label smoothing parameter α = 0.01. A parameter sensitivity analysis on λ and α is given in the Appendix. Unless specified otherwise, for SPROUT we set the model initialization to be a natural model. (These values are used in the example invocation after the table.) |
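
The pseudocode row above is easier to follow as runnable code. Below is a minimal PyTorch-style sketch of the SPROUT training loop, assuming a soft-label form of the generalized cross-entropy loss and a trainable per-class Dirichlet concentration vector β. The helper name `gce_loss`, the value q = 0.7, the SGD optimizer, and the clamping of β are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# A hedged sketch of Algorithm 1 (SPROUT); names and defaults are assumptions.
import torch
import torch.nn.functional as F
from torch.distributions import Beta, Dirichlet


def gce_loss(logits, soft_targets, q=0.7):
    """Assumed soft-label form of generalized cross-entropy:
    mean over the batch of sum_c y_c * (1 - p_c^q) / q."""
    probs = F.softmax(logits, dim=1)
    return (soft_targets * (1.0 - probs.pow(q)) / q).sum(dim=1).mean()


def sprout_train(model, loader, num_classes, epochs=200,
                 mixup_a=0.2,    # Beta(a, a) for the Mixup coefficient
                 sigma=0.1,      # std of the Gaussian augmentation
                 alpha=0.01,     # Dirichlet label-smoothing weight
                 lr_theta=0.1, lr_beta=0.1, device="cpu"):
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr_theta, momentum=0.9)
    # beta: trainable Dirichlet concentration over classes (random init)
    beta = torch.rand(num_classes, device=device, requires_grad=True)

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            y_onehot = F.one_hot(y, num_classes).float()

            # Gaussian augmentation: x ~ N(x, sigma^2)
            x = x + sigma * torch.randn_like(x)

            # Mixup with a single lambda ~ Beta(a, a) per minibatch
            lam = Beta(mixup_a, mixup_a).sample().to(device)
            perm = torch.randperm(x.size(0), device=device)
            x_mix = lam * x + (1 - lam) * x[perm]
            y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]

            # Dirichlet label smoothing: y ~ Dir(alpha * y_mix + (1 - alpha) * beta)
            conc = alpha * y_mix + (1 - alpha) * beta.clamp(min=1e-3)
            y_smooth = Dirichlet(conc).rsample()  # reparameterized, so beta receives a gradient

            loss = gce_loss(model(x_mix), y_smooth)

            # theta: gradient descent; beta: gradient ascent on the same loss
            opt.zero_grad()
            beta.grad = None
            loss.backward()
            opt.step()
            with torch.no_grad():
                beta += lr_beta * beta.grad
    return model
```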
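
For concreteness, here is a hypothetical invocation of the sketch above with the settings reported in the experiment-setup row (Beta parameter a = 0.2, σ = 0.1, α = 0.01). The ResNet-18 and CIFAR-10 loader are stand-ins for the paper's (wide) ResNet/VGG setup, not the exact configuration.

```python
# Hypothetical usage; model and loader are stand-ins, not the paper's exact setup.
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

train_set = datasets.CIFAR10("./data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)
model = models.resnet18(num_classes=10)  # assumed stand-in architecture

model = sprout_train(model, loader, num_classes=10,
                     mixup_a=0.2, sigma=0.1, alpha=0.01)
```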