Bag of Tricks for Adversarial Training
Authors: Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, Jun Zhu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is much more sensitive to some basic training settings than we thought. |
| Researcher Affiliation | Collaboration | Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084 China |
| Pseudocode | No | The paper provides mathematical formulations but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/P2333/Bag-of-Tricks-for-AT |
| Open Datasets | Yes | Our experiments are done on CIFAR-10 (Krizhevsky & Hinton, 2009) under the ℓ∞ threat model of maximal perturbation ϵ = 8/255, without accessibility to additional data. |
| Dataset Splits | Yes | Later Rice et al. (2020) provide a comprehensive study on the overfitting phenomenon in AT, and advocate early stopping the training epoch as a general strategy for preventing adversarial overfitting, which could be triggered according to the PGD accuracy on a split validation set. |
| Hardware Specification | No | This work was supported by... and the NVIDIA NVAIL Program with GPU/DGX Acceleration. The paper mentions GPU/DGX acceleration but does not specify exact GPU or DGX models used for the experiments. |
| Software Dependencies | No | The models are implemented by https://github.com/kuangliu/pytorch-cifar. The paper does not provide specific version numbers for software dependencies like PyTorch or other libraries used in the experiments. |
| Experiment Setup | Yes | Default setting. Following Rice et al. (2020), in the default setting, we apply the primary PGD-AT framework and the hyperparameters including batch size 128; SGD momentum optimizer with the initial learning rate of 0.1; weight decay 5 × 10⁻⁴; ReLU activation function and no label smoothing; train mode for batch normalization when crafting adversarial examples. All the models are trained for 110 epochs with the learning rate decaying by a factor of 0.1 at 100 and 105 epochs, respectively. |
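
The default setting quoted in the last row maps onto a standard PGD-AT training loop. The sketch below is a minimal illustration, not the authors' implementation: the torchvision ResNet-18 stand-in, the SGD momentum value of 0.9, and the PGD attack hyperparameters (10 steps, step size 2/255) are assumptions chosen as common defaults; only the batch size, learning rate, weight decay, epoch count, decay schedule, ϵ = 8/255, and train-mode batch normalization come from the excerpts above.

```python
# Illustrative sketch of the quoted default PGD-AT setting (not the authors' code).
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms, models

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10 with standard augmentation; batch size 128 as in the default setting.
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.Compose([
                         transforms.RandomCrop(32, padding=4),
                         transforms.RandomHorizontalFlip(),
                         transforms.ToTensor()])),
    batch_size=128, shuffle=True)

# Stand-in architecture; the paper uses models from kuangliu/pytorch-cifar.
model = models.resnet18(num_classes=10).to(device)

# SGD momentum optimizer, initial lr 0.1, weight decay 5e-4;
# lr decayed by 0.1 at epochs 100 and 105.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 105], gamma=0.1)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft l_inf PGD adversarial examples; the model stays in train mode,
    matching the 'train mode for batch normalization' default. The step
    count and step size are assumed common values."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = (x + delta).clamp(0, 1) - x
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0, 1) - x
    return (x + delta).detach()

for epoch in range(110):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)          # inner maximization
        loss = F.cross_entropy(model(x_adv), y)  # outer minimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```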