Bag of Tricks for Adversarial Training

Authors: Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, Jun Zhu

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is much more sensitive to some basic training settings than we thought. |
| Researcher Affiliation | Collaboration | Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084 China |
| Pseudocode | No | The paper provides mathematical formulations but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/P2333/Bag-of-Tricks-for-AT |
| Open Datasets | Yes | Our experiments are done on CIFAR-10 (Krizhevsky & Hinton, 2009) under the ℓ∞ threat model of maximal perturbation ϵ = 8/255, without accessibility to additional data. (See the PGD sketch below.) |
| Dataset Splits | Yes | Later Rice et al. (2020) provide a comprehensive study on the overfitting phenomenon in AT, and advocate early stopping the training epoch as a general strategy for preventing adversarial overfitting, which could be triggered according to the PGD accuracy on a split validation set. (See the early-stopping sketch below.) |
| Hardware Specification | No | This work was supported by... and the NVIDIA NVAIL Program with GPU/DGX Acceleration. The paper mentions GPU/DGX acceleration but does not specify the exact GPU or DGX models used for the experiments. |
| Software Dependencies | No | The models are implemented by https://github.com/kuangliu/pytorch-cifar. The paper does not provide specific version numbers for software dependencies such as PyTorch or other libraries used in the experiments. |
| Experiment Setup | Yes | Default setting. Following Rice et al. (2020), in the default setting, we apply the primary PGD-AT framework and the hyperparameters including batch size 128; SGD momentum optimizer with the initial learning rate of 0.1; weight decay 5×10⁻⁴; ReLU activation function and no label smoothing; train mode for batch normalization when crafting adversarial examples. All the models are trained for 110 epochs with the learning rate decaying by a factor of 0.1 at 100 and 105 epochs, respectively. (See the training-setup sketch below.) |
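
For the ℓ∞ threat model quoted in the Open Datasets row, the following is a minimal PGD sketch in PyTorch. The ε = 8/255 bound and the train-mode batch normalization while crafting adversarial examples come from the quoted settings; the 10 iterations and the 2/255 step size are common PGD-AT defaults assumed here for illustration, not quoted from the table.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Quoted setting: "train mode for batch normalization when crafting
    # adversarial examples". Step count and step size are assumptions.
    model.train()
    # Random start inside the ℓ∞ ball of radius eps.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    x_adv = torch.clamp(x + delta, 0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # ℓ∞ ascent step, then project back into the ε-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
    return x_adv.detach()
```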
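
The default setting quoted in the Experiment Setup row maps directly onto a standard PyTorch optimizer and scheduler. Below is a minimal sketch; the momentum coefficient of 0.9, the ResNet-18 backbone, and the crop/flip augmentation are assumptions not stated in the quoted text.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-10 augmentation (assumed; not stated in the quoted setting).
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,  # batch size 128
                                           shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # backbone is an assumption

# Quoted: SGD momentum optimizer, initial LR 0.1, weight decay 5e-4
# (the momentum coefficient 0.9 itself is assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Quoted: 110 epochs, LR decayed by a factor of 0.1 at epochs 100 and 105.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 105], gamma=0.1)

for epoch in range(110):
    for images, labels in train_loader:
        x_adv = pgd_attack(model, images, labels)  # craft adversarial examples (sketch above)
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x_adv), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```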
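
Finally, the Dataset Splits row points at the early-stopping recipe of Rice et al. (2020): hold out a validation split and checkpoint the model by its PGD accuracy. A minimal sketch follows; the 49,000/1,000 split size and the eval_pgd_accuracy helper are hypothetical choices for illustration, reusing pgd_attack from the first sketch.

```python
import torch

# Hypothetical 49,000/1,000 split of the 50,000 CIFAR-10 training images.
train_subset, val_subset = torch.utils.data.random_split(train_set, [49000, 1000])

def eval_pgd_accuracy(model, dataset):
    """Hypothetical helper: fraction of examples classified correctly under PGD."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=128)
    correct, total = 0, 0
    for images, labels in loader:
        x_adv = pgd_attack(model, images, labels)  # reuses the sketch above
        model.eval()  # pgd_attack leaves the model in train mode
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total

best_pgd_acc = 0.0
for epoch in range(110):
    # ... one epoch of PGD-AT on train_subset (as in the loop above) ...
    pgd_acc = eval_pgd_accuracy(model, val_subset)
    if pgd_acc > best_pgd_acc:  # checkpoint the most robust epoch
        best_pgd_acc = pgd_acc
        torch.save(model.state_dict(), "best_checkpoint.pt")
```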