Exploring Memorization in Adversarial Training

Authors: Yinpeng Dong, Ke Xu, Xiao Yang, Tianyu Pang, Zhijie Deng, Hang Su, Jun Zhu

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various datasets validate the effectiveness of the proposed method.
Researcher Affiliation | Collaboration | Yinpeng Dong (1,2), Ke Xu (4), Xiao Yang (1), Tianyu Pang (1), Zhijie Deng (1), Hang Su (1,3), Jun Zhu (1,2,3). Affiliations: (1) Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, China; (2) RealAI; (3) Peng Cheng Laboratory; (4) CMU
Pseudocode | No | The paper describes algorithms such as PGD and TRADES using mathematical formulations and textual descriptions (e.g., Section 2.1), but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code is available at https://github.com/dongyp13/memorization-AT.
Open Datasets | Yes | The experiments are conducted on CIFAR-10 (Krizhevsky & Hinton, 2009) with a WideResNet model... We also provide the experimental results on CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011) datasets...
Dataset Splits | No | The paper mentions 'training' and 'test accuracies' and the 'generalization gap' (i.e., the difference between training and test accuracies), but does not explicitly provide the percentages or sample counts for training, validation, and test dataset splits.
Hardware Specification | Yes | All of the experiments are conducted on NVIDIA 2080 Ti GPUs.
Software Dependencies | No | The paper mentions the use of an 'SGD optimizer' but does not specify software versions for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | In training, we use the 10-step PGD adversary with α = 2/255. The models are trained via the SGD optimizer with momentum 0.9, weight decay 0.0005, and batch size 128. For CIFAR-10/100, we set the initial learning rate to 0.1 and decay it by a factor of 0.1 at epochs 100 and 150, for a total of 200 training epochs. For SVHN, the learning rate starts from 0.01 with a cosine annealing schedule for a total of 80 training epochs. In our method, we set η = 0.9 and w = 30 along a Gaussian ramp-up curve (Laine & Aila, 2017).
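For readers who want to approximate the reported setup, the following is a minimal sketch, assuming PyTorch: it wires a 10-step PGD adversary (α = 2/255) into a standard adversarial-training loop with the stated SGD hyperparameters and the CIFAR-10/100 learning-rate schedule. The perturbation bound eps = 8/255, the `model` / `train_loader` placeholders, and the plain cross-entropy outer loss are assumptions not taken from the table, and the paper's own weighting scheme (η = 0.9, w = 30 with a Gaussian ramp-up) is not implemented here.

```python
# Minimal sketch of the reported CIFAR-10/100 adversarial-training setup.
# Assumptions (not stated in the table): PyTorch, cross-entropy loss,
# eps = 8/255 for the PGD ball, and placeholder `model` / `train_loader`.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """10-step PGD adversary with step size alpha = 2/255 (eps and the
    omitted random start are assumptions, not values from the table)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient-sign step, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()

def train(model, train_loader, device="cuda"):
    # SGD: momentum 0.9, weight decay 0.0005; batch size 128 is set in the loader.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # CIFAR-10/100 schedule: lr 0.1, decayed by 0.1 at epochs 100 and 150, 200 epochs total.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[100, 150], gamma=0.1)
    # (For SVHN the table instead reports lr 0.01 with cosine annealing over 80 epochs,
    #  e.g. torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80).)
    for epoch in range(200):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)          # inner maximization
            loss = F.cross_entropy(model(x_adv), y)  # plain PGD-AT outer loss (assumed)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```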