Exploring Memorization in Adversarial Training
Authors: Yinpeng Dong, Ke Xu, Xiao Yang, Tianyu Pang, Zhijie Deng, Hang Su, Jun Zhu
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various datasets validate the effectiveness of the proposed method. |
| Researcher Affiliation | Collaboration | Yinpeng Dong (1,2), Ke Xu (4), Xiao Yang (1), Tianyu Pang (1), Zhijie Deng (1), Hang Su (1,3), Jun Zhu (1,2,3); 1 Dept. of Comp. Sci. and Tech., Institute for AI, Tsinghua-Bosch Joint ML Center, THBI Lab, BNRist Center, Tsinghua University, Beijing, China; 2 RealAI; 3 Peng Cheng Laboratory; 4 CMU |
| Pseudocode | No | The paper describes algorithms like PGD and TRADES using mathematical formulations and textual descriptions (e.g., Section 2.1) but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code is available at https://github.com/dongyp13/memorization-AT. |
| Open Datasets | Yes | The experiments are conducted on CIFAR-10 (Krizhevsky & Hinton, 2009) with a WideResNet model... We also provide the experimental results on CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and SVHN (Netzer et al., 2011) datasets... |
| Dataset Splits | No | The paper mentions 'training' and 'test accuracies' and 'generalization gap (i.e., difference between training and test accuracies)', but does not explicitly provide the specific percentages or sample counts for training, validation, and test dataset splits. |
| Hardware Specification | Yes | All of the experiments are conducted on NVIDIA 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions the use of 'SGD optimizer' but does not specify any software versions for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or specific libraries. |
| Experiment Setup | Yes | In training, we use the 10-step PGD adversary with α = 2/255. The models are trained via the SGD optimizer with momentum 0.9, weight decay 0.0005, and batch size 128. For CIFAR-10/100, we set the learning rate as 0.1 initially, decayed by 0.1 at 100 and 150 epochs, for a total of 200 training epochs. For SVHN, the learning rate starts from 0.01 with a cosine annealing schedule for a total of 80 training epochs. In our method, we set η = 0.9 and w = 30 along a Gaussian ramp-up curve (Laine & Aila, 2017). |
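
As a rough illustration of the reported setup, the sketch below combines a 10-step PGD adversary (step size α = 2/255) with SGD training using momentum 0.9, weight decay 0.0005, batch size 128, and the CIFAR-10/100 learning-rate schedule (0.1, decayed by 0.1 at epochs 100 and 150, 200 epochs total). This is a minimal PyTorch sketch, not the authors' released code: `WideResNet`, `train_loader`, and the ε = 8/255 perturbation budget are assumptions, and the paper's method-specific components (η, the ramp-up weight w, TRADES variants) are omitted.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L_inf PGD: 10 steps with step size alpha = 2/255 (eps = 8/255 is an assumed budget)."""
    # random start inside the eps-ball around the clean input
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                        # ascent step on the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)   # project back to the eps-ball
    return x_adv.detach()

model = WideResNet().cuda()          # assumed model class; the paper trains a WideResNet on CIFAR-10
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)        # decay by 0.1 at 100/150

for epoch in range(200):             # 200 training epochs in total (CIFAR-10/100 schedule)
    model.train()
    for x, y in train_loader:        # assumed DataLoader with batch size 128
        x, y = x.cuda(), y.cuda()
        x_adv = pgd_attack(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)   # plain PGD-AT loss on adversarial examples
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the SVHN setting, the report instead gives an initial learning rate of 0.01 with cosine annealing over 80 epochs, which would correspond roughly to replacing the scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)`.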