Improved OOD Generalization via Adversarial Training and Pretraining
Authors: Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhiming Ma
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct various experiments on both image classification (IC) and natural language understanding (NLU) tasks to verify our theoretical findings. For IC task, we conduct AT on CIFAR10 (Krizhevsky & Hinton, 2009) and ImageNet (Deng et al., 2009), and then evaluate the OOD generalization of these models on corrupted OOD data CIFAR10-C and ImageNet-C (Hendrycks & Dietterich, 2018). Empirical results on both IC and NLU tasks verify that AT improves OOD generalization. Table 1: Clean and corruption accuracy (%) of ResNet34 on CIFAR10-C and ImageNet-C using standard training and adversarial training under both ℓ2-norm and ℓ∞-norm. |
| Researcher Affiliation | Collaboration | 1University of Chinese Academy of Sciences, Beijing, China 2Academy of Mathematics and Systems Science, Chinese Academy of Sciences 3Huawei Noah's Ark Lab, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 Multi-Step SGD. Input: number of training steps T, learning rates η_w^t for the model parameters and η_x for the adversarial input, two initialization points w_1, δ_1, norm constant p ∈ {2, ∞}, and perturbation size r. Return w_{T+1}. 1: for t = 1, ..., T do 2: Uniformly sample i_t from {1, ..., n}. 3: for k = 1, ..., K do 4: δ_{k+1} = Proj_{B_p(0,r)}(δ_k + η_x ∇_x f(w_t, x_{i_t} + δ_k)). 5: end for 6: w_{t+1} = w_t − η_w^t ∇_w f(w_t, x_{i_t} + δ_{K+1}). 7: end for (A runnable sketch of this loop is given after the table.) |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We use the following benchmark datasets. CIFAR10 (Krizhevsky & Hinton, 2009) has 50000 colorful images as training samples from 10 object classes. ImageNet (Deng et al., 2009) contains colorful images with over 1 million training samples from 1,000 categories. SST-2 (Socher et al., 2013) and IMDb (Maas et al., 2011) are sentiment analysis datasets... STS-B consists of texts from different genres and sources... (Cer et al., 2017). MNLI is a textual entailment dataset... (Williams et al., 2018). |
| Dataset Splits | No | The paper describes using CIFAR10-C and ImageNet-C as OOD (out-of-distribution) data for evaluation, stating 'Each type of corruption has five levels of severity, and each severity has 10000 validation samples' for CIFAR10-C, which refers to evaluation samples, not a traditional validation split for hyperparameter tuning on the training dataset. It does not provide specific train/validation splits (e.g., 80/10/10 percentages or counts) for the primary training datasets (CIFAR10, ImageNet, etc.). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., specific GPU models, CPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions software components like 'BERT' and 'AdamW' but does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | The number of inner loop steps K is 8 for CIFAR10, and 3 for ImageNet. The models are trained by SGD with momentum. The number of training epochs is 200 for CIFAR10, and 100 for ImageNet. The learning rate starts from 0.1 and decays by a factor of 0.2 at epochs 60, 120, 160 (resp. 30, 60, 90) for CIFAR10 (resp. ImageNet). For NLU, the models are trained by AdamW (Loshchilov & Hutter, 2018) for 10 epochs. Detailed hyperparameters are in Appendix C. (A configuration sketch of the IC schedule is given after the table.) |
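
Algorithm 1 maps directly onto a standard PGD-style adversarial training loop. Below is a minimal PyTorch sketch of it; the function names and the default step size and radius are illustrative assumptions rather than the authors' released code (the paper provides none).

```python
# Minimal PyTorch sketch of Algorithm 1 (multi-step SGD with an inner PGD loop).
# All names and default hyperparameter values are illustrative assumptions;
# the paper does not release code. Assumes 4-D image batches (N, C, H, W).
import torch


def pgd_perturb(model, x, y, loss_fn, k=8, eta_x=2 / 255, r=8 / 255, p="inf"):
    """Inner loop (lines 3-5): K projected gradient-ascent steps on the input."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(k):
        loss = loss_fn(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            if p == "inf":
                delta += eta_x * grad.sign()  # ascent step in ℓ∞ geometry
                delta.clamp_(-r, r)           # projection onto B_∞(0, r)
            else:  # p == 2
                g_norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
                delta += eta_x * grad / (g_norm + 1e-12)
                d_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
                delta *= (r / (d_norm + 1e-12)).clamp(max=1.0)  # onto B_2(0, r)
    return delta.detach()


def adversarial_training_step(model, optimizer, x, y, loss_fn, **pgd_kwargs):
    """Outer loop (line 6): one SGD step on the adversarially perturbed batch."""
    delta = pgd_perturb(model, x, y, loss_fn, **pgd_kwargs)
    optimizer.zero_grad()
    loss_fn(model(x + delta), y).backward()
    optimizer.step()
```

Setting `k=8` matches the CIFAR10 inner-loop count from the Experiment Setup row, and `k=3` the ImageNet one; the perturbation is rebuilt from zero for each batch, mirroring the fresh δ_1 initialization in the algorithm.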
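
The stepwise learning-rate decay in the Experiment Setup row corresponds to a standard `MultiStepLR` schedule. The sketch below shows the CIFAR10 setting; the momentum and weight-decay values are assumptions, since the paper defers exact hyperparameters to its Appendix C.

```python
# Sketch of the CIFAR10 optimization setup described above: SGD with momentum,
# LR 0.1 decayed by a factor of 0.2 at epochs 60/120/160 over 200 epochs.
# Momentum 0.9 and weight decay 5e-4 are assumed values, not confirmed by the paper.
import torch
from torchvision.models import resnet34

model = resnet34(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    # ... one epoch of adversarial training over the data loader, e.g. calling
    # adversarial_training_step(model, optimizer, x, y, loss_fn, k=8) per batch ...
    scheduler.step()
```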