Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification
Authors: Francisco Utrera, Evan Kravitz, N. Benjamin Erichson, Rajiv Khanna, Michael W. Mahoney
Venue: ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models, especially if only limited data are available for the new domain task. Further, we observe that adversarial training biases the learnt representations to retaining shapes, as opposed to textures, which impacts the transferability of the source models. Finally, through the lens of influence functions, we discover that transferred adversarially-trained models contain more human-identifiable semantic information, which explains at least partly why adversarially-trained models transfer better. |
| Researcher Affiliation | Academia | Francisco Utrera (UC Berkeley, utrerf@berkeley.edu); Evan Kravitz (UC Berkeley, kravitz@berkeley.edu); N. Benjamin Erichson (ICSI and UC Berkeley, erichson@berkeley.edu); Rajiv Khanna (UC Berkeley, rajivak@berkeley.edu); Michael W. Mahoney (ICSI and UC Berkeley, mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper describes mathematical formulations and steps for adversarial training (e.g., Equations 1-5) but does not present them in a formally structured pseudocode or algorithm block (a hedged sketch of standard PGD adversarial training is given after the table). |
| Open Source Code | Yes | Also, our code is available at https://github.com/utrerf/robust_transfer_learning.git |
| Open Datasets | Yes | Target datasets. We transfer our models to a broad set of target datasets, including (1) CIFAR-100, (2) CIFAR-10, (3) SVHN, (4) Fashion MNIST (Xiao et al., 2017), (5) KMNIST and (6) MNIST. |
| Dataset Splits | No | The paper mentions 'validation accuracy' in Appendix A.3 but does not provide specific details on the validation dataset split (e.g., percentages, counts, or how it's created distinct from training and test sets). The random subsets described are for training, and while results are reported on a test set, a clear validation split is not defined. |
| Hardware Specification | No | The paper acknowledges support from 'Amazon AWS and Google Cloud' but does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for the experiments. |
| Software Dependencies | No | The paper mentions general software elements like 'python train.py' and implies the use of deep learning frameworks, but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific CUDA versions). |
| Experiment Setup | Yes | Fine-tuning procedure. To transfer our models we copy the entire source model to the target model, freeze all but the last k convolutional blocks, re-initialize the last fully-connected (FC) layer for the appropriate number of labels, and only fine-tune (re-train) the last FC layer plus 0, 1, 3, or 9 convolutional blocks. [...] All source models are fine-tuned to all datasets using stochastic gradient descent with momentum using the hyperparameters described in Table 3. Table 3 (hyper-parameter summary for all fine-tuned source models): learning rate 0.1; batch size 128; momentum 0.9; weight decay 5e-4; LR decayed 10x at 1/3 and 2/3 of the total epochs. A sketch of this fine-tuning recipe also follows the table. |
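Since the paper presents adversarial training only as equations rather than pseudocode, here is a minimal PyTorch sketch of the standard PGD-based min-max formulation. This is an illustration, not the authors' implementation: the ℓ∞ threat model, `eps`, `alpha`, and `steps` values below are placeholder assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Inner maximization: find a worst-case perturbation inside the eps-ball.
    The l-inf threat model and step sizes here are illustrative assumptions."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss surface, then project back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: take a gradient step on adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```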
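And here is a minimal PyTorch sketch of the fine-tuning recipe quoted in the Experiment Setup row: freeze all but the last k convolutional blocks, re-initialize the FC head, and train with SGD using the Table 3 hyperparameters. The ResNet-50 layout, the block grouping, and the epoch count are assumptions for illustration; the authors' released code at the repository above is authoritative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def prepare_for_finetuning(num_classes, k_blocks=0):
    """Freeze everything, then unfreeze the FC head plus the last k_blocks
    convolutional blocks (the block grouping below is an assumption)."""
    model = resnet50()  # in the paper, weights come from the adversarially-trained source model
    for p in model.parameters():
        p.requires_grad = False
    # Re-initialize the final FC layer for the target label count; it is always trained.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    # Unfreeze the last k convolutional blocks, counting backward from the output.
    blocks = [m for layer in (model.layer4, model.layer3, model.layer2, model.layer1)
              for m in reversed(list(layer.children()))]
    for block in blocks[:k_blocks]:
        for p in block.parameters():
            p.requires_grad = True
    return model

model = prepare_for_finetuning(num_classes=100, k_blocks=3)  # e.g., CIFAR-100 target
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.1, momentum=0.9, weight_decay=5e-4)  # Table 3 values
epochs = 90  # illustrative; the paper's epoch count is not quoted above
# Decay the LR 10x at 1/3 and 2/3 of the total epochs, as in Table 3.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 3, 2 * epochs // 3], gamma=0.1)
```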