CARTL: Cooperative Adversarially-Robust Transfer Learning
Authors: Dian Chen, Hongxin Hu, Qian Wang, Yinli Li, Cong Wang, Chao Shen, Qi Li
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that CARTL improves the inherited robustness by up to about 28% compared with the baseline at the same level of accuracy. Furthermore, we study the relationship between the batch normalization (BN) layers and the robustness in the context of transfer learning, and we reveal that freezing BN layers can further boost the robustness transfer. We conduct extensive experiments on several transfer learning scenarios and observe that a target model that freezes the affine parameters of its BN layers obtains higher robustness with negligible loss of accuracy. |
| Researcher Affiliation | Academia | (1) School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China; (2) Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA; (3) Department of Computer Science, City University of Hong Kong, HK SAR, China; (4) School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; (5) Institute for Network Sciences and Cyberspace & BNRist, Tsinghua University, Beijing 100084, China. Correspondence to: Qian Wang <qianwang@whu.edu.cn>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | For more details about experiment settings, please refer to Appendix C, and our codes are available on GitHub: https://github.com/NISP-official/CARTL |
| Open Datasets | Yes | To see the effect of fine-tuning on the robustness and accuracy, we adversarially train a Wide ResNet (WRN) 34-10 (Zagoruyko & Komodakis, 2017) on CIFAR-100 and a WRN 28-4 on CIFAR-10 as source models, then transfer them to CIFAR-10 and SVHN, respectively. (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | No explicit details on specific training/validation/test dataset splits (e.g., percentages, counts, or specific methods like k-fold cross-validation) were found in the provided text. The paper mentions training data and refers to Appendix C for more details. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solvers with their versions) were provided in the paper. |
| Experiment Setup | Yes | The source models are trained with PGD-7, and the perturbation is constrained within an ℓ∞ ball with a radius of ϵ = 8/255. During transferring, we break the source models into blocks and fine-tune them in the unit of blocks (e.g., two layers at once for a WRN block). Then we report the adversarial robustness of the target models against the PGD-100 attack. Here, λ is the hyper-parameter controlling the strength of the FDM penalty term. We also emphasize that, different from naive spectral normalization, we add a hyper-parameter β ∈ (0, 1] for further scaling the Lipschitz constant of the fine-tuned part. (A PGD sketch follows the table.) |
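
As a reading aid for the Open Datasets row, here is a minimal sketch, not the authors' code, of the source/target dataset pairing quoted above: a WRN 34-10 source trained on CIFAR-100 is transferred to CIFAR-10, and a WRN 28-4 source trained on CIFAR-10 is transferred to SVHN. All three datasets are publicly available through `torchvision`; the WRN models themselves are not built here, and any standard WideResNet implementation could be substituted.

```python
# Minimal sketch: load the public datasets used in the two transfer scenarios
# quoted from the paper (CIFAR-100 -> CIFAR-10 and CIFAR-10 -> SVHN).
# The WRN 34-10 / WRN 28-4 source models are not constructed here.
import torchvision
import torchvision.transforms as T

to_tensor = T.ToTensor()

# Source datasets (used to adversarially pre-train the source models).
cifar100 = torchvision.datasets.CIFAR100("data", train=True, download=True, transform=to_tensor)
cifar10 = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)

# Target dataset (CIFAR-10 above also serves as the target of the CIFAR-100 source).
svhn = torchvision.datasets.SVHN("data", split="train", download=True, transform=to_tensor)

print(len(cifar100), len(cifar10), len(svhn))  # 50000, 50000, 73257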
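And here is a minimal PGD sketch matching the attack settings quoted in the Experiment Setup row: PGD-7 for adversarial training, PGD-100 for evaluation, and an ℓ∞ radius of ϵ = 8/255. It illustrates the standard attack rather than the CARTL implementation; `model`, `x`, and `y` are assumed to be a classifier and a batch of inputs/labels, and the step size `alpha = 2/255` is a common default that the quoted text does not state.

```python
# Generic PGD attack sketch (standard formulation, not the authors' code).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Projected gradient descent inside an l-infinity ball of radius eps."""
    # Random start inside the eps-ball, detached so it is a leaf tensor.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()
```

Evaluating robustness against PGD-100, as reported in the table, amounts to calling `pgd_attack(model, x, y, steps=100)` and measuring accuracy on the returned adversarial examples.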