Reliable Adversarial Distillation with Unreliable Teachers
Authors: Jianing Zhu, Jiangchao Yao, Bo Han, Jingfeng Zhang, Tongliang Liu, Gang Niu, Jingren Zhou, Jianliang Xu, Hongxia Yang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the benchmark CIFAR-10/CIFAR-100 and the more challenging Tiny-ImageNet datasets to evaluate the efficiency of our IAD. In Section 5.1, we compare IAD with benchmark adversarial training methods (AT and TRADES) and some related methods which utilize adversarially pre-trained models via KD (ARD and AKD2) on the CIFAR-10/CIFAR-100 (Krizhevsky, 2009) datasets. In Section 5.2, we compare the previous methods with IAD on a more challenging dataset, Tiny-ImageNet (Le & Yang, 2015). In Section 5.3, ablation studies are conducted to analyze the effects of the hyper-parameter β and different warming-up periods for IAD. |
| Researcher Affiliation | Collaboration | Jianing Zhu¹, Jiangchao Yao², Bo Han¹, Jingfeng Zhang³, Tongliang Liu⁴, Gang Niu³, Jingren Zhou², Jianliang Xu¹, Hongxia Yang²; ¹Hong Kong Baptist University, ²Alibaba Group, ³RIKEN Center for Advanced Intelligence Project, ⁴The University of Sydney |
| Pseudocode | Yes | Algorithm 1 Introspective Adversarial Distillation (IAD) |
| Open Source Code | Yes | To ensure the reproducibility of experimental results, our code is available at https://github.com/ZFancy/IAD. |
| Open Datasets | Yes | We conduct extensive experiments on the benchmark CIFAR-10/CIFAR-100 and the more challenging Tiny-ImageNet datasets to evaluate the efficiency of our IAD. |
| Dataset Splits | No | The paper describes training on datasets like CIFAR-10/CIFAR-100 and Tiny-ImageNet and mentions test accuracy, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like SGD, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific library versions). |
| Experiment Setup | Yes | Specifically, we train ResNet-18 under different methods using SGD with 0.9 momentum for 200 epochs. The initial learning rate is 0.1, divided by 10 at Epoch 100 and Epoch 150 respectively, and the weight decay is 0.0002. In the settings of adversarial training, we set the perturbation bound ϵ = 8/255, the PGD step size σ = 2/255, and the PGD step number K = 10. In the settings of distillation, we use τ = 1 and use the models pre-trained by AT and TRADES which have the best PGD-10 test accuracy as the teacher models for ARD, AKD2 and our IAD. For ARD, we set its hyper-parameter λ = 0 as recommended in Goldblum et al. (2020) for gaining better robustness. For AKD2, we set λ1 = 0.25, λ2 = 0.5 and λ3 = 0.25 as recommended in Chen et al. (2021). For IAD-I and IAD-II, we respectively set the warming-up period as 60/80 and 40/80 epochs to train on CIFAR-10/CIFAR-100. Regarding the computation of α, we set λ = 0 and β = 0.1. For γ, we currently set γ = 1 − α. |
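The experiment-setup row quotes concrete training and attack hyper-parameters (SGD with 0.9 momentum, 200 epochs, initial learning rate 0.1 decayed by 10× at epochs 100 and 150, weight decay 0.0002, and PGD-10 with ε = 8/255 and step size 2/255). Below is a minimal PyTorch sketch of that configuration; the model stand-in and function names are illustrative assumptions, not the authors' released code (see https://github.com/ZFancy/IAD for the official implementation).

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# --- Training configuration quoted above (illustrative sketch) ---
model = resnet18(num_classes=10)  # stand-in; CIFAR-style ResNet-18 variants usually adjust the stem
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=2e-4)
# Initial lr 0.1, divided by 10 at epoch 100 and epoch 150, for 200 epochs total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)


def pgd_attack(model, x, y, eps=8 / 255, step_size=2 / 255, steps=10):
    """L-infinity PGD-10 with the quoted eps, step size, and step count."""
    model.eval()
    # Random start inside the epsilon ball, as is common for PGD.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Signed gradient ascent step, projected back into the eps ball.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```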
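The pseudocode row points to Algorithm 1 (Introspective Adversarial Distillation), and the setup row reports β = 0.1 for computing α and γ = 1 − α. The sketch below is a hedged illustration of such an introspective weighting: α is taken as the teacher's predicted probability of the true label on the adversarial example raised to the power β, and γ = 1 − α down-weights the teacher when it is unreliable on that example. The loss composition here (KL to the teacher plus a hard-label term for the student) is a simplified placeholder and may differ from the exact objective in Algorithm 1.

```python
import torch
import torch.nn.functional as F


def iad_style_loss(student_logits, teacher_logits, y, beta=0.1, tau=1.0):
    """Illustrative introspective weighting (not the official IAD code)."""
    with torch.no_grad():
        teacher_prob = F.softmax(teacher_logits, dim=1)
        # Probability the teacher assigns to the ground-truth class on the adversarial input.
        p_true = teacher_prob.gather(1, y.unsqueeze(1)).squeeze(1)
        alpha = p_true.pow(beta)   # beta = 0.1 as in the quoted setup
        gamma = 1.0 - alpha        # gamma = 1 - alpha

    # Soft distillation term: KL to the teacher at temperature tau (tau = 1 in the paper's setup).
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="none",
    ).sum(dim=1) * (tau ** 2)

    # Simplified student term on the hard labels.
    ce = F.cross_entropy(student_logits, y, reduction="none")

    return (alpha * kd + gamma * ce).mean()
```

Note that with β = 0.1, α stays close to 1 unless the teacher's confidence on the adversarial example is very low, which matches the intuition of only partially, rather than fully, distrusting the teacher.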