Reliable Adversarial Distillation with Unreliable Teachers

Authors: Jianing Zhu, Jiangchao Yao, Bo Han, Jingfeng Zhang, Tongliang Liu, Gang Niu, Jingren Zhou, Jianliang Xu, Hongxia Yang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the benchmark CIFAR-10/CIFAR-100 and the more challenging Tiny-ImageNet datasets to evaluate the efficiency of our IAD. In Section 5.1, we compare IAD with benchmark adversarial training methods (AT and TRADES) and some related methods which utilize adversarially pre-trained models via KD (ARD and AKD2) on CIFAR-10/CIFAR-100 (Krizhevsky, 2009) datasets. In Section 5.2, we compare the previous methods with IAD on a more challenging dataset, Tiny-ImageNet (Le & Yang, 2015). In Section 5.3, the ablation studies are conducted to analyze the effects of the hyper-parameter β and different warming-up periods for IAD.
Researcher Affiliation | Collaboration | Jianing Zhu (1), Jiangchao Yao (2), Bo Han (1), Jingfeng Zhang (3), Tongliang Liu (4), Gang Niu (3), Jingren Zhou (2), Jianliang Xu (1), Hongxia Yang (2); (1) Hong Kong Baptist University, (2) Alibaba Group, (3) RIKEN Center for Advanced Intelligence Project, (4) The University of Sydney
Pseudocode | Yes | Algorithm 1: Introspective Adversarial Distillation (IAD)
Open Source Code | Yes | To ensure the reproducibility of experimental results, our code is available at https://github.com/ZFancy/IAD.
Open Datasets | Yes | We conduct extensive experiments on the benchmark CIFAR-10/CIFAR-100 and the more challenging Tiny-ImageNet datasets to evaluate the efficiency of our IAD.
Dataset Splits | No | The paper describes training on datasets like CIFAR-10/CIFAR-100 and Tiny-ImageNet and mentions test accuracy, but it does not explicitly specify the training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers such as SGD, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or specific library versions).
Experiment Setup | Yes | Specifically, we train ResNet-18 under different methods using SGD with 0.9 momentum for 200 epochs. The initial learning rate is 0.1, divided by 10 at Epoch 100 and Epoch 150 respectively, and the weight decay is 0.0002. In the settings of adversarial training, we set the perturbation bound ϵ = 8/255, the PGD step size σ = 2/255, and the number of PGD steps K = 10. In the settings of distillation, we use τ = 1 and use models pre-trained by AT and TRADES which have the best PGD-10 test accuracy as the teacher models for ARD, AKD2 and our IAD. For ARD, we set its hyper-parameter λ = 0 as recommended in Goldblum et al. (2020) for gaining better robustness. For AKD2, we set λ1 = 0.25, λ2 = 0.5 and λ3 = 0.25 as recommended in Chen et al. (2021). For IAD-I and IAD-II, we respectively set the warming-up period as 60/80 and 40/80 epochs to train on CIFAR-10/CIFAR-100. Regarding the computation of α, we set λ = 0 and β = 0.1. For γ, we currently set γ = 1 − α.
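
To make the quoted training configuration concrete, the following is a minimal PyTorch sketch of the optimizer, learning-rate schedule, and PGD attack settings reported above (SGD with 0.9 momentum, weight decay 0.0002, learning rate 0.1 divided by 10 at Epochs 100 and 150, 200 epochs, and a PGD-10 attack with ϵ = 8/255 and step size 2/255). The use of PyTorch/torchvision, the batch size of 128, the torchvision ResNet-18 as a stand-in for the CIFAR-style ResNet-18, and the plain adversarial cross-entropy objective are all assumptions for illustration; the actual IAD objective instead distills from an adversarially pre-trained teacher with weights α (computed with β = 0.1) and γ = 1 − α, as described in the paper.

    import torch
    import torch.nn.functional as F
    import torchvision
    from torchvision import transforms

    def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
        # PGD-K attack with the reported settings: eps = 8/255, step size 2/255, K = 10.
        x_adv = torch.clamp(x + torch.empty_like(x).uniform_(-eps, eps), 0.0, 1.0)
        for _ in range(steps):
            x_adv.requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
            x_adv = x_adv.detach() + step_size * grad.sign()
            x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
        return x_adv.detach()

    # CIFAR-10 training data and a ResNet-18 student (batch size 128 is an assumption).
    train_loader = torch.utils.data.DataLoader(
        torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                     transform=transforms.ToTensor()),
        batch_size=128, shuffle=True)
    student = torchvision.models.resnet18(num_classes=10)

    # SGD with momentum 0.9 and weight decay 0.0002; LR 0.1, divided by 10 at Epochs 100 and 150.
    optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=2e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

    for epoch in range(200):
        for x, y in train_loader:
            x_adv = pgd_attack(student, x, y)
            optimizer.zero_grad()
            # Placeholder objective: IAD would instead combine teacher-guided distillation
            # (weighted by alpha, with beta = 0.1) and the student's own term (gamma = 1 - alpha).
            loss = F.cross_entropy(student(x_adv), y)
            loss.backward()
            optimizer.step()
        scheduler.step()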