Revisiting Data-Free Knowledge Distillation with Poisoned Teachers

Authors: Junyuan Hong, Yi Zeng, Shuyang Yu, Lingjuan Lyu, Ruoxi Jia, Jiayu Zhou

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate the effectiveness of our proposed ABD in diminishing transferred backdoor knowledge while maintaining comparable downstream performance to the vanilla KD. ... To evaluate the effectiveness of ABD, we conduct extensive experiments on 2 benchmark datasets and 10 different attacks to show ABD's efficacy in diminishing the transfer of malicious knowledge.
Researcher Affiliation | Collaboration | *Equal contribution. 1Michigan State University, Michigan, USA; 2Virginia Tech, Virginia, USA; 3Sony AI, Japan. Correspondence to: Lingjuan Lyu <lingjuan.lv@sony.com>, Jiayu Zhou <jiayuz@msu.edu>.
Pseudocode | Yes | Algorithm 1: One Round of KD with Self-Retrospection
Open Source Code | Yes | Code is released at https://github.com/illidanlab/ABD.
Open Datasets | Yes | Datasets and models. We use the same datasets, CIFAR-10 (Krizhevsky et al., 2009) and GTSRB (Stallkamp et al., 2012), as (Zeng et al., 2022a) to evaluate the backdoor defenses.
Dataset Splits | No | The results of these data-free KD methods are then compared to the vanilla KD, which uses 10,000 clean, in-distribution CIFAR-10 samples.
Hardware Specification | No | Time complexity analysis. SV is utilized to obtain an ensemble of effective shuffled models, and the forward pass of these models is used to suppress backdoor information. Compared to vanilla data-free KD, for each epoch that includes SV we introduce an additional O(n · O(θ_T)) time complexity, where O(θ_T) represents the time complexity of using the teacher model θ_T in a single forward pass on a batch of data.
Software Dependencies | No | Distillation methods. We use ZSKT (Micaelli & Storkey, 2019), CMI (Fang et al., 2021), and OOD (Asano & Saeed, 2021) as the baseline distillation methods. ... We follow previous work to use their published code and hyperparameters.
Experiment Setup | Yes | In this section, we provide details of hyper-parameters. To verify shuffling models, we cache 50 batches for ZSKT and 100 batches for OOD as D_s. Shuffling Vaccine is done by randomly changing the order of channels in the last 5 convolutional layers of WideResNet (corresponding to the last stage), and an ensemble of three shuffled models is used. ... The Self-Retrospection treatment is done at the last 3 epochs of CMI/OOD and the last 800 batches of ZSKT. For ZSKT on GTSRB, we tune the KL temperature until maximizing the student's clean accuracy; the preferred temperature is 0.5... For OOD, we use the pre-sliced 10,000 patches provided by the authors and augment the patches by random CutMix with 100% probability and β = 0.25 for the Beta-sampling.
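
The Shuffling Vaccine setup reported above (random channel reordering in the last 5 convolutional layers, an ensemble of three shuffled models) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch version: the helper names, the way the "last 5 conv layers" are selected, and the logit averaging are assumptions for illustration, not the released ABD code (see the linked repository for the actual implementation). It also makes concrete the overhead noted in the Hardware Specification row: each screened batch costs n extra teacher-sized forward passes, i.e. O(n · O(θ_T)).

```python
import copy
import torch
import torch.nn as nn

def make_shuffled_copy(teacher: nn.Module, n_last_convs: int = 5, seed: int = 0) -> nn.Module:
    """Copy the teacher and randomly permute the output-channel order of its
    last `n_last_convs` Conv2d layers. The permutation is intentionally not
    undone downstream, so the copy's behavior diverges from the teacher's."""
    gen = torch.Generator().manual_seed(seed)
    shuffled = copy.deepcopy(teacher)
    convs = [m for m in shuffled.modules() if isinstance(m, nn.Conv2d)][-n_last_convs:]
    with torch.no_grad():
        for conv in convs:
            perm = torch.randperm(conv.out_channels, generator=gen)
            conv.weight.copy_(conv.weight[perm])
            if conv.bias is not None:
                conv.bias.copy_(conv.bias[perm])
    return shuffled

def ensemble_logits(shuffled_models, x: torch.Tensor) -> torch.Tensor:
    """Average the logits of the shuffled copies; each call adds one extra
    teacher-sized forward pass per copy (the O(n * O(theta_T)) overhead)."""
    return torch.stack([m(x) for m in shuffled_models], dim=0).mean(dim=0)

# Usage sketch (teacher = a poisoned WideResNet-style network):
# vaccines = [make_shuffled_copy(teacher, seed=s) for s in range(3)]
# screened = ensemble_logits(vaccines, cached_batch)
```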
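The same row mentions tuning the KL temperature (0.5 for ZSKT on GTSRB). For reference, a standard temperature-scaled KL distillation loss looks like the sketch below; whether ZSKT applies the usual T² rescaling is an assumption here, so treat the scaling factor as illustrative.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 0.5) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student logits.
    T = 0.5 sharpens the soft targets (T > 1 would smooth them); the T**2
    factor is the common Hinton-style gradient rescaling."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)
```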
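Finally, the OOD patches are augmented with random CutMix at 100% probability and β = 0.25. Below is a minimal image-only CutMix sketch under those settings; it assumes no label mixing is needed because the patches are unlabeled distillation inputs, and the box construction follows the standard CutMix recipe rather than the authors' exact augmentation code.

```python
import numpy as np
import torch

def cutmix_patches(batch: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    """CutMix applied to every image in a batch of unlabeled patches:
    paste a random box from a shuffled copy of the batch into each image.
    The box covers an area fraction of (1 - lambda), lambda ~ Beta(beta, beta)."""
    n, _, h, w = batch.shape
    lam = np.random.beta(beta, beta)
    cut_h, cut_w = int(h * np.sqrt(1.0 - lam)), int(w * np.sqrt(1.0 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = batch.clone()
    perm = torch.randperm(n)
    mixed[:, :, y1:y2, x1:x2] = batch[perm][:, :, y1:y2, x1:x2]
    return mixed

# Usage sketch: ood_batch has shape (N, C, H, W)
# augmented = cutmix_patches(ood_batch, beta=0.25)
```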