Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
Authors: Junyuan Hong, Yi Zeng, Shuyang Yu, Lingjuan Lyu, Ruoxi Jia, Jiayu Zhou
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the effectiveness of our proposed ABD in diminishing transferred backdoor knowledge while maintaining compatible downstream performances as the vanilla KD. ... To evaluate the effectiveness of the ABD, we conduct extensive experiments on 2 benchmark datasets and 10 different attacks to show ABD's efficacy in diminishing the transfer of malicious knowledge. |
| Researcher Affiliation | Collaboration | *Equal contribution. 1 Michigan State University, Michigan, USA; 2 Virginia Tech, Virginia, USA; 3 Sony AI, Japan. Correspondence to: Lingjuan Lyu <lingjuan.lv@sony.com>, Jiayu Zhou <jiayuz@msu.edu>. |
| Pseudocode | Yes | Algorithm 1 One Round of KD with Self-Retrospection |
| Open Source Code | Yes | Codes are released at https://github.com/illidanlab/ABD. |
| Open Datasets | Yes | Datasets and models. We use the same datasets, CIFAR10 (Krizhevsky et al., 2009) and GTSRB (Stallkamp et al., 2012), as (Zeng et al., 2022a) to evaluate the backdoor defenses. |
| Dataset Splits | No | The results of these data-free KD methods are then compared to the vanilla KD, which uses 10,000 clean, in-distribution CIFAR-10 samples. |
| Hardware Specification | No | Time complexity analysis. SV is utilized to obtain an ensemble of effective shuffled models, and the forward pass of these models is used to suppress backdoor information. Compared to vanilla data-free KD, for each epoch that includes SV we introduce an additional O(n · O(θ_T)) time complexity, where O(θ_T) represents the time complexity of using the teacher model, θ_T, in a single forward pass on a batch of data. |
| Software Dependencies | No | Distillation methods. We use ZSKT (Micaelli & Storkey, 2019), CMI (Fang et al., 2021), and OOD (Asano & Saeed, 2021) as the baseline distillation methods. ... We follow previous work to use their published codes and hyperparameters. |
| Experiment Setup | Yes | In this section, we provide details of hyper-parameters. To verify shuffling models, we cache 50 batches for ZSKT and 100 batches on OOD as Ds. Shuffling Vaccine is done by randomly changing the order of channels in the last 5 convolutional layers of WideResNet (corresponding to the last stage) and an ensemble of three shuffled models is used. ... The Self-Retrospection treatment is done at the last 3 epochs of CMI/OOD, 800 batches of ZSKT. For ZSKT on GTSRB, we tune the KL temperature until maximizing the student's clean accuracy. The preferred temperature will be 0.5... For OOD, we use the pre-sliced 10,000 patches provided by the authors and augment the patches by random CutMix with 100% probability and β = 0.25 for the Beta-sampling. (Illustrative sketches of the channel shuffling and the CutMix augmentation follow the table.) |
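The Experiment Setup row describes the Shuffling Vaccine as randomly permuting the channel order in the last five convolutional layers of a WideResNet teacher and forming an ensemble of three shuffled copies. The PyTorch snippet below is a minimal sketch of that permutation step only; the helper names (`shuffle_conv_out_channels`, `make_shuffled_copy`) and the way the last layers are selected are assumptions made for illustration, not the released implementation at https://github.com/illidanlab/ABD.

```python
import copy
import torch
import torch.nn as nn

def shuffle_conv_out_channels(conv: nn.Conv2d, perm: torch.Tensor) -> None:
    """Permute the output-channel order of a Conv2d in place."""
    with torch.no_grad():
        conv.weight.copy_(conv.weight[perm])
        if conv.bias is not None:
            conv.bias.copy_(conv.bias[perm])

def make_shuffled_copy(teacher: nn.Module, n_last_convs: int = 5, seed: int = 0) -> nn.Module:
    """Deep-copy `teacher` and randomly permute the channel order of its
    last `n_last_convs` convolutional layers (assumed here to correspond
    to the last stage of a WideResNet)."""
    g = torch.Generator().manual_seed(seed)
    shuffled = copy.deepcopy(teacher)
    convs = [m for m in shuffled.modules() if isinstance(m, nn.Conv2d)]
    for conv in convs[-n_last_convs:]:
        perm = torch.randperm(conv.out_channels, generator=g)
        shuffle_conv_out_channels(conv, perm)
    return shuffled

# Ensemble of three shuffled teachers, matching the setup quoted above.
# teacher = ...  # a (possibly poisoned) WideResNet teacher loaded elsewhere
# vaccines = [make_shuffled_copy(teacher, seed=s) for s in range(3)]
```

Note that permuting only the output channels deliberately perturbs the teacher's function; per the quote in the Hardware Specification row, the forward passes of these shuffled copies are what the method uses to suppress backdoor information.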
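The same row states that the 10,000 pre-sliced OOD patches are augmented by random CutMix with 100% probability and β = 0.25 for the Beta-sampling. Below is a hedged sketch of such an augmentation on a batch of image tensors; the function name, batch layout, and the label-free formulation (labels are unnecessary when distilling from teacher outputs) are assumptions, not the authors' exact pipeline.

```python
import torch

def cutmix_batch(images: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    """CutMix applied to every image in the batch (i.e. 100% probability):
    paste a random rectangle taken from a shuffled copy of the batch, with
    the pasted area controlled by lambda ~ Beta(beta, beta)."""
    b, _, h, w = images.shape
    lam = torch.distributions.Beta(beta, beta).sample().item()
    cut_h = int(h * (1.0 - lam) ** 0.5)  # rectangle covers ~(1 - lam) of the area
    cut_w = int(w * (1.0 - lam) ** 0.5)
    cy = int(torch.randint(0, h, (1,)))
    cx = int(torch.randint(0, w, (1,)))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = images.clone()
    shuffled = images[torch.randperm(b)]
    mixed[:, :, y1:y2, x1:x2] = shuffled[:, :, y1:y2, x1:x2]
    return mixed

# patches = torch.rand(128, 3, 32, 32)  # stand-in for a batch of OOD patches
# augmented = cutmix_batch(patches, beta=0.25)
```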