Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective
Authors: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation. |
| Researcher Affiliation | Collaboration | Ming-Yu Chung (National Taiwan University), Sheng-Yen Chou (The Chinese University of Hong Kong), Chia-Mu Yu (National Yang Ming Chiao Tung University), Pin-Yu Chen (IBM Research), Sy-Yen Kuo (National Taiwan University), Tsung-Yi Ho (The Chinese University of Hong Kong) |
| Pseudocode | Yes | In the simplest form of KIP-based backdoor attacks (as shown in Algorithm 1 of the Appendix), we first construct the poisoned dataset D = D_A ∪ D_B from the clean subset D_A (of size N_A) and the trigger-injected subset D_B (of size N_B). Then, we perform KIP on D and compress the information in D into the distilled poisoned dataset S = {(x_s, y_s)}_{s=1}^{N_S}, where N_S ≪ N_A + N_B. Namely, we solve the following optimization problem. The computation procedures of relax-trigger can be found in Algorithm 2 of Appendix A.7. |
| Open Source Code | Yes | Code is available at https://github.com/Mick048/KIP-based-backdoor-attack.git. |
| Open Datasets | Yes | CIFAR-10 is a 10-class dataset with 6,000 32×32 color images per class. CIFAR-10 is split into 50,000 training images and 10,000 testing images. GTSRB contains 43 classes of traffic signs with 39,270 images, which are split into 26,640 training images and 12,630 testing images. |
| Dataset Splits | No | The paper specifies training and testing splits for CIFAR-10 and GTSRB, but does not explicitly mention a validation split or set. For CIFAR-10, it states '50000 training images and 10000 testing images'. For GTSRB, '26640 training images and 12630 testing images'. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms like KIP, neural tangent kernel (NTK), Adam, ResNet, and backdoor-toolbox, but does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We also set the optimizer to Adam (P. Kingma & Ba, 2015), the learning rate to 0.01, and the batch size to 10 × the number of classes for each dataset. We run KIP with 1000 training steps to generate a distilled dataset. |
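The pseudocode row above describes the attack's two-step structure: build a poisoned dataset D = D_A ∪ D_B (clean samples plus trigger-stamped samples relabeled to the attacker's target class), then run KIP so the distilled support set S absorbs the backdoor. The sketch below illustrates that structure under simplifying assumptions: it uses a plain RBF kernel rather than the neural tangent kernel the paper uses, evaluates only the inner kernel ridge regression loss of the KIP objective (the paper optimizes S over this loss with Adam), and the `apply_trigger` patch mechanism is a hypothetical stand-in for the paper's trigger designs.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.1):
    """Illustrative RBF kernel; the paper uses an NTK instead.
    K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def apply_trigger(x, trigger, mask):
    """Stamp a (hypothetical) patch trigger onto a flattened image:
    pixels where mask == 1 are overwritten with the trigger values."""
    return x * (1 - mask) + trigger * mask

def build_poisoned_dataset(X, Y, trigger, mask, target_onehot, poison_idx):
    """Construct D = D_A (clean) ∪ D_B (triggered, relabeled to the
    attacker's target class), as in the quoted pseudocode."""
    Xb = np.array([apply_trigger(X[i], trigger, mask) for i in poison_idx])
    Yb = np.tile(target_onehot, (len(poison_idx), 1))
    return np.vstack([X, Xb]), np.vstack([Y, Yb])

def kip_loss(Xs, Ys, Xt, Yt, reg=1e-6, gamma=0.1):
    """Inner KIP objective: squared error of kernel ridge regression,
    fit on the distilled support set (Xs, Ys), evaluated on the
    poisoned target set (Xt, Yt). KIP minimizes this over (Xs, Ys)."""
    Kss = rbf_kernel(Xs, Xs, gamma)
    Kts = rbf_kernel(Xt, Xs, gamma)
    alpha = np.linalg.solve(Kss + reg * np.eye(len(Xs)), Ys)
    return ((Kts @ alpha - Yt) ** 2).mean()
```

In the full attack, `kip_loss` would be minimized with respect to `(Xs, Ys)` by a gradient optimizer (the paper reports Adam, learning rate 0.01, 1000 steps); because the support set is much smaller than D (N_S ≪ N_A + N_B), the backdoor must be encoded into very few distilled samples.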