Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

Authors: Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
Researcher Affiliation | Collaboration | Ming-Yu Chung (National Taiwan University), Sheng-Yen Chou (The Chinese University of Hong Kong), Chia-Mu Yu (National Yang Ming Chiao Tung University), Pin-Yu Chen (IBM Research), Sy-Yen Kuo (National Taiwan University), Tsung-Yi Ho (The Chinese University of Hong Kong)
Pseudocode | Yes | In the simplest form of KIP-based backdoor attacks (as shown in Algorithm 1 of the Appendix), we first construct the poisoned dataset $\mathcal{D} = \mathcal{D}_A \cup \mathcal{D}_B$ from $\mathcal{D}_A^{N_A}$ and $\mathcal{D}_B^{N_B}$. Then, we perform KIP on $\mathcal{D}$ and compress the information in $\mathcal{D}$ into the distilled poisoned dataset $\mathcal{S} = \{(x_s, y_s)\}_{s=1}^{N_S}$, where $N_S \ll N_A + N_B$. Namely, we solve the corresponding KIP optimization problem (a hedged sketch follows the table). The computation procedures of relax-trigger can be found in Algorithm 2 of Appendix A.7.
Open Source Code | Yes | Code is available at https://github.com/Mick048/KIP-based-backdoor-attack.git.
Open Datasets | Yes | CIFAR-10 is a 10-class dataset with 6000 32×32 color images per class, split into 50000 training images and 10000 testing images. GTSRB contains 43 classes of traffic signs with 39270 images, split into 26640 training images and 12630 testing images.
Dataset Splits | No | The paper specifies training and testing splits for CIFAR-10 and GTSRB but does not explicitly mention a validation split or set. For CIFAR-10, it states '50000 training images and 10000 testing images'; for GTSRB, '26640 training images and 12630 testing images'. (A hypothetical loading and validation-split sketch follows the table.)
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions software components and algorithms like KIP, the neural tangent kernel (NTK), Adam, ResNet, and backdoor-toolbox, but does not provide specific version numbers for these dependencies, which would be required for reproducibility.
Experiment Setup | Yes | We also set the optimizer to Adam (Kingma & Ba, 2015), the learning rate to 0.01, and the batch size to 10 × the number of classes for each dataset. We run KIP with 1000 training steps to generate a distilled dataset. (A hedged configuration sketch follows the table.)
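
The Pseudocode row references a KIP optimization problem whose equation did not survive extraction. Below is a minimal sketch of that objective under our assumptions: KIP optimizes the distilled (support) images against the kernel ridge regression loss on batches drawn from the poisoned dataset. We use the infinite-width NTK from the neural-tangents library with a small fully connected network as a stand-in for the paper's architecture; all names here (`kip_loss`, `x_support`, and so on) are illustrative, not taken from the released code.

```python
# Hedged sketch of the KIP objective over a poisoned dataset D = D_A u D_B.
# Assumes flattened image inputs of shape (N, D) and one-hot labels (N, C).
import jax
import jax.numpy as jnp
from neural_tangents import stax

# Infinite-width NTK of a small fully connected network (a stand-in for
# whatever architecture the paper actually distills against).
_, _, kernel_fn = stax.serial(
    stax.Dense(1024), stax.Relu(),
    stax.Dense(1024), stax.Relu(),
    stax.Dense(10),
)

def kip_loss(x_support, y_support, x_target, y_target, reg=1e-6):
    """Kernel ridge regression loss of the distilled support set S on the
    poisoned target batch: || y_T - K_TS (K_SS + reg * I)^{-1} y_S ||^2."""
    k_ss = kernel_fn(x_support, x_support, get='ntk')
    k_ts = kernel_fn(x_target, x_support, get='ntk')
    n_s = x_support.shape[0]
    pred = k_ts @ jnp.linalg.solve(k_ss + reg * jnp.eye(n_s), y_support)
    return jnp.mean((y_target - pred) ** 2)

# Gradients flow into the distilled images themselves; because the target
# batch carries the trigger-relabelled examples from D_B, the backdoor is
# baked into the distilled set S during optimization.
grad_wrt_support = jax.grad(kip_loss, argnums=0)
```

Since $N_S \ll N_A + N_B$, the $(K_{SS} + \text{reg} \cdot I)$ solve stays cheap even though the target batches come from the full poisoned dataset.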
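The Dataset Splits row notes that the paper reports train/test splits only. Below is a hypothetical loading sketch for the CIFAR-10 splits it cites; tensorflow_datasets is our assumption (the paper does not state its data pipeline), and the validation carve-out is our own choice, not something the paper performs.

```python
# Hypothetical loading of the CIFAR-10 splits reported in the paper
# (50000 train / 10000 test); the validation split below is not from
# the paper and its size is arbitrary.
import numpy as np
import tensorflow_datasets as tfds

train = tfds.as_numpy(tfds.load('cifar10', split='train', batch_size=-1))
test = tfds.as_numpy(tfds.load('cifar10', split='test', batch_size=-1))
x_train, y_train = train['image'], train['label']   # (50000, 32, 32, 3)
x_test, y_test = test['image'], test['label']       # (10000, 32, 32, 3)

# Optional validation carve-out (5000 images here) for tuning without
# touching the test set.
rng = np.random.default_rng(0)
perm = rng.permutation(len(x_train))
x_val, y_val = x_train[perm[:5000]], y_train[perm[:5000]]
x_train, y_train = x_train[perm[5000:]], y_train[perm[5000:]]
```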
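The Experiment Setup row reports Adam with learning rate 0.01, a batch size of 10 × the number of classes, and 1000 KIP steps. The sketch below wires those values into an optax update loop over the distilled images, reusing `kip_loss` from the first sketch; the batch-sampling details and the `distill` helper are assumptions, not the paper's code.

```python
# Hedged sketch of the reported training configuration: Adam, learning
# rate 0.01, batch size 10 * num_classes, 1000 KIP steps. Assumes jnp
# arrays for the pools; sampling details are our own.
import jax
import optax

num_classes = 10                  # CIFAR-10 (43 for GTSRB)
batch_size = 10 * num_classes     # "10 x number of classes" per the paper
optimizer = optax.adam(learning_rate=0.01)

def distill(x_support, y_support, x_pool, y_pool, steps=1000, seed=0):
    """Optimize the distilled images against batches of the poisoned pool."""
    key = jax.random.PRNGKey(seed)
    opt_state = optimizer.init(x_support)
    for _ in range(steps):
        key, sub = jax.random.split(key)
        idx = jax.random.choice(sub, x_pool.shape[0], (batch_size,),
                                replace=False)
        grads = jax.grad(kip_loss)(x_support, y_support,
                                   x_pool[idx], y_pool[idx])
        updates, opt_state = optimizer.update(grads, opt_state)
        x_support = optax.apply_updates(x_support, updates)
    return x_support
```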