Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

Authors: Shanglun Feng, Florian Tramèr

ICML 2024

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting LLM response:
Research Type: Experimental. We apply these attacks to MLPs and pretrained transformers (ViT (Dosovitskiy et al., 2020) and BERT (Devlin et al., 2019)) and reconstruct dozens of finetuning examples across various downstream tasks. We evaluate our backdoor construction on an MLP model trained on CIFAR-10. We carry out our data-stealing attack on popular pretrained transformers and verify that the backdoored models still perform well on the downstream tasks. Figure 4 shows reconstructed images from the ViT on Caltech 101. Selected reconstructed examples from BERT finetuned on TREC-50 are in Table 1.
Researcher Affiliation: Academia. Shanglun Feng and Florian Tramèr, Department of Computer Science, ETH Zurich, Zurich, Switzerland.
Pseudocode: No. The paper describes its methods in prose and figures, but does not include structured pseudocode or algorithm blocks.
Open Source Code: Yes. Code to reproduce our experiments is at https://github.com/ShanglunFengatETHZ/PrivacyBackdoor.
Open Datasets: Yes. CIFAR-10 (CIFAR-100) (Krizhevsky, 2009) is a 10-class (100-class) object recognition dataset of 50,000 training samples with a size of 32×32 pixels. Oxford-IIIT Pet (Parkhi et al., 2012) is a 37-class animal recognition dataset containing 3,680 training samples. Caltech 101 is a 101-class object recognition dataset containing 9,146 images of varying sizes, roughly 300×200 pixels. TREC (Hovy et al., 2001; Li & Roth, 2002) is a question classification dataset containing 5,452 labeled questions in the training set.
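The image datasets are standard torchvision downloads; for the text task, a minimal sketch of loading TREC through the Hugging Face datasets library (an assumed loading path, since the paper does not state one). TREC-50 refers to the fine-grained label set:

```python
from datasets import load_dataset

# TREC question classification: 5,452 labeled training questions.
trec = load_dataset("trec")
print(len(trec["train"]))  # expected: 5452

# TREC-50 uses the fine-grained labels, TREC-6 the coarse ones;
# column names may differ across versions of the hub dataset.
example = trec["train"][0]
print(example["text"], example["fine_label"])
```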
Dataset Splits: No. The paper mentions dividing the Caltech 101 dataset into two-thirds for training and one-third for testing, but does not specify a validation set or split.
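For illustration, a sketch of that two-thirds/one-third Caltech 101 partition with torchvision; the resize transform and the fixed seed are assumptions, since the paper does not specify either:

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Caltech 101 images vary in size (roughly 300×200) and some are
# grayscale, so normalize to fixed-size RGB tensors for batching.
transform = transforms.Compose([
    transforms.Lambda(lambda img: img.convert("RGB")),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
caltech = datasets.Caltech101("data", transform=transform, download=True)

# Two-thirds train / one-third test, as reported; no validation set
# is described. The seed here is only to make the split repeatable.
n_train = 2 * len(caltech) // 3
train_set, test_set = random_split(
    caltech, [n_train, len(caltech) - n_train],
    generator=torch.Generator().manual_seed(0),
)
```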
Hardware Specification: No. For attackers, weight manipulations and reconstruction can be executed on a laptop's CPU within a few seconds. All our models are trained or finetuned on a GPU within a few minutes. The paper mentions CPU and GPU but does not specify exact models or detailed specifications.
Software Dependencies: Yes. We implement our backdoor attack using Python 3.10, PyTorch 2.0, and Opacus 1.4.
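A quick way to check an environment against these reported versions (a convenience sketch, not from the paper's repository):

```python
# Print the installed versions for comparison with the reported ones.
import sys
import torch
import opacus

print("python:", sys.version.split()[0])  # paper reports 3.10
print("torch:", torch.__version__)        # paper reports 2.0
print("opacus:", opacus.__version__)      # paper reports 1.4
```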
Experiment Setup: Yes. Our toy MLP model in Section 4 is trained on a randomly selected subset of 10,000 samples from the CIFAR-10 training set. We use a standard SGD optimizer with a batch size of 64 and a learning rate of 0.05, and train for 20 epochs. For the second layer of the toy MLP model, we use amplifier coefficients between 500 and 1000. Specifically, we use a standard SGD optimizer with a learning rate of (0.2, 10⁻⁴) and a batch size of 128. The pretrained model is finetuned for 12 epochs. We use an SGD optimizer with a learning rate of (0.05, 10⁻⁴) and a batch size of 32. A standard DP-SGD optimizer (using the RDP accountant (Mironov et al., 2019)) is utilized in the training recipe with lot size L = 500, gradient norm C = 1, noise multiplier σ = 1.0, and privacy budget δ = 10⁻⁵.
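A minimal sketch of this recipe in PyTorch with Opacus, assuming a stand-in MLP and a CIFAR-10 loader; the quoted hyperparameters are from the paper, while the model definition and loop structure are illustrative (the actual backdoored training code is in the linked repository):

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms
from opacus import PrivacyEngine

# Random 10,000-sample subset of the CIFAR-10 training set, as quoted.
cifar = datasets.CIFAR10("data", train=True, download=True,
                         transform=transforms.ToTensor())
subset = Subset(cifar, torch.randperm(len(cifar))[:10_000].tolist())
train_loader = DataLoader(subset, batch_size=64, shuffle=True)

# Hypothetical stand-in MLP; the paper's backdoored construction
# (amplifier coefficients in the second layer, etc.) is not shown here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512),
                      nn.ReLU(), nn.Linear(512, 10))

# Quoted plain recipe: standard SGD, learning rate 0.05, batch size 64.
optimizer = optim.SGD(model.parameters(), lr=0.05)

# Quoted DP recipe: Opacus DP-SGD with the RDP accountant, clipping
# norm C = 1 and noise multiplier σ = 1.0. (The paper's DP run uses a
# lot size of L = 500 rather than the batch size 64 above.)
privacy_engine = PrivacyEngine(accountant="rdp")
model, optimizer, train_loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=train_loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

# Train for the quoted 20 epochs.
criterion = nn.CrossEntropyLoss()
for _ in range(20):
    for images, labels in train_loader:
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
```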