Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
Authors: Shanglun Feng, Florian Tramèr
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these attacks to MLPs and pretrained transformers (ViT (Dosovitskiy et al., 2020) and BERT (Devlin et al., 2019)) and reconstruct dozens of finetuning examples across various downstream tasks. We evaluate our backdoor construction on an MLP model trained on CIFAR-10. We carry out our data-stealing attack on popular pretrained transformers and verify that the backdoored models still perform well on the downstream tasks. Figure 4 shows reconstructed images from the ViT on Caltech 101. Some selected reconstructed examples from BERT finetuned on TREC-50 are in Table 1. |
| Researcher Affiliation | Academia | Shanglun Feng 1, Florian Tramèr 1; 1 Department of Computer Science, ETH Zurich, Zurich, Switzerland. |
| Pseudocode | No | The paper describes its methods in prose and figures, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce our experiments is at https://github.com/ShanglunFengatETHZ/PrivacyBackdoor. |
| Open Datasets | Yes | Datasets. CIFAR-10 (CIFAR-100) (Krizhevsky, 2009) is a 10-class (100-class) object recognition dataset of 50,000 training samples with a size of 32×32 pixels. Oxford-IIIT Pet (Parkhi et al., 2012) is a 37-class animal recognition dataset containing 3,680 training samples. Caltech 101 is a 101-class object recognition dataset containing 9,146 images of varying sizes, roughly 300×200 pixels. TREC (Hovy et al., 2001; Li & Roth, 2002) is a question classification dataset containing 5,452 labeled questions in the training set. |
| Dataset Splits | No | The paper mentions dividing the Caltech 101 dataset into two-thirds for training and one-third for testing, but does not specify a validation set or split. |
| Hardware Specification | No | For attackers, weight manipulations and reconstruction can be executed on a laptop's CPU within a few seconds. All our models are trained or finetuned on a GPU within a few minutes. The paper mentions CPU and GPU but does not specify exact models or detailed specifications. |
| Software Dependencies | Yes | We implement our backdoor attack using Python 3.10, Pytorch 2.0, and Opacus 1.4. |
| Experiment Setup | Yes | Our toy MLP model in Section 4 is trained on a randomly selected subset of 10,000 samples from the CIFAR-10 training set. We use a standard SGD optimizer with a batch size of 64 and a learning rate of 0.05, and train for 20 epochs. For the second layer of the toy MLP model, we use amplifier coefficients between 500 and 1000. Specifically, we use a standard SGD optimizer with a learning rate of (0.2, 10⁻⁴) and a batch size of 128. The pretrained model is finetuned for 12 epochs. We use an SGD optimizer with a learning rate of (0.05, 10⁻⁴) and a batch size of 32. A standard DP-SGD optimizer (using the RDP accountant (Mironov et al., 2019)) is utilized in the training recipe with lot size L = 500, gradient norm C = 1, noise multiplier σ = 1.0, and privacy budget δ = 10⁻⁵. |
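The DP-SGD recipe in the setup row (clip each per-sample gradient to norm C = 1, sum over a lot, add Gaussian noise scaled by σ·C) can be sketched in a few lines. This is a minimal stdlib-only illustration of the mechanism, not the paper's implementation (which uses Opacus 1.4); all function names and the toy gradients below are hypothetical.

```python
import math
import random

def clip_gradient(grad, C=1.0):
    """Scale a per-sample gradient so its L2 norm is at most C."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, C / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_update(per_sample_grads, C=1.0, sigma=1.0, lot_size=500, rng=None):
    """One DP-SGD noisy gradient: clip each sample's gradient to norm C,
    sum over the lot, add N(0, (sigma*C)^2) noise per coordinate, average."""
    rng = rng or random.Random(0)
    dim = len(per_sample_grads[0])
    clipped = [clip_gradient(g, C) for g in per_sample_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    return [(s + rng.gauss(0.0, sigma * C)) / lot_size for s in summed]
```

In Opacus, the equivalent configuration is supplied when wrapping the optimizer (e.g. `max_grad_norm=1.0`, `noise_multiplier=1.0`), and the RDP accountant tracks the resulting (ε, δ) guarantee at δ = 10⁻⁵.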