Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Authors: Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini

NeurIPS 2024

Reproducibility variables, with the assessed result and the supporting LLM response for each:
Research Type: Experimental. "We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat."
Researcher Affiliation: Collaboration. University of Maryland, College Park; Oregon State University; ELLIS Institute, MPI for Intelligent Systems; Google DeepMind.
Pseudocode: No. "The paper describes the attack mechanism and experimental procedures in narrative text and mathematical equations, but does not include any structured pseudocode or algorithm blocks."
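Since the paper provides no pseudocode, the following is a minimal sketch of what one poisoning update might look like, assuming (our reading, not the authors' code) that Equation (1) trades off preserving utility on auxiliary data against raising the loss on the targeted points; all function and variable names here are hypothetical.

```python
# Hypothetical sketch of a privacy-backdoor poisoning step (not the authors' code).
# Assumed objective: keep the model accurate on auxiliary data while pushing the
# loss on targeted points up, so that a later loss-based membership inference
# attack separates members from non-members after fine-tuning.
import torch.nn.functional as F

def poison_step(model, optimizer, aux_batch, target_batch, alpha=0.5):
    """One poisoning update with an assumed form of Equation (1)."""
    aux_x, aux_y = aux_batch     # auxiliary (D_aux) inputs and labels
    tgt_x, tgt_y = target_batch  # targeted points to backdoor
    optimizer.zero_grad()
    utility_loss = F.cross_entropy(model(aux_x), aux_y)      # preserve clean behavior
    target_loss = F.cross_entropy(model(tgt_x), tgt_y)       # loss on targeted points
    loss = alpha * utility_loss - (1 - alpha) * target_loss  # alpha = 0.5 as reported
    loss.backward()
    optimizer.step()
    return utility_loss.item(), target_loss.item()
```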
Open Source Code: Yes. "All the models and datasets used in this paper are open-sourced, and we include our code in the supplemental material."
Open Datasets: Yes. "We present our experimental results, averaged over 5 random seeds, on datasets including ImageNet (Deng et al., 2009), CIFAR-10 (Krizhevsky and Hinton, 2009), and CIFAR-100 (Krizhevsky and Hinton, 2009). Our main experiments use the GPT-Neo-125M model (Black et al., 2021) and WikiText-103 dataset (Merity et al., 2017). We inject 1,000 randomly selected canaries from ai4Privacy (2023)... We employ MIMIC-IV (Johnson et al., 2023) for fine-tuning."
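For reference, the publicly hosted assets among these can be fetched as sketched below; the Hugging Face hub identifiers and the `datasets`/`torchvision` calls are our assumptions about where the resources live, not details given in the paper.

```python
# Sketch of obtaining the public assets named above (hub ids are assumptions).
from datasets import load_dataset
from torchvision import datasets
from transformers import AutoModelForCausalLM, AutoTokenizer

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")            # WikiText-103
cifar10 = datasets.CIFAR10(root="data", train=True, download=True)    # CIFAR-10
cifar100 = datasets.CIFAR100(root="data", train=True, download=True)  # CIFAR-100
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
# ImageNet, the ai4Privacy canaries, and MIMIC-IV require separate access or
# licensing agreements and are not downloaded here.
```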
Dataset Splits: No. "During fine-tuning, following the hyper-parameters from Wortsman et al. (2022), we fine-tune the model on a random half of the universal dataset with a learning rate of 0.00003 over 5 epochs. During the poisoning phase, the validation set serves as D_aux."
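A minimal sketch of that split, assuming PyTorch's `random_split` and a fixed seed; the toy dataset and seed value are placeholders, and in the paper the role of D_aux is played by the downstream dataset's validation split rather than the held-out half shown here.

```python
# Illustrative split: fine-tune on a random half of the "universal" dataset.
import torch
from torch.utils.data import TensorDataset, random_split

universal = TensorDataset(torch.randn(1000, 8), torch.randint(0, 2, (1000,)))  # toy stand-in
half = len(universal) // 2
generator = torch.Generator().manual_seed(0)  # fixed seed is an assumption, not from the paper
finetune_half, unused_half = random_split(
    universal, [half, len(universal) - half], generator=generator
)
# D_aux for the poisoning phase is the validation split of the downstream dataset,
# loaded separately; `unused_half` above is not D_aux.
```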
Hardware Specification: Yes. "Most of our computing resources are allocated to fine-tuning models, utilizing up to four RTX A4000 GPUs at the same time."
Software Dependencies: No. "The paper mentions optimizers like AdamW and various models (e.g., CLIP, GPT-Neo), but it does not specify versions for software libraries, frameworks, or programming languages (e.g., Python, PyTorch, TensorFlow, CUDA versions)."
Experiment Setup: Yes. "For the poisoning phase, we set α = 0.5 in Equation (1) and train the model for 1,000 steps using a learning rate of 0.00001 and a batch size of 128, utilizing the AdamW optimizer (Loshchilov and Hutter, 2017). During fine-tuning, following the hyper-parameters from Wortsman et al. (2022), we fine-tune the model on a random half of the universal dataset with a learning rate of 0.00003 over 5 epochs."
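The reported hyper-parameters, gathered into one place for convenience; the dictionary names and layout below are illustrative, not the authors' configuration format.

```python
# Hyper-parameters as reported in the paper, collected into illustrative configs.
POISONING_CONFIG = {
    "alpha": 0.5,          # weight in Equation (1)
    "steps": 1_000,
    "learning_rate": 1e-5,
    "batch_size": 128,
    "optimizer": "AdamW",  # Loshchilov and Hutter, 2017
}

FINETUNING_CONFIG = {
    "data_fraction": 0.5,  # random half of the universal dataset
    "learning_rate": 3e-5,
    "epochs": 5,           # hyper-parameters follow Wortsman et al., 2022
}
```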