Students Parrot Their Teachers: Membership Inference on Model Distillation

Authors: Matthew Jagielski, Milad Nasr, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini, Florian Tramèr

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to both the teacher and student training sets. Our new attacks show that distillation alone provides only limited privacy across a number of domains. We systematically evaluate a number of factors which impact the empirical privacy of model distillation.
Researcher Affiliation | Collaboration | 1 Google DeepMind, 2 ETH Zurich
Pseudocode | No | The paper describes the steps of the Likelihood Ratio Attack (LiRA) and its adaptations, for example, 'In LiRA, the adversary first trains many shadow models...' but it does not present these steps in a formal pseudocode or algorithm block format. (A minimal sketch of the likelihood-ratio test appears after this table.)
Open Source Code | No | Unfortunately, we are unable to make our code public at this time due to organizational constraints.
Open Datasets | Yes | We study four standard datasets for our analysis: CIFAR-10, WikiText103, Purchase100, and Texas100. On CIFAR-10, we start with code from the DAWNBench benchmark [CNKZZNBORZ17]... We remove 5275 duplicates from CIFAR-10, using the imagededup library [JLJT19]...
Dataset Splits | Yes | We remove 5275 duplicates from CIFAR-10, using the imagededup library [JLJT19], and split the remaining dataset into a teacher set of 30,000 examples and a student set of 14,725 examples. On WikiText103, we split WikiText103 into a teacher set of 500,000 records, and use the remaining records to train the student models. On Purchase100 and Texas100, we train single-layer neural networks... and subsample the datasets to produce teacher and student sets of 20000 examples each. (A sketch of such a disjoint split appears after this table.)
Hardware Specification | Yes | All of our results on CIFAR-10 make use of fewer than 30000 trained models. While a very large number of models, the fast, publicly available training code we use allows us to train this number of models in fewer than 1 GPU-week (although we decrease the wall-clock time by parallelizing over 4 GPUs). Our results on Purchase-100 and Texas-100 also use simple models, taking under 1 minute to train (we train all models for 20 epochs with SGD with a learning rate of 0.01 and momentum parameter of 0.99, which we found to maximize performance over our hyperparameter sweep). Our most expensive attack, relying on only student queries, starts to outperform random guessing with as few as 100 models, which can be trained on 1 GPU in two hours on all three of these datasets.
Software Dependencies | No | The paper mentions tools and architectures like 'imagededup library', 'GPT-2 architecture', 'ResNet-9 model', and training methods like 'cross entropy loss' and 'SGD'. However, it does not provide specific version numbers for these software components or the underlying machine learning frameworks (e.g., PyTorch, TensorFlow, etc.) used for implementation.
Experiment Setup | Yes | On all datasets, we train our models with the cross entropy loss: teacher models are trained with the standard sparse cross entropy loss on the teacher dataset, and student models are trained with a dense cross entropy loss to mimic the soft labels predicted by the teacher. Unless otherwise stated, all LiRA-based attacks use 100 shadow models for calibration, and all figures are produced by running the attack on over 1000 models (we train all models for 20 epochs with SGD with a learning rate of 0.01 and momentum parameter of 0.99, which we found to maximize performance over our hyperparameter sweep). (A sketch of the two losses appears after this table.)
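
Since the paper gives LiRA only in prose (see the Pseudocode row), here is a minimal, illustrative Python/NumPy sketch of the per-example likelihood-ratio test that LiRA performs. This is not the authors' implementation: the function names (`logit_confidence`, `lira_score`) are ours, and it assumes per-example confidences from shadow models trained with and without the target example have already been collected.

```python
import numpy as np
from scipy.stats import norm

def logit_confidence(p_correct):
    """Stable logit transform of the model's probability on the true label,
    used in LiRA to make per-example confidences roughly Gaussian."""
    p = np.clip(p_correct, 1e-6, 1 - 1e-6)
    return np.log(p) - np.log(1 - p)

def lira_score(target_conf, shadow_confs_in, shadow_confs_out):
    """Per-example LiRA membership score (log-likelihood ratio).

    target_conf:      logit-scaled confidence of the attacked model on the example
    shadow_confs_in:  confidences from shadow models trained WITH the example
    shadow_confs_out: confidences from shadow models trained WITHOUT the example
    Larger values indicate the example is more likely a training member.
    """
    mu_in, sigma_in = shadow_confs_in.mean(), shadow_confs_in.std() + 1e-8
    mu_out, sigma_out = shadow_confs_out.mean(), shadow_confs_out.std() + 1e-8
    return (norm.logpdf(target_conf, mu_in, sigma_in)
            - norm.logpdf(target_conf, mu_out, sigma_out))
```

Sweeping a threshold over these scores traces out the true-positive/false-positive trade-off that attacks of this kind report.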
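
The Dataset Splits row describes disjoint teacher and student sets, e.g. 30,000 and 14,725 examples on CIFAR-10 after removing 5,275 duplicates. A minimal sketch of such a split follows; the random permutation and the name `teacher_student_split` are our own assumptions, since the excerpt does not say how the split was drawn.

```python
import numpy as np

def teacher_student_split(num_examples, teacher_size, student_size, seed=0):
    """Return disjoint index sets for the teacher and student training sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_examples)
    teacher_idx = idx[:teacher_size]
    student_idx = idx[teacher_size:teacher_size + student_size]
    return teacher_idx, student_idx

# CIFAR-10: 50,000 training examples minus 5,275 duplicates = 44,725,
# split into 30,000 teacher and 14,725 student examples as in the paper.
teacher_idx, student_idx = teacher_student_split(44_725, 30_000, 14_725)
```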
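
The Experiment Setup row fixes the two training objectives: sparse cross entropy on hard labels for the teacher, and dense cross entropy against the teacher's soft labels for the student. The PyTorch sketch below illustrates our reading of that description; the function names are ours, and no temperature scaling is applied since the excerpt mentions none.

```python
import torch
import torch.nn.functional as F

def teacher_loss(teacher_logits, hard_labels):
    """Standard sparse cross entropy on the teacher set (integer class labels)."""
    return F.cross_entropy(teacher_logits, hard_labels)

def student_distillation_loss(student_logits, teacher_logits):
    """Dense cross entropy between the student's predictions and the teacher's
    soft labels, so the student mimics the teacher's full output distribution."""
    soft_targets = F.softmax(teacher_logits, dim=-1).detach()
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```

Per the hyperparameters quoted above, the training loops would then run for 20 epochs with SGD at learning rate 0.01 and momentum 0.99.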