Gaussian Membership Inference Privacy

Authors: Tobias Leemann, Martin Pawelczyk, Gjergji Kasneci

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method on models trained on vision and tabular datasets. (Section 6: Experimental Evaluation)
Researcher Affiliation | Academia | Tobias Leemann (University of Tübingen; Technical University of Munich), Martin Pawelczyk (Harvard University), Gjergji Kasneci (Technical University of Munich)
Pseudocode | Yes | Algorithm 1: Gradient Likelihood Ratio (GLiR) Attack (an illustrative likelihood-ratio sketch follows the table)
Open Source Code | Yes | We release our code online (https://github.com/tleemann/gaussian_mip).
Open Datasets | Yes | We use three datasets that were previously used in works on privacy risks of ML models [32]: the CIFAR-10 dataset, which consists of 60k small images [21], the Purchase tabular classification dataset [25], and the Adult income classification dataset from the UCI machine learning repository [12].
Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details in the main text on how the data was split into training, validation, or test sets (e.g., percentages or sample counts). It mentions '60k small images' for CIFAR-10 but not the split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing resources) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We use a model pretrained on CIFAR-100 and finetune the last layer on CIFAR-10 using a ResNet-56 model for this task [16], where the number of fine-tuned parameters equals d = 650. We follow a similar strategy on the Purchase dataset, where we use a three-layer neural network. For finetuning, we use the 20 most common classes and d = 2580 parameters, while the model is pretrained on 80 classes. On the Adult dataset, we use a two-layer network with 512 random features in the first layer, trained from scratch on the dataset, such that d = 1026. We sample 20 different privacy levels ranging over µ ∈ [0.4, ..., 50] and calibrate the noise in the SGD iteration to reach the desired value of µ. (A hedged sketch of such a calibration follows the table.)
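The Pseudocode row above names Algorithm 1, the Gradient Likelihood Ratio (GLiR) attack, but does not reproduce it. The snippet below is only a minimal sketch of a generic Gaussian likelihood-ratio membership test on a noisy gradient sum, not the paper's Algorithm 1; the function name gaussian_llr, the variable names, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def gaussian_llr(observed, baseline_mean, candidate_grad, sigma):
    """Log-likelihood ratio for membership of one candidate gradient.

    H1: observed ~ N(baseline_mean + candidate_grad, sigma^2 I)
    H0: observed ~ N(baseline_mean, sigma^2 I)
    A positive score favours membership. Names are illustrative only.
    """
    diff = observed - baseline_mean
    return (diff @ candidate_grad - 0.5 * candidate_grad @ candidate_grad) / sigma**2

# Toy usage with synthetic 650-dimensional gradients (d = 650 matches the
# CIFAR-10 fine-tuning setup quoted above; everything else is made up).
rng = np.random.default_rng(0)
d, sigma = 650, 2.0
baseline = rng.normal(size=d)        # expected gradient sum without the candidate
candidate = rng.normal(size=d)       # candidate example's gradient
observed = baseline + candidate + sigma * rng.normal(size=d)  # member case
print(gaussian_llr(observed, baseline, candidate, sigma))     # typically > 0
```

Thresholding this score at different values traces out a membership-inference ROC curve, which is the kind of object the paper's privacy analysis bounds.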
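The Experiment Setup row states that SGD noise is calibrated to reach a desired privacy level µ. As a rough sketch only (not the authors' calibration code), one common route is to invert the central-limit approximation for subsampled Gaussian mechanisms from the Gaussian DP literature, µ ≈ p·sqrt(T·(e^{1/σ²} − 1)) for sampling rate p, T steps, and noise multiplier σ; the helper names mu_from_sigma and sigma_for_mu and the example numbers below are assumptions.

```python
import math
from scipy.optimize import brentq

def mu_from_sigma(sigma, p, T):
    # CLT approximation of the GDP level of T subsampled Gaussian steps
    # (an assumed formula, not necessarily the calibration used in the paper).
    return p * math.sqrt(T * (math.exp(1.0 / sigma**2) - 1.0))

def sigma_for_mu(target_mu, p, T, lo=0.3, hi=100.0):
    # mu decreases monotonically in sigma, so a bracketed root-find suffices.
    return brentq(lambda s: mu_from_sigma(s, p, T) - target_mu, lo, hi)

# Example: batch size 256 out of 50k training points, 1000 SGD steps, target mu = 1.0
print(sigma_for_mu(1.0, p=256 / 50_000, T=1_000))
```

Sweeping target_mu over a grid such as the quoted range µ ∈ [0.4, ..., 50] then yields one noise multiplier per privacy level.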