Gaussian Membership Inference Privacy

Authors: Tobias Leemann, Martin Pawelczyk, Gjergji Kasneci

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method on models trained on vision and tabular datasets. (Section 6: Experimental Evaluation)
Researcher Affiliation | Academia | Tobias Leemann (University of Tübingen; Technical University of Munich), Martin Pawelczyk (Harvard University), Gjergji Kasneci (Technical University of Munich)
Pseudocode | Yes | Algorithm 1: Gradient Likelihood Ratio (GLiR) Attack (an illustrative likelihood-ratio sketch follows the table)
Open Source Code | Yes | We release our code online (https://github.com/tleemann/gaussian_mip).
Open Datasets | Yes | We use three datasets that were previously used in works on privacy risks of ML models [32]: the CIFAR-10 dataset, which consists of 60k small images [21], the Purchase tabular classification dataset [25], and the Adult income classification dataset from the UCI machine learning repository [12].
Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details in the main text on how the data was split into training, validation, or test sets (e.g., percentages or sample counts). It mentions '60k small images' for CIFAR-10 but not the split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing resources) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We use a model pretrained on CIFAR-100 and finetune the last layer on CIFAR-10 using a ResNet-56 model for this task [16], where the number of fine-tuned parameters equals d = 650. We follow a similar strategy on the Purchase dataset, where we use a three-layer neural network. For finetuning, we use the 20 most common classes and d = 2580 parameters, while the model is pretrained on 80 classes. On the Adult dataset, we use a two-layer network with 512 random features in the first layer, trained from scratch on the dataset, such that d = 1026. We sample 20 different privacy levels ranging over µ ∈ [0.4, ..., 50] and calibrate the noise in the SGD iteration to reach the desired value of µ. (A hedged sketch of such a calibration follows the table.)
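The Pseudocode row above names Algorithm 1, the Gradient Likelihood Ratio (GLiR) attack, but does not reproduce it. The snippet below is only a minimal sketch of a generic Gaussian likelihood-ratio membership test on a noisy gradient sum, not the paper's Algorithm 1; the function name gaussian_llr, the variable names, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def gaussian_llr(observed, baseline_mean, candidate_grad, sigma):
    """Log-likelihood ratio for membership of one candidate gradient.

    H1: observed ~ N(baseline_mean + candidate_grad, sigma^2 I)
    H0: observed ~ N(baseline_mean, sigma^2 I)
    A positive score favours membership. Names are illustrative only.
    """
    diff = observed - baseline_mean
    return (diff @ candidate_grad - 0.5 * candidate_grad @ candidate_grad) / sigma**2

# Toy usage with synthetic 650-dimensional gradients (d = 650 matches the
# CIFAR-10 fine-tuning setup quoted above; everything else is made up).
rng = np.random.default_rng(0)
d, sigma = 650, 2.0
baseline = rng.normal(size=d)        # expected gradient sum without the candidate
candidate = rng.normal(size=d)       # candidate example's gradient
observed = baseline + candidate + sigma * rng.normal(size=d)  # member case
print(gaussian_llr(observed, baseline, candidate, sigma))     # typically > 0
```

Thresholding this score at different values traces out a membership-inference ROC curve, which is the kind of object the paper's privacy analysis bounds.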
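The Experiment Setup row states that SGD noise is calibrated to reach a desired privacy level µ. As a rough sketch only (not the authors' calibration code), one common route is to invert the central-limit approximation for subsampled Gaussian mechanisms from the Gaussian DP literature, µ ≈ p·sqrt(T·(e^{1/σ²} − 1)) for sampling rate p, T steps, and noise multiplier σ; the helper names mu_from_sigma and sigma_for_mu and the example numbers below are assumptions.

```python
import math
from scipy.optimize import brentq

def mu_from_sigma(sigma, p, T):
    # CLT approximation of the GDP level of T subsampled Gaussian steps
    # (an assumed formula, not necessarily the calibration used in the paper).
    return p * math.sqrt(T * (math.exp(1.0 / sigma**2) - 1.0))

def sigma_for_mu(target_mu, p, T, lo=0.3, hi=100.0):
    # mu decreases monotonically in sigma, so a bracketed root-find suffices.
    return brentq(lambda s: mu_from_sigma(s, p, T) - target_mu, lo, hi)

# Example: batch size 256 out of 50k training points, 1000 SGD steps, target mu = 1.0
print(sigma_for_mu(1.0, p=256 / 50_000, T=1_000))
```

Sweeping target_mu over a grid such as the quoted range µ ∈ [0.4, ..., 50] then yields one noise multiplier per privacy level.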