Gaussian Membership Inference Privacy
Authors: Tobias Leemann, Martin Pawelczyk, Gjergji Kasneci
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method on models trained on vision and tabular datasets. (Section 6: Experimental Evaluation) |
| Researcher Affiliation | Academia | Tobias Leemann University of Tübingen Technical University of Munich Martin Pawelczyk Harvard University Gjergji Kasneci Technical University of Munich |
| Pseudocode | Yes | Algorithm 1: Gradient Likelihood Ratio (GLiR) Attack (an illustrative likelihood-ratio sketch appears after the table) |
| Open Source Code | Yes | We release our code online: https://github.com/tleemann/gaussian_mip |
| Open Datasets | Yes | We use three datasets that were previously used in works on privacy risks of ML models [32]: The CIFAR-10 dataset which consists of 60k small images [21], the Purchase tabular classification dataset [25] and the Adult income classification dataset from the UCI machine learning repository [12]. |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details on how the data was split into training, validation, or test sets (e.g., percentages or sample counts) in the main text. It mentions '60k small images' for CIFAR-10 but not the split. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use a model pretrained on CIFAR-100 and finetune the last layer on CIFAR-10 using a ResNet-56 model for this task [16], where the number of finetuned parameters equals d = 650. We follow a similar strategy on the Purchase dataset, where we use a three-layer neural network. For finetuning, we use the 20 most common classes and d = 2580 parameters while the model is pretrained on 80 classes. On the Adult dataset, we use a two-layer network with 512 random features in the first layer trained from scratch on the dataset such that d = 1026. We sample 20 different privacy levels ranging over µ ∈ [0.4, …, 50] and calibrate the noise in the SGD iteration to reach the desired value of µ. (Sketches of the last-layer finetuning setup and the noise calibration appear after the table.) |
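At its core, the GLiR attack mentioned above is a likelihood-ratio test between two Gaussian hypotheses over observed gradients. The following is a minimal, self-contained sketch of such a test on synthetic gradients; the means, noise scale `sigma`, and separation `mu` are illustrative placeholders, not the statistics that the paper's Algorithm 1 derives from an actual training run.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 650        # gradient dimension, matching the CIFAR-10 last-layer setup
mu = 2.0       # illustrative separation between the two hypotheses
sigma = 1.0    # illustrative noise scale
mean_in = np.zeros(d)                    # hypothetical "member" gradient mean
mean_out = np.full(d, mu / np.sqrt(d))   # hypothetical "non-member" mean, distance mu away

def glr_score(grad, mean_in, mean_out, sigma):
    """Log-likelihood ratio of a gradient under two isotropic Gaussian
    hypotheses (member vs. non-member). Positive score favors membership."""
    ll_in = -np.sum((grad - mean_in) ** 2) / (2 * sigma**2)
    ll_out = -np.sum((grad - mean_out) ** 2) / (2 * sigma**2)
    return ll_in - ll_out

# Toy evaluation: score simulated member and non-member gradients at threshold 0.
members = rng.normal(mean_in, sigma, size=(1000, d))
non_members = rng.normal(mean_out, sigma, size=(1000, d))
tpr = np.mean([glr_score(g, mean_in, mean_out, sigma) > 0 for g in members])
fpr = np.mean([glr_score(g, mean_in, mean_out, sigma) > 0 for g in non_members])
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

With means a distance µ apart, the test's true/false positive rates follow the Gaussian trade-off curve (roughly Φ(µ/2) and Φ(-µ/2) at threshold 0), which is exactly the trade-off that the µ parameter in the paper's privacy notion controls.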
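The last-layer finetuning setup described in the Experiment Setup row can be reproduced by freezing a pretrained backbone and re-initializing only its classification head. A minimal PyTorch sketch, assuming a torchvision-style model whose head attribute is named `fc` (CIFAR-style ResNet-56 is not shipped with torchvision, so this attribute name is an assumption):

```python
import torch.nn as nn

def last_layer_finetune(model: nn.Module, num_classes: int = 10) -> nn.Module:
    """Freeze a pretrained backbone and re-initialize only the final linear
    layer. For a head with 64-dim input features and 10 classes this gives
    64 * 10 weights + 10 biases = 650 trainable parameters, matching d = 650."""
    for p in model.parameters():
        p.requires_grad = False                      # freeze the backbone
    in_features = model.fc.in_features               # assumes the head is named `fc`
    model.fc = nn.Linear(in_features, num_classes)   # fresh layer, trainable by default
    return model
```

Only the parameters of the new head then receive gradient updates, which is what keeps the effective dimension d small in the paper's vision and Purchase experiments.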
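For the noise calibration, a back-of-the-envelope sketch under Gaussian DP: a single Gaussian-mechanism step with noise multiplier σ is (1/σ)-GDP, and T such steps compose to (√T/σ)-GDP, so a target µ gives σ = √T/µ. This simple rule ignores subsampling amplification, which the paper's actual calibration may account for; the `num_steps=100` below is a hypothetical value.

```python
import math

def noise_multiplier_for_mu(target_mu: float, num_steps: int) -> float:
    """Noise multiplier sigma such that num_steps Gaussian-mechanism steps,
    each (1/sigma)-GDP, compose to target_mu-GDP: sqrt(T)/sigma = mu."""
    return math.sqrt(num_steps) / target_mu

# Example sweep over a few of the 20 privacy levels in [0.4, 50].
for mu in (0.4, 1.0, 10.0, 50.0):
    sigma = noise_multiplier_for_mu(mu, num_steps=100)  # num_steps is hypothetical
    print(f"mu = {mu:5.1f}  ->  sigma = {sigma:.3f}")
```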