Bayesian Estimation of Differential Privacy
Authors: Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement an end-to-end system for privacy estimation that integrates our approach and state-of-the-art membership inference attacks, and evaluate it on text and vision classification tasks. |
| Researcher Affiliation | Collaboration | 1 Microsoft, Cambridge, UK; 2 University College London, London, UK. |
| Pseudocode | Yes | Algorithm 3 in Appendix A.1 shows pseudocode for this challenge point selection procedure. (...) A.1. Omitted algorithms Algorithm 3: Select Worst |
| Open Source Code | Yes | The implementation of the core Bayesian estimation method used to produce the results reported in the paper can be found at https://aka.ms/privacy-estimates. |
| Open Datasets | Yes | We evaluate our system for privacy estimation on text (SST-2) and vision (CIFAR-10) classification tasks. (...) CIFAR-10 (Krizhevsky, 2009) (...) SST-2 (Socher et al., 2013) |
| Dataset Splits | No | CIFAR-10 (...) consists of 60 000 labeled images (50 000 training, 10 000 test). (...) SST-2 (...) consisting of 67 349 training samples and 1821 test samples. (Standard splits; see the loader sketch below the table.) |
| Hardware Specification | No | We implement the system as an Azure ML pipeline, allowing for an efficient utilization of large GPU clusters, but the system can make use of more modest resources and its design is generic enough to be ported to any other ML framework. |
| Software Dependencies | No | The resulting integral in Equation (3) cannot be expressed in analytical form so we approximate it numerically using SciPy's dblquad, based on QUADPACK's qagse. (...) We implement the end-to-end pipeline depicted in Figure 6 in Azure ML. (...) our original implementation used open-source libraries (Ray Tune). (See the integration sketch below the table.) |
| Experiment Setup | Yes | We use a 4-layer CNN with 974K parameters and tanh activations with average pooling and max pooling units, which we train for 50 epochs. (...) We fine-tune a RoBERTa base model with a classification head for 3 epochs (Liu et al., 2019). (See the model sketches below the table.) |
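
The split sizes quoted above are the standard CIFAR-10 and SST-2 distributions, so they can be sanity-checked with off-the-shelf loaders. A minimal sketch, assuming torchvision and Hugging Face `datasets`; the paper's pipeline may load data differently:

```python
# Sanity-check the quoted dataset sizes with standard loaders.
# Illustrative only: the paper's Azure ML pipeline may use its own
# data-loading code.
from torchvision.datasets import CIFAR10
from datasets import load_dataset

cifar_train = CIFAR10(root="data", train=True, download=True)
cifar_test = CIFAR10(root="data", train=False, download=True)
print(len(cifar_train), len(cifar_test))  # 50000 10000

sst2 = load_dataset("glue", "sst2")
print({split: len(ds) for split, ds in sst2.items()})
# {'train': 67349, 'validation': 872, 'test': 1821}
```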
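
The dependency row notes that the integral in Equation (3) is evaluated numerically with SciPy's `dblquad`, which wraps QUADPACK's `qagse`. The sketch below shows that kind of computation: it integrates a joint posterior over a membership-inference attack's false-positive and false-negative rates across the region where (ε, 0)-DP at a candidate ε would be violated. The counts, the flat-prior Beta posteriors, and the region test are illustrative assumptions standing in for the paper's Equation (3), not a reproduction of it.

```python
# Hedged sketch: posterior mass assigned to "the attack violates
# (eps, 0)-DP", computed by double integration with scipy's dblquad
# (QUADPACK qagse underneath). All quantities below are assumed.
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import beta

fp, tn = 12, 488   # assumed attack false positives / true negatives
fn, tp = 9, 491    # assumed attack false negatives / true positives
eps = 1.0          # candidate privacy level

def violates_dp(fnr, fpr):
    # Hypothesis-testing region for DP (Kairouz et al. style):
    # (eps, 0)-DP requires fpr + e^eps * fnr >= 1 and
    # fnr + e^eps * fpr >= 1; the attack violates eps if either fails.
    return fpr + np.exp(eps) * fnr < 1 or fnr + np.exp(eps) * fpr < 1

def integrand(fnr, fpr):
    # Independent Beta posteriors under flat priors (an assumption).
    density = beta.pdf(fpr, 1 + fp, 1 + tn) * beta.pdf(fnr, 1 + fn, 1 + tp)
    return density if violates_dp(fnr, fpr) else 0.0

# dblquad integrates func(y, x): outer variable fpr, inner fnr.
mass, abserr = dblquad(integrand, 0.0, 1.0, lambda fpr: 0.0, lambda fpr: 1.0)
print(f"posterior mass violating eps={eps}: {mass:.4f} (+/- {abserr:.1e})")
```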
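
For the vision task, the quoted setup is a 4-layer CNN with 974K parameters, tanh activations, and a mix of average and max pooling, trained for 50 epochs on CIFAR-10. A PyTorch sketch with that shape is below; the channel widths and kernel sizes are assumptions, so the parameter count only approximates the quoted figure.

```python
# Sketch of a 4-layer CNN with tanh activations and mixed
# average/max pooling, per the quoted description. Widths and
# kernel sizes are assumed, not taken from the paper.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.Tanh(), nn.AvgPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(256, 256, 3, padding=1), nn.Tanh(), nn.AvgPool2d(2),
        )
        self.classifier = nn.Linear(256 * 2 * 2, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
# ~971K with these assumed widths; the paper reports 974K.
print(sum(p.numel() for p in model.parameters()))
```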
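
For the text task, the quoted setup fine-tunes a RoBERTa-base model with a classification head for 3 epochs. A sketch with Hugging Face `transformers` follows; apart from the model, task, and epoch count, every hyperparameter (batch size, default learning rate, output path) is an assumption.

```python
# Sketch of the quoted text setup: RoBERTa-base + classification
# head, fine-tuned for 3 epochs on SST-2. Hyperparameters other
# than the epoch count are assumed.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

sst2 = load_dataset("glue", "sst2")
encoded = sst2.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="roberta-sst2",        # assumed output path
        num_train_epochs=3,               # quoted epoch count
        per_device_train_batch_size=32),  # assumed batch size
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```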