A General Framework for Auditing Differentially Private Machine Learning
Authors: Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro, Brian Testa
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate significantly improved auditing power over previous approaches on a variety of models including logistic regression, Naive Bayes, and random forest. |
| Researcher Affiliation | Collaboration | Fred Lu, Booz Allen Hamilton; Joseph Munoz, Booz Allen Hamilton; Maya Fuchs, Booz Allen Hamilton; Tyler LeBlond, Booz Allen Hamilton; Elliott Zaresky-Williams, Booz Allen Hamilton; Edward Raff, Booz Allen Hamilton; Francis Ferraro, University of Maryland, Baltimore County; Brian Testa, Air Force Research Laboratory |
| Pseudocode | Yes | Algorithm 1 ML-Audit: Optimizing output set S |
| Open Source Code | No | The paper mentions using existing libraries like 'diffprivlib' and 'Opacus' but does not provide any statement or link indicating that the authors have released the source code for their own framework or methodology described in the paper. |
| Open Datasets | Yes | We assess Naive Bayes, logistic regression (output and objective perturbation), and random forest on common machine learning datasets: adult, credit, iris, breast-cancer, banknote, thoracic. We use the diffprivlib library [27] and implement output perturbation following [22]. Additionally, we test DP-SGD on FMNIST and CIFAR10 (here with N = 500) using [28]. |
| Dataset Splits | No | The paper describes the number of times models were trained (N=10000 or N=500 for DP-SGD) for Monte Carlo estimates, but it does not provide specific train/validation/test dataset splits for the machine learning models themselves. Algorithm 1 refers to training a classifier for `p(D|z)` based on samples, not a general validation split for the main models. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only mentions general experimental setups without hardware context. |
| Software Dependencies | No | The paper mentions using the 'diffprivlib library' and 'Opacus' but does not specify the version numbers for these or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We evaluate the DP mechanisms over a range of ε_th: {0.1, 0.25, 0.5, 1, 2, 4, 8, 16, 50}. For a given dataset D, we perturb k ∈ {1, 2, 4, 8} points to get D′ and train N = 10000 times for each to determine the appropriate auditing set S. Then we obtain N new samples to perform the final Monte Carlo estimate and obtain the lower bound ε̂_lb. We use confidence level = 0.05 throughout. |
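
The Pseudocode row cites "Algorithm 1 ML-Audit: Optimizing output set S", and the Dataset Splits row notes that the algorithm trains a classifier approximating `p(D|z)` from sampled mechanism outputs. The sketch below illustrates that general recipe under our own assumptions: it is not the paper's Algorithm 1, and `mechanism`, `D`, `D_prime`, and the 0.5 threshold are placeholders.

```python
# Hedged sketch: run the DP mechanism repeatedly on neighbouring datasets D and D',
# train a classifier to approximate p(D | z) from the outputs z, and take the
# auditing set S to be the region where that classifier confidently attributes an
# output to D. Illustrative only; not the authors' exact procedure.
import numpy as np
from sklearn.linear_model import LogisticRegression as Attack

def sample_outputs(mechanism, data, n_runs):
    # Each run is assumed to return a fixed-length vector summarising the trained
    # model (e.g. its parameters); stacking gives an (n_runs, d) array.
    return np.stack([mechanism(data) for _ in range(n_runs)])

def fit_output_set(mechanism, D, D_prime, n_runs=10000, threshold=0.5):
    Z0 = sample_outputs(mechanism, D, n_runs)        # outputs under D (label 0)
    Z1 = sample_outputs(mechanism, D_prime, n_runs)  # outputs under D' (label 1)
    Z = np.vstack([Z0, Z1])
    labels = np.concatenate([np.zeros(n_runs), np.ones(n_runs)])
    attack = Attack(max_iter=1000).fit(Z, labels)
    # Membership test for S: outputs that the attack assigns to D with high confidence.
    return lambda z: attack.predict_proba(z.reshape(1, -1))[0, 0] >= threshold
```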
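The Open Datasets row names the diffprivlib library and common scikit-learn-style datasets. Since the authors' own code is not released, the following is a minimal sketch of how one of the audited mechanisms could be instantiated; the dataset choice (breast-cancer) and hyperparameter values are assumptions, not the paper's settings.

```python
# Sketch: differentially private Naive Bayes and logistic regression via diffprivlib.
import numpy as np
from sklearn.datasets import load_breast_cancer
from diffprivlib.models import GaussianNB, LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# diffprivlib expects explicit data bounds / norms so the privacy accounting
# does not silently depend on the data itself.
bounds = (X.min(axis=0), X.max(axis=0))
data_norm = float(np.linalg.norm(X, axis=1).max())

eps = 1.0  # one value from the grid of epsilon_th in the Experiment Setup row

nb = GaussianNB(epsilon=eps, bounds=bounds).fit(X, y)
lr = LogisticRegression(epsilon=eps, data_norm=data_norm).fit(X, y)
print(nb.score(X, y), lr.score(X, y))
```

The paper's random forest experiments would follow the same pattern with the corresponding diffprivlib model, and output perturbation is implemented separately following the cited reference [22].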
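For the FMNIST and CIFAR10 experiments, the Open Datasets and Software Dependencies rows point to Opacus (version unspecified) for DP-SGD. The snippet below is a generic Opacus 1.x setup sketch, not the paper's training script; the model, synthetic data, and noise/clipping values are placeholders.

```python
# Sketch: wrapping an ordinary PyTorch training loop with Opacus for DP-SGD.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(512, 1, 28, 28), torch.randint(0, 10, (512,)))
loader = DataLoader(dataset, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # placeholder values, not the paper's settings
    max_grad_norm=1.0,
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```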
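The Experiment Setup row says the final step draws N fresh samples, performs a Monte Carlo estimate, and reports a lower bound ε̂_lb at confidence level 0.05. The sketch below shows one standard way such a bound is computed in DP auditing, using Clopper-Pearson intervals on the hit rates for the auditing set S; the exact interval construction and confidence bookkeeping in the paper may differ, so treat this as an assumption-laden illustration.

```python
# Sketch: Monte Carlo lower bound on epsilon from empirical hit counts,
# eps_lb = log(p_lo / q_hi), where p_lo lower-bounds Pr[A(D) in S] and
# q_hi upper-bounds Pr[A(D') in S] via Clopper-Pearson intervals.
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes, trials, alpha):
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, trials - successes + 1)
    hi = 1.0 if successes == trials else beta.ppf(1 - alpha / 2, successes + 1, trials - successes)
    return lo, hi

def eps_lower_bound(hits_D, hits_Dprime, n_runs, alpha=0.05):
    p_lo, _ = clopper_pearson(hits_D, n_runs, alpha)       # Pr[A(D) in S]
    _, q_hi = clopper_pearson(hits_Dprime, n_runs, alpha)  # Pr[A(D') in S]
    if p_lo <= 0.0 or q_hi <= 0.0:
        return 0.0
    return max(0.0, float(np.log(p_lo / q_hi)))

# Example (hypothetical counts): with N = 10000 fresh runs per dataset,
# eps_lower_bound(9000, 3000, 10000) is roughly 1.06.
```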