Private Post-GAN Boosting
Authors: Marcel Neunhoeffer, Zhiwei Steven Wu, Cynthia Dwork
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Private PGB on two-dimensional toy data, MNIST images, US Census data and a standard machine learning prediction task. Our experiments show that Private PGB improves upon a standard private GAN approach across a collection of quality measures. We also provide a non-private variant of PGB that improves the data quality of standard GAN training. |
| Researcher Affiliation | Academia | Marcel Neunhoeffer, University of Mannheim, mneunhoe@mail.uni-mannheim.de; Zhiwei Steven Wu, Carnegie Mellon University, zstevenwu@cmu.edu; Cynthia Dwork, Harvard University, dwork@seas.harvard.edu |
| Pseudocode | Yes | Algorithm 1: Differentially Private Post-GAN Boosting (a hedged sketch of the boosting loop appears after this table) |
| Open Source Code | No | The code for the GANs and the PGB algorithm will be made available on GitHub. |
| Open Datasets | Yes | Datasets. We assess our method with a toy dataset drawn from a mixture of 25 Gaussians, which is commonly used to evaluate the quality of GANs (Srivastava et al., 2017; Azadi et al., 2019; Turner et al., 2019), and by synthesizing MNIST images. We then turn to real datasets from the American Census, and a standard machine learning dataset (Titanic). |
| Dataset Splits | Yes | For 1940 we synthesize an excerpt of the 1% sample of all Californians that were at least 18 years old. Our training sample consists of 39,660 observations and 8 attributes (sex, age, educational attainment, income, race, Hispanic origin, marital status and county). The test set contains another 9,915 observations. [...] We synthesize the Kaggle Titanic training set (891 observations of Titanic passengers on 8 attributes) and train three machine learning models (Logistic Regression, Random Forests (RF) (Breiman, 2001) and XGBoost (Chen & Guestrin, 2016)) on the synthetic datasets to predict whether someone survived the Titanic catastrophe. We then evaluate the performance on the test set with 418 observations. (This train-on-synthetic, test-on-real protocol is sketched after the table.) |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using "tensorflow privacy or the opacus library" and refers to the "KL-WGAN loss (Song & Ermon, 2020)" and the "Gumbel-Softmax trick (Maddison et al., 2016; Jang et al., 2016)", but it does not provide specific version numbers for these software components or libraries. (A typical Opacus DP-SGD setup is sketched after the table.) |
| Experiment Setup | Yes | The generator and discriminator are neural nets with two fully connected hidden layers (Discriminator: 128, 256; Generator: 512, 256) with Leaky ReLU activations. The latent noise vector Z is of dimension 2 and independently sampled from a Gaussian distribution with mean 0 and standard deviation of 1. For GAN training we use the KL-WGAN loss (Song & Ermon, 2020). [...] The GAN networks consist of two fully connected hidden layers (256, 128) with Leaky ReLU activation functions. To sample from categorical attributes we apply the Gumbel-Softmax trick (Maddison et al., 2016; Jang et al., 2016). We run our PGB algorithm over the last 150 stored Generators and Discriminators and train it for T = 400 update steps. (The tabular architecture is sketched after the table.) |
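
The Pseudocode row points to Algorithm 1 (Differentially Private Post-GAN Boosting). As a rough illustration of the two-player game behind it, here is a minimal NumPy sketch of the *non-private* PGB variant, assuming the saved discriminators' scores on real and candidate synthetic points have already been computed. The function name `post_gan_boosting` and the array layout are our own, not the authors' code; the private variant replaces the argmax selection with the exponential mechanism.

```python
import numpy as np

def post_gan_boosting(real_scores, fake_scores, T=400, eta=0.1):
    """Non-private Post-GAN Boosting sketch (illustrative only).

    real_scores: (k, n) array; saved discriminator j's scores on the n real points.
    fake_scores: (k, m) array; discriminator j's scores on the m candidate
                 synthetic points drawn from the saved generators.
    Returns averaged mixture weights over the m candidate synthetic points.
    """
    m = fake_scores.shape[1]
    phi = np.full(m, 1.0 / m)      # generator player's distribution over samples
    phi_avg = np.zeros(m)
    for _ in range(T):
        # Discriminator player: pick the saved discriminator that best
        # separates real data from the current synthetic mixture. The private
        # variant samples this choice via the exponential mechanism.
        payoffs = real_scores.mean(axis=1) - fake_scores @ phi
        j = int(np.argmax(payoffs))
        # Generator player: multiplicative-weights update that up-weights
        # candidate samples the chosen discriminator scores as "real".
        phi = phi * np.exp(eta * fake_scores[j])
        phi /= phi.sum()
        phi_avg += phi / T
    return phi_avg
```

The final synthetic dataset is then drawn by resampling the candidate points according to the averaged weights.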
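
For the Titanic experiment, the protocol in the Dataset Splits row (train on 891 synthetic passengers, evaluate on 418 held-out real ones) can be reproduced with a standard train-on-synthetic, test-on-real loop. The file names and the assumption of already-encoded features below are placeholders, not the paper's artifacts.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Placeholder paths: synthetic data produced by the (private) GAN/PGB pipeline
# and a labeled real test set; features are assumed already label-encoded.
synth = pd.read_csv("titanic_synthetic.csv")
test = pd.read_csv("titanic_test_labeled.csv")
X_syn, y_syn = synth.drop(columns="Survived"), synth["Survived"]
X_te, y_te = test.drop(columns="Survived"), test["Survived"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=500),
    "XGBoost": XGBClassifier(),
}
for name, model in models.items():
    model.fit(X_syn, y_syn)                           # train on synthetic only
    acc = accuracy_score(y_te, model.predict(X_te))   # score on real test data
    print(f"{name}: test accuracy {acc:.3f}")
```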
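
The Software Dependencies row notes that the paper names TensorFlow Privacy or Opacus without pinning versions. Under Opacus (1.x API), privatizing the discriminator updates would typically look like the following; the layer sizes, noise multiplier, and clipping norm are illustrative, not the paper's settings.

```python
import torch
from opacus import PrivacyEngine

# Illustrative discriminator and dummy data loader; not the paper's configuration.
disc = torch.nn.Sequential(
    torch.nn.Linear(8, 256), torch.nn.LeakyReLU(0.2),
    torch.nn.Linear(256, 128), torch.nn.LeakyReLU(0.2),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(1024, 8), torch.ones(1024, 1)),
    batch_size=64,
)

# Wrap model, optimizer, and loader so each discriminator gradient step is
# per-example clipped and noised (only the discriminator touches real data).
engine = PrivacyEngine()
disc, opt, loader = engine.make_private(
    module=disc,
    optimizer=opt,
    data_loader=loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)
```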
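
Finally, the Experiment Setup row fully specifies the tabular architecture: two fully connected hidden layers (256, 128) with Leaky ReLU, and Gumbel-Softmax sampling for categorical attributes. A minimal PyTorch sketch; the latent dimension, attribute cardinalities, and temperature are placeholders we chose, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularGenerator(nn.Module):
    """Sketch of the tabular generator: two hidden layers (256, 128) with
    Leaky ReLU, one Gumbel-Softmax head per categorical attribute."""

    def __init__(self, z_dim=64, hidden=(256, 128), cardinalities=(2, 9, 5)):
        super().__init__()
        self.cardinalities = cardinalities
        self.body = nn.Sequential(
            nn.Linear(z_dim, hidden[0]), nn.LeakyReLU(0.2),
            nn.Linear(hidden[0], hidden[1]), nn.LeakyReLU(0.2),
        )
        self.head = nn.Linear(hidden[1], sum(cardinalities))

    def forward(self, z, tau=0.5):
        logits = self.head(self.body(z))
        # Gumbel-Softmax keeps the draw of each categorical attribute
        # differentiable, so gradients flow through the discrete samples.
        outs, start = [], 0
        for c in self.cardinalities:
            outs.append(F.gumbel_softmax(logits[:, start:start + c], tau=tau, hard=True))
            start += c
        return torch.cat(outs, dim=1)

# Example: generate 5 synthetic rows from latent Gaussian noise.
gen = TabularGenerator()
rows = gen(torch.randn(5, 64))
```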