Private Post-GAN Boosting

Authors: Marcel Neunhoeffer, Steven Wu, Cynthia Dwork

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Private PGB on two-dimensional toy data, MNIST images, US Census data and a standard machine learning prediction task. Our experiments show that Private PGB improves upon a standard private GAN approach across a collection of quality measures. We also provide a non-private variant of PGB that improves the data quality of standard GAN training.
Researcher Affiliation | Academia | Marcel Neunhoeffer, University of Mannheim, mneunhoe@mail.uni-mannheim.de; Zhiwei Steven Wu, Carnegie Mellon University, zstevenwu@cmu.edu; Cynthia Dwork, Harvard University, dwork@seas.harvard.edu
Pseudocode | Yes | Algorithm 1: Differentially Private Post-GAN Boosting
Open Source Code | No | The code for the GANs and the PGB algorithm will be made available on GitHub.
Open Datasets | Yes | Datasets. We assess our method with a toy dataset drawn from a mixture of 25 Gaussians, which is commonly used to evaluate the quality of GANs (Srivastava et al., 2017; Azadi et al., 2019; Turner et al., 2019), and synthesize MNIST images. We then turn to real datasets from the American Census, and a standard machine learning dataset (Titanic). (See the toy-data sketch below the table.)
Dataset Splits | Yes | For 1940 we synthesize an excerpt of the 1% sample of all Californians that were at least 18 years old. Our training sample consists of 39,660 observations and 8 attributes (sex, age, educational attainment, income, race, Hispanic origin, marital status and county). The test set contains another 9,915 observations. [...] We synthesize the Kaggle Titanic training set (891 observations of Titanic passengers on 8 attributes) and train three machine learning models (Logistic Regression, Random Forests (RF) (Breiman, 2001) and XGBoost (Chen & Guestrin, 2016)) on the synthetic datasets to predict whether someone survived the Titanic catastrophe. We then evaluate the performance on the test set with 418 observations. (See the train-on-synthetic evaluation sketch below the table.)
Hardware Specification | No | The paper does not explicitly mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using "tensorflow privacy or the opacus library" and refers to the "KL-WGAN loss (Song & Ermon, 2020)" and the "Gumbel-Softmax trick (Maddison et al., 2016; Jang et al., 2016)" but does not provide specific version numbers for these software components or libraries. (See the opacus sketch below the table.)
Experiment Setup | Yes | The generator and discriminator are neural nets with two fully connected hidden layers (Discriminator: 128, 256; Generator: 512, 256) with Leaky ReLU activations. The latent noise vector Z is of dimension 2 and independently sampled from a Gaussian distribution with mean 0 and standard deviation of 1. For GAN training we use the KL-WGAN loss (Song & Ermon, 2020). [...] The GAN networks consist of two fully connected hidden layers (256, 128) with Leaky ReLU activation functions. To sample from categorical attributes we apply the Gumbel-Softmax trick (Maddison et al., 2016; Jang et al., 2016). We run our PGB algorithm over the last 150 stored Generators and Discriminators and train it for T = 400 update steps. (See the architecture and mixture-sampling sketches below the table.)
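
As a concrete illustration of the toy dataset in the Open Datasets row, the sketch below draws samples from a mixture of 25 Gaussians arranged on a 5x5 grid. The grid spacing, component standard deviation, and function name are assumptions made for illustration; the quoted passage only says the data is a mixture of 25 Gaussians.

```python
import numpy as np

def sample_25_gaussians(n_samples, grid_spacing=2.0, std=0.05, seed=0):
    """Draw samples from a mixture of 25 Gaussians on a 5x5 grid.

    grid_spacing and std are assumed values; the paper only states
    that the toy data is drawn from a mixture of 25 Gaussians.
    """
    rng = np.random.default_rng(seed)
    # 5x5 grid of component means centered at the origin.
    centers = np.array([(x, y)
                        for x in np.arange(-2, 3) * grid_spacing
                        for y in np.arange(-2, 3) * grid_spacing])
    # Pick a component uniformly for each sample, then add Gaussian noise.
    idx = rng.integers(0, len(centers), size=n_samples)
    return centers[idx] + rng.normal(scale=std, size=(n_samples, 2))

toy_data = sample_25_gaussians(10_000)
```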
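A minimal PyTorch sketch of the toy-data networks described in the Experiment Setup row: a generator with hidden layers of width 512 and 256, a discriminator with hidden layers of width 128 and 256, Leaky ReLU activations, and a 2-dimensional standard Gaussian latent vector. The output dimensions, layer ordering, and LeakyReLU slope are assumptions; the quoted text only specifies the hidden-layer widths and activations.

```python
import torch
import torch.nn as nn

LATENT_DIM = 2   # dimension of the latent noise vector Z (from the paper)
DATA_DIM = 2     # two-dimensional toy data (assumed output size)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Two fully connected hidden layers (512, 256) with Leaky ReLU.
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, DATA_DIM),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Two fully connected hidden layers (128, 256) with Leaky ReLU.
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x)

# Latent noise: independent N(0, 1) entries, as described in the paper.
z = torch.randn(64, LATENT_DIM)
fake = Generator()(z)
```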
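The Software Dependencies row notes that the paper mentions tensorflow privacy or the opacus library without version numbers. Purely as an illustration, the sketch below shows how a discriminator optimizer could be wrapped with opacus for DP-SGD training; it assumes the opacus >= 1.0 PrivacyEngine API, and the noise multiplier, clipping norm, and batch size are placeholder values rather than the paper's settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# A small stand-in discriminator (see the architecture sketch above).
disc = nn.Sequential(nn.Linear(2, 128), nn.LeakyReLU(0.2),
                     nn.Linear(128, 256), nn.LeakyReLU(0.2),
                     nn.Linear(256, 1))
optimizer = torch.optim.Adam(disc.parameters(), lr=1e-4)

# Placeholder real data wrapped in a DataLoader.
loader = DataLoader(TensorDataset(torch.randn(1024, 2)), batch_size=64)

# Wrap model, optimizer, and loader for DP-SGD (opacus >= 1.0 API assumed).
privacy_engine = PrivacyEngine()
disc, optimizer, loader = privacy_engine.make_private(
    module=disc,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # placeholder, not the paper's setting
    max_grad_norm=1.0,     # placeholder clipping norm
)
```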
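The Experiment Setup row states that the PGB algorithm is run over the last 150 stored generators and discriminators for T = 400 update steps, and Algorithm 1 ultimately yields a weighted mixture over those stored generators. The sketch below only shows how synthetic records could be drawn from such a mixture once the weights exist; it does not implement the private boosting game itself, and the function name and placeholder weights are assumptions.

```python
import torch

def sample_from_generator_mixture(generators, weights, n_samples, latent_dim=2):
    """Sample from a weighted mixture of stored generator checkpoints.

    `generators` is a list of trained generator networks (e.g. the last
    150 checkpoints) and `weights` the mixture weights produced by a
    PGB-style procedure; both are assumed to be given.
    """
    weights = torch.as_tensor(weights, dtype=torch.float)
    # Pick a generator index for every sample according to the weights,
    # then draw one record from that generator.
    idx = torch.multinomial(weights, n_samples, replacement=True)
    samples = []
    for j in idx.tolist():
        z = torch.randn(1, latent_dim)        # latent noise ~ N(0, 1)
        with torch.no_grad():
            samples.append(generators[j](z))
    return torch.cat(samples, dim=0)
```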
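The Dataset Splits row describes training Logistic Regression, Random Forest, and XGBoost classifiers on the synthetic Titanic data and evaluating them on the 418-observation test set. A minimal sketch of that train-on-synthetic / test-on-real protocol, assuming the synthetic and real splits are already available as feature matrices and label vectors (the variable names and the accuracy metric are illustrative, not taken from the paper):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def evaluate_on_real(X_synth, y_synth, X_test, y_test):
    """Train each model on synthetic data, evaluate on the real test split."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(),
        "XGBoost": XGBClassifier(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_synth, y_synth)  # fit on synthetic records only
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```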