PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees
Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, on various datasets, demonstrate that PATE-GAN consistently outperforms the state-of-the-art method with respect to this and other notions of synthetic data quality. |
| Researcher Affiliation | Collaboration | James Jordon, Engineering Science Department, University of Oxford, UK; Jinsung Yoon, Department of Electrical and Computer Engineering, UCLA, California, USA; Mihaela van der Schaar, University of Cambridge, UK; Department of Electrical and Computer Engineering, UCLA, California, USA; Alan Turing Institute, London, UK |
| Pseudocode | Yes | Pseudo-code for PATE-GAN can be found in Algorithm 1. |
| Open Source Code | No | The paper mentions using 'tensorflow to implement PATE-GAN and DPGAN' and provides a link to the DPGAN benchmark's code ('https://github.com/illidanlab'), but it does not provide a link or explicit statement about the availability of their own PATE-GAN implementation code. |
| Open Datasets | Yes | In this section, we use a real-world Kaggle dataset (Credit card fraud detection dataset [11]). In addition, we provide high-level (average) results for five additional datasets (with various characteristics): MAGGIC [27], UNOS-Heart wait-list [7], Kaggle cervical cancer dataset [16], UCI ISOLET dataset and UCI Epileptic Seizure Recognition dataset. |
| Dataset Splits | No | The paper defines 'training set' and 'testing set' and mentions that they are disjoint, but it does not specify a distinct 'validation set' or typical splits like 80/10/10. While cross-validation is mentioned for hyperparameter tuning, it is not described as a general dataset split for model validation. |
| Hardware Specification | No | The paper states that experiments were implemented using TensorFlow, but does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions using 'tensorflow' and 'sklearn package in python' but does not specify any version numbers for these software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | In all experiments, the depth of the generator and discriminator (student-discriminator in our case) in both PATE-GAN and the DPGAN benchmark [32] is set to 3. The depth of the teacher discriminators is set to 1. The number of hidden nodes in each layer is d, d/2 and d (where d is the feature dimension), respectively. We use ReLU as the activation function of each layer except for the output layer, where we use the sigmoid activation function, and the batch size is 64 for both the generator and discriminator. We set n_T = n_S = 5. Using cross-validation, we select the number of teachers, k, among N/10, N/50, N/100, N/500, N/1000, N/5000, N/10000. The learning rate is 10^-4 and we use the Adam optimizer to minimize the loss function. |
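The network shapes quoted above can be sketched as a minimal NumPy forward pass. The layer widths (d, d/2, d), depths, activations, and batch size come from the quoted setup; the He-style initialization and the example feature dimension d = 30 are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights/biases for consecutive layer widths (He-style init, assumed)."""
    return [(rng.normal(0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
             np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU on hidden layers, sigmoid on the output layer, per the quoted setup."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)             # ReLU hidden layer
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))      # sigmoid output

d = 30                                    # feature dimension (illustrative choice)
rng = np.random.default_rng(0)

# Depth-3 student-discriminator: hidden widths d, d/2, d, scalar sigmoid output.
student = init_mlp([d, d, d // 2, d, 1], rng)
# Depth-1 teacher discriminator: a single layer from features to a scalar score.
teacher = init_mlp([d, 1], rng)

batch = rng.random((64, d))               # batch size 64, as in the quoted setup
scores = forward(student, batch)          # shape (64, 1), values in (0, 1)
```

Training would pair these networks with the Adam optimizer at a learning rate of 10^-4, as stated in the setup; the sketch only illustrates the architecture dimensions.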