PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees
Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, on various datasets, demonstrate that PATE-GAN consistently outperforms the state-of-the-art method with respect to this and other notions of synthetic data quality. |
| Researcher Affiliation | Collaboration | James Jordon, Engineering Science Department, University of Oxford, UK; Jinsung Yoon, Department of Electrical and Computer Engineering, UCLA, California, USA; Mihaela van der Schaar, University of Cambridge, UK; Department of Electrical and Computer Engineering, UCLA, California, USA; Alan Turing Institute, London, UK |
| Pseudocode | Yes | Pseudo-code for PATE-GAN can be found in Algorithm 1. |
| Open Source Code | No | The paper mentions using 'tensorflow to implement PATE-GAN and DPGAN' and provides a link to the DPGAN benchmark's code ('https://github.com/illidanlab'), but it does not provide a link or explicit statement about the availability of their own PATE-GAN implementation code. |
| Open Datasets | Yes | In this section, we use a real-world Kaggle dataset (Credit card fraud detection dataset [11]). In addition, we provide high-level (average) results for five additional datasets (with various characteristics): MAGGIC [27], UNOS-Heart wait-list [7], Kaggle cervical cancer dataset [16], UCI ISOLET dataset and UCI Epileptic Seizure Recognition dataset. |
| Dataset Splits | No | The paper defines 'training set' and 'testing set' and mentions that they are disjoint, but it does not specify a distinct 'validation set' or typical splits like 80/10/10. While cross-validation is mentioned for hyperparameter tuning, it is not described as a general dataset split for model validation. |
| Hardware Specification | No | The paper states that experiments were implemented using TensorFlow, but does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions using 'tensorflow' and 'sklearn package in python' but does not specify any version numbers for these software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | In all experiments, the depth of the generator and discriminator (student-discriminator in our case) in both PATE-GAN and the DPGAN benchmark [32] is set to 3. The depth of the teacher discriminators is set to 1. The number of hidden nodes in each layer is d, d/2 and d (where d is the feature dimension), respectively. We use ReLU as the activation function of each layer except for the output layer, where we use the sigmoid activation function, and the batch size is 64 for both the generator and discriminator. We set n_T = n_S = 5. Using cross-validation, we select the number of teachers, k, among N/10, N/50, N/100, N/500, N/1000, N/5000, N/10000. The learning rate is 10^-4 and we use the Adam optimizer to minimize the loss function. |
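The network shapes quoted above can be sketched as a minimal NumPy forward pass. The layer widths (d, d/2, d), depths, activations, and batch size come from the quoted setup; the He-style initialization and the example feature dimension d = 30 are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights/biases for consecutive layer widths (He-style init, assumed)."""
    return [(rng.normal(0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
             np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU on hidden layers, sigmoid on the output layer, per the quoted setup."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)             # ReLU hidden layer
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))      # sigmoid output

d = 30                                    # feature dimension (illustrative choice)
rng = np.random.default_rng(0)

# Depth-3 student-discriminator: hidden widths d, d/2, d, scalar sigmoid output.
student = init_mlp([d, d, d // 2, d, 1], rng)
# Depth-1 teacher discriminator: a single layer from features to a scalar score.
teacher = init_mlp([d, 1], rng)

batch = rng.random((64, d))               # batch size 64, as in the quoted setup
scores = forward(student, batch)          # shape (64, 1), values in (0, 1)
```

Training would pair these networks with the Adam optimizer at a learning rate of 10^-4, as stated in the setup; the sketch only illustrates the architecture dimensions.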