KnockoffGAN: Generating Knockoffs for Feature Selection using Generative Adversarial Networks
Authors: James Jordon, Jinsung Yoon, Mihaela van der Schaar
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the capability of our model to perform feature selection, showing that it performs as well as the originally proposed knockoff generation model in the Gaussian setting and that it outperforms the original model in non-Gaussian settings, including on a real-world dataset. ... 5 EXPERIMENTS In this section we demonstrate the capability of our method to match the results of [7] in settings where their model is correctly specified (i.e. when the underlying feature distribution is Gaussian) and then go on to show, in settings where the underlying feature distribution is non-Gaussian, that our method is able to outperform their Gaussian approximation. |
| Researcher Affiliation | Academia | James Jordon Engineering Science Department University of Oxford, UK james.jordon@wolfson.ox.ac.uk Jinsung Yoon Department of Electrical and Computer Engineering UCLA, California, USA jsyoon0823@g.ucla.edu Mihaela van der Schaar University of Cambridge, UK Department of Electrical and Computer Engineering, UCLA, California, USA Alan Turing Institute, London, UK mihaela@ee.ucla.edu |
| Pseudocode | Yes | Algorithm 1: Pseudo-code of KnockoffGAN |
| Open Source Code | No | The paper gives no explicit statement of, or link to, the authors' own open-source code for KnockoffGAN. The links it provides are for benchmarks. |
| Open Datasets | No | In this section we use a biobank dataset (with 387 dimensions)... To preserve anonymity of the authors, the full details of this dataset will be given upon acceptance of the paper. |
| Dataset Splits | No | The paper states 'the number of samples to be n = 3000' for synthetic data but does not provide explicit train/validation/test splits, percentages, or counts for its datasets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper states 'KnockoffGAN is implemented in tensorflow.' but does not provide specific version numbers for TensorFlow or any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | The number of hidden nodes in each layer is d/4, d/16, d/4 for the generator, discriminator, and WGAN-GP, respectively. For the power network, we use 2 diagonal matrices for each layer to make two hidden nodes for each feature separately. We use ReLU and tanh as the activation functions for each layer except for the output layer where we use a linear activation function for the generator, power network and WGAN-GP networks and sigmoid activation function for the discriminator network. The number of samples in each mini-batch is 128. ... where λ, µ are hyper-parameters (set to 1 in the experiments section). ... η is a hyper-parameter (set to 10 in practice). ... We chose 0.9 to balance this, following the implementation of [38]. |
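
The settings quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only and is not the authors' code: the class name `KnockoffGANSetup`, the helper `setup_for_dimension`, and all field names are hypothetical, and rounding d/4 and d/16 down to an integer (with a floor of 1) is an assumption, since the quoted text does not say how non-integer widths are handled. The value 0.9 is left out because the excerpt does not say which quantity it sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnockoffGANSetup:
    """Settings quoted in the Experiment Setup row (field names are hypothetical)."""

    generator_hidden: int      # d/4 hidden nodes per layer
    discriminator_hidden: int  # d/16 hidden nodes per layer
    wgan_gp_hidden: int        # d/4 hidden nodes per layer
    batch_size: int = 128      # samples per mini-batch
    lam: float = 1.0           # lambda, set to 1 in the experiments
    mu: float = 1.0            # mu, set to 1 in the experiments
    eta: float = 10.0          # eta, set to 10 in practice


def setup_for_dimension(d: int) -> KnockoffGANSetup:
    """Derive layer widths from the feature dimension d.

    Assumption: widths are rounded down and never drop below 1.
    """
    if d < 1:
        raise ValueError("feature dimension d must be a positive integer")
    return KnockoffGANSetup(
        generator_hidden=max(1, d // 4),
        discriminator_hidden=max(1, d // 16),
        wgan_gp_hidden=max(1, d // 4),
    )


# Example: the 387-dimensional dataset mentioned in the Open Datasets row.
setup = setup_for_dimension(387)
print(setup)
```

Keeping the widths as a function of d makes it plain that only the feature dimension varies between datasets, while the batch size and the three loss hyper-parameters stay at the quoted values.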