GAIN: Missing Data Imputation using Generative Adversarial Nets

Authors: Jinsung Yoon, James Jordon, Mihaela van der Schaar

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods. In this section, we validate the performance of GAIN using multiple real-world datasets. In the first set of experiments we qualitatively analyze the properties of GAIN. In the second we quantitatively evaluate the imputation performance of GAIN using various UCI datasets (Lichman, 2013), giving comparisons with state-of-the-art imputation methods."
Researcher Affiliation | Academia | "University of California, Los Angeles, CA, USA; University of Oxford, UK; Alan Turing Institute, UK. Correspondence to: Jinsung Yoon <jsyoon0823@gmail.com>."
Pseudocode | Yes | "Algorithm 1: Pseudo-code of GAIN"
Open Source Code | No | The paper does not provide an explicit link or statement about open-source code availability for the described methodology.
Open Datasets | Yes | "We use five real-world datasets from the UCI Machine Learning Repository (Lichman, 2013) (Breast, Spam, Letter, Credit, and News) to quantitatively evaluate the imputation performance of GAIN."
Dataset Splits | Yes | "We conduct each experiment 10 times and within each experiment we use 5-fold cross-validation."
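The evaluation protocol quoted above (10 independent experiments, each using 5-fold cross-validation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the dataset size and per-experiment seeding are assumptions.

```python
import numpy as np

def five_fold_indices(n, rng):
    """Shuffle indices and split them into 5 roughly equal test folds."""
    idx = rng.permutation(n)
    return np.array_split(idx, 5)

n_samples = 100          # illustrative dataset size
results = []
for experiment in range(10):                 # 10 repeated experiments
    rng = np.random.default_rng(experiment)  # fresh shuffle per experiment
    for test_idx in five_fold_indices(n_samples, rng):
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[test_idx] = False
        # train/evaluate the imputer here; we only record the split sizes
        results.append((int(train_mask.sum()), len(test_idx)))

# 10 experiments x 5 folds = 50 train/test evaluations in total
assert len(results) == 50
```

Each sample lands in exactly one test fold per experiment, so every train/test pair partitions the full dataset.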
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers).
Experiment Setup | Yes | "Details of hyper-parameter selection can be found in the Supplementary Materials. We first optimize the discriminator D with a fixed generator G using mini-batches of size k_D. Second, we optimize the generator G using the newly updated discriminator D with mini-batches of size k_G. G is then trained to minimize the weighted sum of the two losses as follows: L_G(m(j), m̂(j), b(j)) + α·L_M(x(j), x̂(j)), where α is a hyper-parameter."
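The generator objective quoted above combines an adversarial term L_G (pushing the discriminator to label imputed entries as observed) with α times a reconstruction term L_M on the observed entries. A minimal numeric sketch of that weighted sum, with illustrative shapes, mask probability, and α (all assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # feature dimension (assumed)
x = rng.random(d)                            # true data vector
m = (rng.random(d) > 0.3).astype(float)      # mask: 1 = observed, 0 = missing
x_hat = rng.random(d)                        # generator's imputed vector
d_prob = np.clip(rng.random(d), 1e-6, 1 - 1e-6)  # discriminator outputs in (0, 1)

alpha = 10.0                                 # hyper-parameter weighting L_M

# L_G: cross-entropy on the missing components only; the generator wants
# the discriminator to output probability 1 ("observed") for them.
loss_G = -np.sum((1 - m) * np.log(d_prob))

# L_M: squared reconstruction error on the observed components only.
loss_M = np.sum(m * (x - x_hat) ** 2)

total = loss_G + alpha * loss_M              # the weighted sum minimized by G
```

Large α emphasizes faithfully reproducing the observed entries; small α lets the adversarial term dominate.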