GAIN: Missing Data Imputation using Generative Adversarial Nets

Authors: Jinsung Yoon, James Jordon, Mihaela van der Schaar

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods. In this section, we validate the performance of GAIN using multiple real-world datasets. In the first set of experiments we qualitatively analyze the properties of GAIN. In the second we quantitatively evaluate the imputation performance of GAIN using various UCI datasets (Lichman, 2013), giving comparisons with state-of-the-art imputation methods."
Researcher Affiliation | Academia | "University of California, Los Angeles, CA, USA; University of Oxford, UK; Alan Turing Institute, UK. Correspondence to: Jinsung Yoon <jsyoon0823@gmail.com>."
Pseudocode | Yes | "Algorithm 1: Pseudo-code of GAIN"
Open Source Code | No | The paper does not provide an explicit link or statement about open-source code availability for the described methodology.
Open Datasets | Yes | "We use five real-world datasets from the UCI Machine Learning Repository (Lichman, 2013) (Breast, Spam, Letter, Credit, and News) to quantitatively evaluate the imputation performance of GAIN."
Dataset Splits | Yes | "We conduct each experiment 10 times and within each experiment we use 5-fold cross-validation."
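The evaluation protocol quoted above (10 independent experiments, each using 5-fold cross-validation) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the dataset size and per-experiment seeding are assumptions.

```python
import numpy as np

def five_fold_indices(n, rng):
    """Shuffle indices and split them into 5 roughly equal test folds."""
    idx = rng.permutation(n)
    return np.array_split(idx, 5)

n_samples = 100          # illustrative dataset size
results = []
for experiment in range(10):                 # 10 repeated experiments
    rng = np.random.default_rng(experiment)  # fresh shuffle per experiment
    for test_idx in five_fold_indices(n_samples, rng):
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[test_idx] = False
        # train/evaluate the imputer here; we only record the split sizes
        results.append((int(train_mask.sum()), len(test_idx)))

# 10 experiments x 5 folds = 50 train/test evaluations in total
assert len(results) == 50
```

Each sample lands in exactly one test fold per experiment, so every train/test pair partitions the full dataset.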
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers).
Experiment Setup | Yes | "Details of hyper-parameter selection can be found in the Supplementary Materials. We first optimize the discriminator D with a fixed generator G using mini-batches of size k_D. Second, we optimize the generator G using the newly updated discriminator D with mini-batches of size k_G. G is then trained to minimize the weighted sum of the two losses as follows: L_G(m(j), m̂(j), b(j)) + α·L_M(x(j), x̂(j)), where α is a hyper-parameter."
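The generator objective quoted above combines an adversarial term L_G (pushing the discriminator to label imputed entries as observed) with α times a reconstruction term L_M on the observed entries. A minimal numeric sketch of that weighted sum, with illustrative shapes, mask probability, and α (all assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # feature dimension (assumed)
x = rng.random(d)                            # true data vector
m = (rng.random(d) > 0.3).astype(float)      # mask: 1 = observed, 0 = missing
x_hat = rng.random(d)                        # generator's imputed vector
d_prob = np.clip(rng.random(d), 1e-6, 1 - 1e-6)  # discriminator outputs in (0, 1)

alpha = 10.0                                 # hyper-parameter weighting L_M

# L_G: cross-entropy on the missing components only; the generator wants
# the discriminator to output probability 1 ("observed") for them.
loss_G = -np.sum((1 - m) * np.log(d_prob))

# L_M: squared reconstruction error on the observed components only.
loss_M = np.sum(m * (x - x_hat) ** 2)

total = loss_G + alpha * loss_M              # the weighted sum minimized by G
```

Large α emphasizes faithfully reproducing the observed entries; small α lets the adversarial term dominate.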