Good Semi-supervised Learning That Requires a Bad GAN

Authors: Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Russ R. Salakhutdinov

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets." ... "Empirically, our approach substantially improves over vanilla feature matching GANs, and obtains new state-of-the-art results on MNIST, SVHN, and CIFAR-10 ..." ... "(Section 6, Experiments) We mainly consider three widely used benchmark datasets, namely MNIST, SVHN, and CIFAR-10." ... "Table 1: Comparison with state-of-the-art methods on three benchmark datasets." ... "Table 2: Ablation study."
Researcher Affiliation | Academia | "Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov, School of Computer Science, Carnegie Mellon University, {dzihang,zhiliny,fanyang1,wcohen,rsalakhu}@cs.cmu.edu"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code is available at https://github.com/kimiyoung/ssl_bad_gan."
Open Datasets | Yes | "We mainly consider three widely used benchmark datasets, namely MNIST, SVHN, and CIFAR-10."
Dataset Splits | Yes | "As in previous work, we randomly sample 100, 1,000, and 4,000 labeled samples for MNIST, SVHN, and CIFAR-10 respectively during training, and use the standard data split for testing." (A hedged NumPy sketch of this split follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names with versions).
Experiment Setup | Yes | "We add instance noise to the input of the discriminator [1, 18], and use spatial dropout [20] to obtain faster convergence. Except for these two modifications, we use the same neural network architecture as in [16]. We use the 10-quantile log probability to define the threshold ϵ in Eq. (4)." (A hedged PyTorch sketch of these details follows the table.)
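
The quoted split (100 / 1,000 / 4,000 labeled samples for MNIST / SVHN / CIFAR-10, with the rest of the training set treated as unlabeled data) can be reproduced in a few lines of NumPy. This is a minimal sketch only: the function name split_labeled_unlabeled is ours, and the paper does not state whether the labeled subset is drawn class-balanced, so that choice is an assumption here.

```python
import numpy as np

def split_labeled_unlabeled(labels, num_labeled, num_classes=10, seed=0):
    """Return (labeled_idx, unlabeled_idx) index arrays, drawing an assumed
    class-balanced labeled subset of size num_labeled from the training set."""
    rng = np.random.RandomState(seed)
    labels = np.asarray(labels)
    per_class = num_labeled // num_classes
    labeled_idx = []
    for c in range(num_classes):
        class_idx = np.where(labels == c)[0]
        labeled_idx.extend(rng.choice(class_idx, per_class, replace=False))
    labeled_idx = np.array(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(len(labels)), labeled_idx)
    return labeled_idx, unlabeled_idx

# e.g. 100 labeled examples for MNIST, 1,000 for SVHN, 4,000 for CIFAR-10:
# labeled_idx, unlabeled_idx = split_labeled_unlabeled(train_labels, 4000)
```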
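
The experiment-setup details quoted above (instance noise on the discriminator input, spatial dropout, and a 10-quantile log-probability threshold for ϵ in Eq. (4)) could look roughly as follows in PyTorch, the framework used by the released repository. The helper names, the noise scale, and the dropout rate below are illustrative assumptions, not values reported in the paper or taken from the code.

```python
import torch
import torch.nn as nn

def add_instance_noise(x, sigma=0.1):
    """Instance noise: perturb the discriminator's input with Gaussian noise
    (the scale sigma is an assumed value, not reported in the paper)."""
    return x + sigma * torch.randn_like(x)

# Spatial dropout zeroes entire feature maps rather than individual activations.
spatial_dropout = nn.Dropout2d(p=0.5)  # assumed dropout rate

def log_prob_threshold(log_probs, q=0.1):
    """Set the threshold epsilon of Eq. (4) to the 10-quantile of the model's
    log probabilities on training data (one reading of the quoted sentence)."""
    return torch.quantile(log_probs, q)
```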