Good Semi-supervised Learning That Requires a Bad GAN
Authors: Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Russ R. Salakhutdinov
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets. ... Empirically, our approach substantially improves over vanilla feature matching GANs, and obtains new state-of-the-art results on MNIST, SVHN, and CIFAR-10 ... Section 6 (Experiments): We mainly consider three widely used benchmark datasets, namely MNIST, SVHN, and CIFAR-10. ... Table 1: Comparison with state-of-the-art methods on three benchmark datasets. ... Table 2: Ablation study. |
| Researcher Affiliation | Academia | Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov, School of Computer Science, Carnegie Mellon University, {dzihang,zhiliny,fanyang1,wcohen,rsalakhu}@cs.cmu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/kimiyoung/ssl_bad_gan. |
| Open Datasets | Yes | We mainly consider three widely used benchmark datasets, namely MNIST, SVHN, and CIFAR-10. |
| Dataset Splits | Yes | As in previous work, we randomly sample 100, 1,000, and 4,000 labeled samples for MNIST, SVHN, and CIFAR-10 respectively during training, and use the standard data split for testing. (A labeled-subset sampling sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not list its software dependencies with version numbers (e.g., specific library or framework versions). |
| Experiment Setup | Yes | We add instance noise to the input of the discriminator [1, 18], and use spatial dropout [20] to obtain faster convergence. Except for these two modifications, we use the same neural network architecture as in [16]. We use the 10-quantile log probability to define the threshold ϵ in Eq. (4). (A sketch of the noise and threshold choices follows the table.) |
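
The Dataset Splits row quotes the paper's protocol: 100, 1,000, and 4,000 labeled examples are drawn for MNIST, SVHN, and CIFAR-10 respectively, with the standard test split used for evaluation. A minimal sketch of such a draw is shown below, assuming NumPy label arrays; the class-balanced sampling, the `sample_labeled_indices` helper name, and the fixed seed are illustrative assumptions, since the paper only states that the labeled examples are sampled randomly.

```python
import numpy as np

def sample_labeled_indices(labels, n_labeled, n_classes=10, seed=0):
    """Draw a small labeled subset from a fully labeled training set.

    `labels` is a 1-D integer array over the full training set. The split is
    class-balanced here (an assumption; the paper only says "randomly sample").
    Returns (labeled_idx, unlabeled_idx).
    """
    rng = np.random.RandomState(seed)
    per_class = n_labeled // n_classes
    labeled = []
    for c in range(n_classes):
        class_idx = np.where(labels == c)[0]
        labeled.append(rng.choice(class_idx, size=per_class, replace=False))
    labeled_idx = np.concatenate(labeled)
    unlabeled_idx = np.setdiff1d(np.arange(len(labels)), labeled_idx)
    return labeled_idx, unlabeled_idx

# Example: CIFAR-10 uses 4,000 labeled images out of the 50,000 training images.
# labeled_idx, unlabeled_idx = sample_labeled_indices(train_labels, n_labeled=4000)
```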
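The Experiment Setup row mentions instance noise on the discriminator input and a threshold ϵ defined as the 10-quantile of log probabilities in Eq. (4). The sketch below shows how these two choices could look in NumPy; the function names, the Gaussian noise standard deviation, and the use of a separately trained density model's log probabilities are assumptions made for illustration, not the authors' exact implementation.

```python
import numpy as np

def add_instance_noise(x, std=0.1, rng=None):
    """Instance noise: perturb (real or generated) discriminator inputs with
    Gaussian noise. The noise scale is an assumption; the quoted text does not
    specify it."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(scale=std, size=np.shape(x))

def density_threshold(train_log_probs, quantile=0.10):
    """Set epsilon as the 10-quantile of log p(x) over the training data, where
    log p(x) comes from a pretrained density model (PixelCNN++ in the paper)."""
    return np.quantile(np.asarray(train_log_probs), quantile)

def high_density_mask(sample_log_probs, eps):
    """Indicator log p(x) > epsilon, used to flag generated samples that fall in
    a high-density region of the data distribution (the penalized case in Eq. (4))."""
    return np.asarray(sample_log_probs) > eps
```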