Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
GAN-EM: GAN Based EM Learning Framework
Authors: Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our model, we perform the clustering task based on MNIST and achieve lowest error rate with all 3 different numbers of clusters: 10, 20 and 30, which are common settings in previous works. We also test the semi-supervised classification performance on MNIST and SVHN with partially labeled data, both results being rather competitive compared to recently proposed generative models. Especially, on SVHN dataset, GAN-EM outperforms all other models. Apart from the two commonly used datasets, we test our model on an additional dataset, CelebA, under both unsupervised and semi-supervised settings, which is a more challenging task because attributes of human faces are rather abstract. It turns out that our model still achieves the best results. |
| Researcher Affiliation | Academia | ¹University of Rochester, ²Tsinghua University |
| Pseudocode | Yes | Algorithm 1 GAN-EM |
| Open Source Code | Yes | Supplementary material can be found at http://www.cs.rochester.edu/~cxu22/p/. |
| Open Datasets | Yes | We perform unsupervised clustering on MNIST [LeCun et al., 1989] and CelebA [Liu et al., 2015] datasets, and semi-supervised classification on MNIST, SVHN [Netzer et al., 2011] and CelebA datasets. |
| Dataset Splits | No | The paper mentions evaluating on datasets and discusses 'test error rate' but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or exact sample counts for each split) in the main text. |
| Hardware Specification | No | The paper does not specify the hardware used (e.g., GPU models, CPU types) for running experiments. |
| Software Dependencies | No | The paper mentions 'RMSprop optimizer' but does not specify software dependencies with version numbers (e.g., specific Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | We apply RMSprop optimizer to all 3 networks G, D and E with learning rate 0.0002 (decay rate: 0.98). ... In each M-step, there are 5 epoches with a minibatch size of 64 for both the generated batch and the real samples batch. We use a same update frequency for generator and discriminator. For E-step, we generate samples using well trained generator with batch size of 256, then we apply 1000 iterations to update E-net. |
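
The values quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see what a rerun would need to fix. The sketch below is illustrative only: the class name `GanEmSetup`, its field names, and both helper functions are hypothetical and do not come from the paper or its supplementary material. The quoted "decay rate: 0.98" is stored as given, since the row does not say whether it refers to a learning-rate schedule or to the RMSprop smoothing constant. The sample count of 60,000 used in the example is an assumption (the size of the standard MNIST training set); the paper does not state its splits.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GanEmSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    learning_rate: float = 0.0002   # RMSprop, applied to all three networks G, D and E
    decay_rate: float = 0.98        # quoted "decay rate"; its exact meaning is not specified
    m_step_epochs: int = 5          # epochs per M-step
    m_step_batch_size: int = 64     # same size for generated and real batches
    e_step_batch_size: int = 256    # generated samples per E-step iteration
    e_step_iterations: int = 1000   # E-net updates per E-step


def m_step_updates(setup: GanEmSetup, num_real_samples: int) -> int:
    """Minibatch updates in one M-step, assuming the final partial batch is kept."""
    batches_per_epoch = math.ceil(num_real_samples / setup.m_step_batch_size)
    return setup.m_step_epochs * batches_per_epoch


def e_step_generated_samples(setup: GanEmSetup) -> int:
    """Total generated samples consumed by one E-step."""
    return setup.e_step_batch_size * setup.e_step_iterations


if __name__ == "__main__":
    setup = GanEmSetup()
    # 60_000 is an assumed training-set size, not a figure reported in the paper.
    print(m_step_updates(setup, 60_000))
    print(e_step_generated_samples(setup))
```

Keeping the settings in one frozen dataclass is a design choice for the sketch: it separates the values the paper reports from those it leaves open (hardware, software versions, dataset splits), which would still have to be chosen before the setup could be rerun.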