Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Metropolis-Hastings Generative Adversarial Networks
Authors: Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, Jason Yosinski
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using the DCGAN, WGAN, and progressive GAN. ... Results on real data (CIFAR-10 and CelebA) and extending common GAN models (DCGAN, WGAN, and progressive GAN) are shown in Section 5. |
| Researcher Affiliation | Industry | Ryan Turner¹, Jane Hung¹, Eric Frank¹, Yunus Saatci¹, Jason Yosinski¹. ¹Uber AI Labs. Correspondence to: Ryan Turner <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 MH-GAN. Input: generator $G$, calibrated disc. $D$, real samples. Assign random real sample $x_0$ to $x$. for $k = 1$ to $K$ do: Draw $x'$ from $G$; Draw $U$ from Uniform(0, 1); if $U \le (D(x)^{-1} - 1)/(D(x')^{-1} - 1)$ then $x \leftarrow x'$ end if; end for. If $x$ is still real sample $x_0$, restart with draw from $G$ as $x_0$. Output: sample $x$ from $G$ |
| Open Source Code | Yes | Code found at: github.com/uber-research/metropolis-hastings-gans |
| Open Datasets | Yes | We consider the 5 × 5 grid of two-dimensional Gaussians used in Azadi et al. (2018)... For real data experiments we considered the CelebA (Liu et al., 2015) and CIFAR-10 (Torralba et al., 2008) data sets... |
| Dataset Splits | Yes | We used 64,000 standardized training points and generated 10,000 points in test. ... To correct an uncalibrated classifier, denoted $D : X \to \mathbb{R}$, we use a held out calibration set (e.g., 10% of the training data) and either logistic, isotonic, or beta (Kull et al., 2017) regression to warp the output of D. |
| Hardware Specification | No | The paper mentions "Using multiple chains is also better for GPU parallelization" but does not specify any particular GPU models, CPU types, or other hardware specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | Following Azadi et al. (2018), we use four fully connected layers with ReLU activations for both the generator and discriminator. The final output layer of the discriminator is a sigmoid, and no nonlinearity is applied to the final generator layer. All hidden layers have size 100, with a latent $z \in \mathbb{R}^2$. ... running MCMC to k = 640 iterations in all cases. ... Results are computed at epoch 60... |
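
The selection loop quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the linked repository for that): the names `mh_gan_sample`, `draw_from_g`, `disc`, and `x_real` are hypothetical, samples are treated as opaque values, and the small `eps` clamp on the discriminator output is an added assumption to avoid division by zero. The default of 640 steps mirrors the `k = 640` quoted in the Experiment Setup row.

```python
import random
from typing import Any, Callable, Optional


def mh_gan_sample(
    draw_from_g: Callable[[], Any],   # hypothetical: returns one generator sample
    disc: Callable[[Any], float],     # hypothetical: calibrated D(x) in (0, 1)
    x_real: Any,                      # real sample used to initialise the chain
    k_steps: int = 640,
    rng: Optional[random.Random] = None,
) -> Any:
    """Sketch of one Metropolis-Hastings chain over generator proposals."""
    rng = rng or random.Random()
    eps = 1e-12  # assumption: clamp D away from 0 and 1 for numerical safety

    def inv_odds(x: Any) -> float:
        # D(x)^{-1} - 1, the term that appears in the acceptance ratio
        d = min(max(disc(x), eps), 1.0 - eps)
        return 1.0 / d - 1.0

    def run_chain(x0: Any):
        x, moved = x0, False
        for _ in range(k_steps):
            x_prop = draw_from_g()
            u = rng.random()
            # Accept when U <= (D(x)^{-1} - 1) / (D(x')^{-1} - 1)
            if u <= inv_odds(x) / inv_odds(x_prop):
                x, moved = x_prop, True
        return x, moved

    x, moved = run_chain(x_real)
    if not moved:
        # Chain never left the real sample: restart from a generator draw,
        # so the returned value is always a generator sample.
        x, _ = run_chain(draw_from_g())
    return x
```

With a discriminator that returns 0.5 everywhere, the ratio is 1 and every proposal is accepted, so the chain simply returns the last generator draw; a sharper discriminator makes the chain favour proposals it scores as more realistic.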