Do Deep Generative Models Know What They Don't Know?
Authors: Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the experiment, we trained the same Glow architecture described in Kingma & Dhariwal (2018), except small enough that it could fit on one GPU, on Fashion-MNIST and CIFAR-10. Appendix A provides additional implementation details. We then calculated the log-likelihood (higher value is better) and bits-per-dimension (BPD, lower value is better) on the test split of two different data sets of the same dimensionality: MNIST (28×28) and SVHN (32×32×3), respectively. |
| Researcher Affiliation | Collaboration | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan (DeepMind). Corresponding authors: e.nalisnick@eng.cam.ac.uk and balajiln@google.com. Work done during an internship at DeepMind. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: Fashion-MNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. |
| Dataset Splits | No | The paper mentions 'training data' and 'test split' for the datasets used, but does not explicitly provide information about a validation dataset split with specific percentages, counts, or splitting methodologies. |
| Hardware Specification | No | The paper mentions that the model could 'fit on one GPU' but does not specify the exact GPU model, CPU, memory, or any other specific hardware components used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' but does not provide specific version numbers for any software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | For our MNIST experiments, we used a Glow architecture of 2 blocks of 16 affine coupling layers, squeezing the spatial dimension in between the 2 blocks. For our CIFAR experiments, we used 3 blocks of 8 affine coupling layers, applying the multi-scale architecture between each block. For all coupling blocks, we used a 3-layer Highway network with 200 hidden units for MNIST and 400 hidden units for CIFAR. All networks were trained with the RMSProp optimizer, with a learning rate of 1e-5 for 100K steps, decaying by half at 80K and 90K steps. We used a prior with zero mean and unit variance for all experiments. We applied L2 regularization of 5e-2 to CIFAR experiments. All experiments used batch size 32. |
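The two metrics in the Research Type row are directly related: bits-per-dimension is just the negative log-likelihood rescaled by the number of pixels (and channels) and converted from nats to bits, which is why a higher log-likelihood means a lower BPD. A minimal sketch of that conversion (the function name and the example value are illustrative, not from the paper):

```python
import math

def bits_per_dim(log_likelihood_nats: float, num_dims: int) -> float:
    """Convert a per-example log-likelihood (in nats) to bits per dimension.

    Higher log-likelihood is better; lower BPD is better.
    """
    return -log_likelihood_nats / (num_dims * math.log(2))

# A CIFAR-10 / SVHN-sized image has 32 * 32 * 3 = 3072 dimensions.
# A hypothetical log-likelihood of -7451 nats corresponds to ~3.5 bits/dim.
print(bits_per_dim(-7451.0, 32 * 32 * 3))
```

This also makes the cross-dataset comparison in the paper meaningful: dividing by the dimensionality lets models trained on 28×28 and 32×32×3 inputs be compared on a common per-dimension scale.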
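Since no source code is released, the hyperparameters quoted in the Experiment Setup row are the main reproduction anchor. The sketch below collects them into one place and implements the stated learning-rate schedule (base rate 1e-5, halved at 80K and again at 90K steps); all names are illustrative and the config structure is an assumption, not the authors' code:

```python
# Hypothetical summary of the hyperparameters reported in the paper;
# the dict keys and layout are our own, not from a released codebase.
GLOW_CONFIG = {
    "mnist": {"blocks": 2, "couplings_per_block": 16, "hidden_units": 200,
              "l2": 0.0},
    "cifar": {"blocks": 3, "couplings_per_block": 8, "hidden_units": 400,
              "l2": 5e-2},
}
BATCH_SIZE = 32
TOTAL_STEPS = 100_000

def learning_rate(step: int, base_lr: float = 1e-5) -> float:
    """Reported schedule: halve the rate at 80K steps and again at 90K."""
    if step >= 90_000:
        return base_lr / 4
    if step >= 80_000:
        return base_lr / 2
    return base_lr
```

Note that the paper itself reports that this standard training setup, with a zero-mean unit-variance prior, is all that is needed to reproduce the anomalous out-of-distribution likelihoods.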