Do Deep Generative Models Know What They Don't Know?
Authors: Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For the experiment, we trained the same Glow architecture described in Kingma & Dhariwal (2018), except small enough that it could fit on one GPU, on Fashion-MNIST and CIFAR-10. Appendix A provides additional implementation details. We then calculated the log-likelihood (higher value is better) and bits-per-dimension (BPD, lower value is better) on the test split of two different data sets of the same dimensionality: MNIST (28×28) and SVHN (32×32×3), respectively. |
| Researcher Affiliation | Collaboration | Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, Balaji Lakshminarayanan (DeepMind). Corresponding authors: e.nalisnick@eng.cam.ac.uk and balajiln@google.com. Work done during an internship at DeepMind. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We find that the density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses (i.e. CIFAR-10) from those of house numbers (i.e. SVHN), assigning a higher likelihood to the latter when the model is trained on the former. Moreover, we find evidence of this phenomenon when pairing several popular image data sets: Fashion-MNIST vs MNIST, CelebA vs SVHN, ImageNet vs CIFAR-10 / CIFAR-100 / SVHN. |
| Dataset Splits | No | The paper mentions 'training data' and 'test split' for the datasets used, but does not explicitly provide information about a validation dataset split with specific percentages, counts, or splitting methodologies. |
| Hardware Specification | No | The paper mentions that the model could 'fit on one GPU' but does not specify the exact GPU model, CPU, memory, or any other specific hardware components used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow' but does not provide specific version numbers for any software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | For our MNIST experiments, we used a Glow architecture of 2 blocks of 16 affine coupling layers, squeezing the spatial dimension in between the 2 blocks. For our CIFAR experiments, we used 3 blocks of 8 affine coupling layers, applying the multi-scale architecture between each block. For all coupling blocks, we used a 3-layer Highway network with 200 hidden units for MNIST and 400 hidden units for CIFAR. All networks were trained with the RMSProp optimizer, with a learning rate of 1e-5 for 100K steps, decaying by half at 80K and 90K steps. We used a prior with zero mean and unit variance for all experiments. We applied L2 regularization of 5e-2 to CIFAR experiments. All experiments used batch size 32. |
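The two metrics in the Research Type row are directly related: bits-per-dimension is just the negative log-likelihood rescaled by the number of pixels (and channels) and converted from nats to bits, which is why a higher log-likelihood means a lower BPD. A minimal sketch of that conversion (the function name and the example value are illustrative, not from the paper):

```python
import math

def bits_per_dim(log_likelihood_nats: float, num_dims: int) -> float:
    """Convert a per-example log-likelihood (in nats) to bits per dimension.

    Higher log-likelihood is better; lower BPD is better.
    """
    return -log_likelihood_nats / (num_dims * math.log(2))

# A CIFAR-10 / SVHN-sized image has 32 * 32 * 3 = 3072 dimensions.
# A hypothetical log-likelihood of -7451 nats corresponds to ~3.5 bits/dim.
print(bits_per_dim(-7451.0, 32 * 32 * 3))
```

This also makes the cross-dataset comparison in the paper meaningful: dividing by the dimensionality lets models trained on 28×28 and 32×32×3 inputs be compared on a common per-dimension scale.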
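Since no source code is released, the hyperparameters quoted in the Experiment Setup row are the main reproduction anchor. The sketch below collects them into one place and implements the stated learning-rate schedule (base rate 1e-5, halved at 80K and again at 90K steps); all names are illustrative and the config structure is an assumption, not the authors' code:

```python
# Hypothetical summary of the hyperparameters reported in the paper;
# the dict keys and layout are our own, not from a released codebase.
GLOW_CONFIG = {
    "mnist": {"blocks": 2, "couplings_per_block": 16, "hidden_units": 200,
              "l2": 0.0},
    "cifar": {"blocks": 3, "couplings_per_block": 8, "hidden_units": 400,
              "l2": 5e-2},
}
BATCH_SIZE = 32
TOTAL_STEPS = 100_000

def learning_rate(step: int, base_lr: float = 1e-5) -> float:
    """Reported schedule: halve the rate at 80K steps and again at 90K."""
    if step >= 90_000:
        return base_lr / 4
    if step >= 80_000:
        return base_lr / 2
    return base_lr
```

Note that the paper itself reports that this standard training setup, with a zero-mean unit-variance prior, is all that is needed to reproduce the anomalous out-of-distribution likelihoods.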