Understanding Failures in Out-of-Distribution Detection with Deep Generative Models
Authors: Lily Zhang, Mark Goldstein, Rajesh Ranganath
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we offer empirical demonstrations of the analyses presented. First, given an OOD detection method and a specific in-distribution, we provide examples of out-distributions that the method fails to distinguish from the in-distribution (Section 4.1). Then, we showcase an instance where a partially trained DGM yields better OOD detection than the true distribution of the data when supports overlap between the in- and out-distribution (Section 4.2). |
| Researcher Affiliation | Academia | Lily H. Zhang, Mark Goldstein, Rajesh Ranganath (New York University). Correspondence to: Lily H. Zhang <lily.h.zhang@nyu.edu>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not release code of its own; it only acknowledges third-party open-source code: "We thank the reviewers for their very helpful feedback and Kingma & Dhariwal (2018) and Ren et al. (2019) for open-sourcing their code." |
| Open Datasets | Yes | For instance, a model trained on Fashion-MNIST, an image dataset of clothing items, assigns higher likelihoods to MNIST images. The same is true for the training distribution (or in-distribution) CIFAR-10, a dataset of animals and vehicles, and the OOD distribution (or out-distribution) SVHN, a dataset of house numbers. ... specifically the GLOW model of Kingma & Dhariwal (2018) trained on CIFAR-10. ... We choose the CelebA dataset of celebrity faces as our out-distribution Q. |
| Dataset Splits | No | The paper mentions training on '40,000 samples from P' and evaluating on 'test samples from P', but it does not specify validation splits or proportions for the datasets used in experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'GLOW model' and 'PIXELCNN++' but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We designate a pretrained DGM as our in-distribution P, specifically the GLOW model of Kingma & Dhariwal (2018) trained on CIFAR-10. Next, we train a separate GLOW model Pθ on 40,000 samples from P. See Appendix C for model and training details. ... Our partially-trained model Pθ (only 50 epochs) |
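The experiments above all rest on likelihood-based OOD detection: score each input under a generative model fit to the in-distribution, and flag low-likelihood inputs as OOD. The following is a minimal sketch of that scoring idea using a toy Gaussian density in place of the paper's GLOW model; all names and the Gaussian stand-in are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a deep generative model: an independent Gaussian
# density fit to in-distribution samples. The paper uses GLOW and
# PixelCNN++; this only illustrates the likelihood-scoring idea.
def fit_gaussian(x):
    return x.mean(axis=0), x.std(axis=0) + 1e-6

def log_likelihood(x, mu, sigma):
    z = (x - mu) / sigma
    return (-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum(axis=1)

in_dist = rng.normal(0.0, 1.0, size=(1000, 4))   # proxy for in-distribution P
out_dist = rng.normal(3.0, 1.0, size=(1000, 4))  # proxy for out-distribution Q

mu, sigma = fit_gaussian(in_dist)
score_in = log_likelihood(in_dist, mu, sigma).mean()
score_out = log_likelihood(out_dist, mu, sigma).mean()

# An input is flagged OOD when its log-likelihood falls below a threshold.
# Here the in-distribution scores higher on average; the paper's point is
# that with deep models this ordering can reverse (e.g., a Fashion-MNIST
# model assigning higher likelihood to MNIST).
print(score_in > score_out)
```

In this toy setting the detector behaves as intended; the paper's contribution is characterizing when and why the analogous likelihood comparison fails for deep generative models.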