Understanding Failures in Out-of-Distribution Detection with Deep Generative Models
Authors: Lily Zhang, Mark Goldstein, Rajesh Ranganath
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we offer empirical demonstrations of the analyses presented. First, given an OOD detection method and a specific in-distribution, we provide examples of out-distributions that the method fails to distinguish from the in-distribution (Section 4.1). Then, we showcase an instance where a partially trained DGM yields better OOD detection than the true distribution of the data when supports overlap between the in- and out-distribution (Section 4.2). |
| Researcher Affiliation | Academia | Lily H. Zhang, Mark Goldstein, Rajesh Ranganath (New York University). Correspondence to: Lily H. Zhang <lily.h.zhang@nyu.edu>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not release code of its own; it only acknowledges third-party open-source code: "We thank the reviewers for their very helpful feedback and Kingma & Dhariwal (2018) and Ren et al. (2019) for open-sourcing their code." |
| Open Datasets | Yes | For instance, a model trained on Fashion-MNIST, an image dataset of clothing items, assigns higher likelihoods to MNIST images. The same is true for the training distribution (or in-distribution) CIFAR-10, a dataset of animals and vehicles, and the OOD distribution (or out-distribution) SVHN, a dataset of house numbers. ... specifically the GLOW model of Kingma & Dhariwal (2018) trained on CIFAR-10. ... We choose the CelebA dataset of celebrity faces as our out-distribution Q. |
| Dataset Splits | No | The paper mentions training on '40,000 samples from P' and evaluating on 'test samples from P', but it does not specify validation splits or proportions for the datasets used in experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'GLOW model' and 'PIXELCNN++' but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We designate a pretrained DGM as our in-distribution P, specifically the GLOW model of Kingma & Dhariwal (2018) trained on CIFAR-10. Next, we train a separate GLOW model Pθ on 40,000 samples from P. See Appendix C for model and training details. ... Our partially-trained model Pθ (only 50 epochs) |
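The experiments above all rest on likelihood-based OOD detection: score each input under a generative model fit to the in-distribution, and flag low-likelihood inputs as OOD. The following is a minimal sketch of that scoring idea using a toy Gaussian density in place of the paper's GLOW model; all names and the Gaussian stand-in are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a deep generative model: an independent Gaussian
# density fit to in-distribution samples. The paper uses GLOW and
# PixelCNN++; this only illustrates the likelihood-scoring idea.
def fit_gaussian(x):
    return x.mean(axis=0), x.std(axis=0) + 1e-6

def log_likelihood(x, mu, sigma):
    z = (x - mu) / sigma
    return (-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum(axis=1)

in_dist = rng.normal(0.0, 1.0, size=(1000, 4))   # proxy for in-distribution P
out_dist = rng.normal(3.0, 1.0, size=(1000, 4))  # proxy for out-distribution Q

mu, sigma = fit_gaussian(in_dist)
score_in = log_likelihood(in_dist, mu, sigma).mean()
score_out = log_likelihood(out_dist, mu, sigma).mean()

# An input is flagged OOD when its log-likelihood falls below a threshold.
# Here the in-distribution scores higher on average; the paper's point is
# that with deep models this ordering can reverse (e.g., a Fashion-MNIST
# model assigning higher likelihood to MNIST).
print(score_in > score_out)
```

In this toy setting the detector behaves as intended; the paper's contribution is characterizing when and why the analogous likelihood comparison fails for deep generative models.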