Likelihood Ratios for Out-of-Distribution Detection
Authors: Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design experiments on multiple data modalities (images, genomic sequences) to evaluate our method and compare with other baseline methods. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. |
| Researcher Affiliation | Industry | Jie Ren, Google Research, jjren@google.com; Peter J. Liu, Google Research, peterjliu@google.com; Emily Fertig, Google Research, emilyaf@google.com; Jasper Snoek, Google Research, jsnoek@google.com; Ryan Poplin, Google Research, rpoplin@google.com; Mark A. DePristo, Google Research, mdepristo@google.com; Joshua V. Dillon, Google Research, jvdillon@google.com; Balaji Lakshminarayanan, DeepMind, balajiln@google.com |
| Pseudocode | Yes | See Algorithm 1 in Appendix A for the pseudocode for generating input perturbations. The pseudocode for our proposed OOD detection algorithm can be found in Algorithm 2 in Appendix A. |
| Open Source Code | Yes | The dataset and code for the genomics study is available at https://github.com/google-research/google-research/tree/master/genomics_ood. |
| Open Datasets | Yes | We design a new dataset for evaluating OOD methods. The dataset and code for the genomics study is available at https://github.com/google-research/google-research/tree/master/genomics_ood. (a) Fashion-MNIST as in-distribution and MNIST as OOD, (b) CIFAR-10 as in-distribution and SVHN as OOD. |
| Dataset Splits | Yes | We choose two cutoff years, 2011 and 2016, to define the training, validation, and test splits (Figure 4). Our dataset contains 10 in-distribution classes, 60 OOD classes for validation, and 60 OOD classes for testing. We trained the model using only in-distribution inputs, and we tuned the hyperparameters using validation datasets that include both in-distribution and OOD inputs. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions models like Pixel CNN++ and LSTM, and acknowledges the Google TensorFlow Probability team, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The rate µ is a hyperparameter and can be easily tuned using a small amount of validation OOD dataset (different from the actual OOD dataset of interest). In the case where validation OOD dataset is not available, we show that µ can also be tuned using simulated OOD data. In practice, we observe that µ ∈ [0.1, 0.2] achieves good performance empirically for most of the experiments in our paper. Besides adding perturbations to the input data, we found other techniques that can improve model generalization and prevent model memorization, such as adding L2 regularization with coefficient λ to model weights, can help to train a good background model. |
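The input-perturbation step referenced in the Pseudocode and Experiment Setup rows (Algorithm 1 in the paper's Appendix A) amounts to independently resampling each position of an input with probability µ. The following is a minimal sketch of that idea, assuming integer-encoded inputs in a NumPy array; the function and variable names are illustrative and not taken from the released code.

```python
# Sketch of the input-perturbation step (cf. Algorithm 1, Appendix A).
# Assumption: inputs are integer-encoded (DNA symbols in {0..3}, pixels in {0..255}).
import numpy as np

def perturb_inputs(x, mu, vocab_size, rng=None):
    """Independently resample each position of x with probability mu.

    x          : integer array, e.g. genomic sequences or flattened image pixels.
    mu         : perturbation rate; the paper reports mu in [0.1, 0.2] working well.
    vocab_size : number of possible symbol values (4 for DNA, 256 for pixels).
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < mu                    # positions chosen for corruption
    noise = rng.integers(0, vocab_size, size=x.shape)  # uniform replacement symbols
    return np.where(mask, noise, x)

# Example: corrupt a batch of DNA sequences at rate mu = 0.1
sequences = np.random.default_rng(0).integers(0, 4, size=(32, 250))
perturbed = perturb_inputs(sequences, mu=0.1, vocab_size=4)
```

The perturbed data are what the background model is trained on (optionally with L2 weight regularization, as noted in the Experiment Setup row), so that it captures population-level background statistics rather than the in-distribution semantics.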
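The OOD detection procedure itself (Algorithm 2 in Appendix A) scores an input by the log-likelihood ratio between the full model and the background model. Below is a hedged sketch of that scoring rule; `full_model`, `background_model`, and the `log_prob` interface are assumptions standing in for two trained generative models (e.g. PixelCNN++ for images or an autoregressive LSTM for genomics), not the released API.

```python
# Sketch of the likelihood-ratio OOD score (cf. Algorithm 2, Appendix A).
def llr_score(x, full_model, background_model):
    """Likelihood ratio: log p_theta(x) - log p_theta0(x).

    full_model       : generative model trained on in-distribution data only.
    background_model : same architecture trained on perturbed inputs.
    Higher scores suggest in-distribution; low scores flag OOD.
    """
    return full_model.log_prob(x) - background_model.log_prob(x)

def detect_ood(x, full_model, background_model, threshold):
    """Mark inputs whose likelihood-ratio score falls below `threshold` as OOD."""
    return llr_score(x, full_model, background_model) < threshold
```

Subtracting the background log-likelihood is the key design choice: it cancels the contribution of background statistics (e.g. GC content in genomics, low-level pixel statistics in images) that otherwise dominate raw likelihoods and cause OOD inputs to be scored as more likely than in-distribution ones.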