Likelihood Ratios for Out-of-Distribution Detection

Authors: Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark DePristo, Joshua Dillon, Balaji Lakshminarayanan

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design experiments on multiple data modalities (images, genomic sequences) to evaluate our method and compare with other baseline methods. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance.
Researcher Affiliation | Industry | Jie Ren (Google Research, jjren@google.com); Peter J. Liu (Google Research, peterjliu@google.com); Emily Fertig (Google Research, emilyaf@google.com); Jasper Snoek (Google Research, jsnoek@google.com); Ryan Poplin (Google Research, rpoplin@google.com); Mark A. DePristo (Google Research, mdepristo@google.com); Joshua V. Dillon (Google Research, jvdillon@google.com); Balaji Lakshminarayanan (DeepMind, balajiln@google.com)
Pseudocode | Yes | See Algorithm 1 in Appendix A for the pseudocode for generating input perturbations. The pseudocode for the proposed OOD detection algorithm can be found in Algorithm 2 in Appendix A. (A minimal code sketch of both algorithms appears after this table.)
Open Source Code | Yes | The dataset and code for the genomics study are available at https://github.com/google-research/google-research/tree/master/genomics_ood.
Open Datasets | Yes | We design a new dataset for evaluating OOD methods. The dataset and code for the genomics study are available at https://github.com/google-research/google-research/tree/master/genomics_ood. Image benchmarks: (a) Fashion-MNIST as in-distribution and MNIST as OOD; (b) CIFAR-10 as in-distribution and SVHN as OOD.
Dataset Splits | Yes | We choose two cutoff years, 2011 and 2016, to define the training, validation, and test splits (Figure 4). Our dataset contains 10 in-distribution classes, 60 OOD classes for validation, and 60 OOD classes for testing. We trained the model using only in-distribution inputs, and we tuned the hyperparameters using validation datasets that include both in-distribution and OOD inputs.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions models such as PixelCNN++ and LSTM, and acknowledges the Google TensorFlow Probability team, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The rate µ is a hyperparameter and can be easily tuned using a small validation OOD dataset (different from the actual OOD dataset of interest). In the case where a validation OOD dataset is not available, we show that µ can also be tuned using simulated OOD data. In practice, we observe that µ ∈ [0.1, 0.2] achieves good performance empirically for most of the experiments in our paper. Besides adding perturbations to the input data, we found that other techniques which improve model generalization and prevent memorization, such as adding L2 regularization with coefficient λ to the model weights, can help to train a good background model.
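For readers who want a concrete picture of the two algorithms referenced in the Pseudocode row, the following is a minimal NumPy sketch, not the authors' released implementation. It assumes hypothetical callables full_log_prob and background_log_prob (for example, the log-likelihood functions of a trained PixelCNN++ or LSTM and of the same architecture retrained on perturbed inputs), integer-encoded inputs, and a small vocabulary such as {A, C, G, T} (vocab_size = 4) for the genomics data; the authoritative pseudocode is Algorithm 1 and Algorithm 2 in Appendix A of the paper.

import numpy as np

def perturb_inputs(x, mu, vocab_size, rng=None):
    """Sketch of Algorithm 1: independently corrupt each position with rate mu.

    x: integer token ids, shape (batch, seq_len); mu in [0.1, 0.2] per the paper.
    Corrupted positions are resampled uniformly from the vocabulary (a
    simplification; Appendix A of the paper gives the exact procedure).
    """
    rng = rng if rng is not None else np.random.default_rng()
    corrupt = rng.random(x.shape) < mu                   # which positions to perturb
    noise = rng.integers(0, vocab_size, size=x.shape)    # uniform replacement tokens
    return np.where(corrupt, noise, x)

def likelihood_ratio_score(x, full_log_prob, background_log_prob):
    """Sketch of Algorithm 2: score = log p_theta(x) - log p_theta0(x).

    full_log_prob comes from the generative model trained on in-distribution
    data; background_log_prob comes from the background model trained on
    perturbed inputs (optionally with L2 regularization, as noted in the
    Experiment Setup row). Higher scores indicate more in-distribution inputs.
    """
    return full_log_prob(x) - background_log_prob(x)

In use, an input would be flagged as OOD when its likelihood-ratio score falls below a threshold tuned on a validation set containing both in-distribution and OOD inputs, consistent with the Dataset Splits and Experiment Setup rows above.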