Why Normalizing Flows Fail to Detect Out-of-Distribution Data
Authors: Polina Kirichenko, Pavel Izmailov, Andrew G. Wilson
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate why normalizing flows perform poorly for OOD detection. We demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations which are not specific to the target image datasets, focusing on flows based on coupling layers. We show that by modifying the architecture of flow coupling layers we can bias the flow towards learning the semantic structure of the target data, improving OOD detection. |
| Researcher Affiliation | Academia | Polina Kirichenko pk1822@nyu.edu New York University Pavel Izmailov pi390@nyu.edu New York University Andrew Gordon Wilson andrewgw@cims.nyu.edu New York University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also provide code at https://github.com/PolinaKirichenko/flows_ood. |
| Open Datasets | Yes | In Figure 1(a), we show the log-likelihood histogram for a Real NVP flow model [10] trained on the ImageNet dataset [37] subsampled to 64×64 resolution. The flow assigns higher likelihood to both the CelebA dataset of celebrity photos, and the SVHN dataset of images of house numbers, compared to the target ImageNet dataset. ... trained on FashionMNIST ... trained on CelebA using an SVHN image as OOD. ... We extract embeddings for CIFAR-10, CelebA and SVHN using an EfficientNet [43] pretrained on ImageNet [37]. |
| Dataset Splits | Yes | One approach is to choose a likelihood threshold τ on a validation dataset, e.g. to satisfy a desired false positive rate, and during test time identify inputs which have likelihood lower than τ as OOD (a minimal sketch of this protocol appears below the table). ... In practice, flows do not seem to overfit, assigning similar likelihood distributions to train and test (see e.g. Figure 1(a)). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the details of the visualization procedure and the training setup please see Appendices E and C. ... Real NVP model with 2 coupling layers and checkerboard masks ... 3-layer Real NVP with horizontal masks ... We construct flows of exactly the same size and architecture (Real NVP with 8 coupling layers and no squeeze layers) with each of these masks, trained on CelebA and FashionMNIST. ... To do so, we introduce a bottleneck to the st-networks: a pair of fully-connected layers projecting to a space of dimension l and back to the original input dimension. We insert these layers after the middle layer of the st-network. If the latent dimension l is small, the st-network cannot simply reproduce its input as its output, and thus cannot exploit the local pixel correlations discussed in Section 6. Passing information through multiple layers with a low-dimensional bottleneck also reduces the effect of coupling layer co-adaptation. We train a Real NVP flow varying the latent dimension l on CelebA and on FashionMNIST. (Hedged sketches of the coupling masks and the bottleneck st-network appear below this table.) |
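As quoted in the Dataset Splits row, OOD detection with a flow reduces to thresholding log-likelihoods. The following is a minimal sketch of that protocol, assuming per-example log-likelihoods from an already-trained flow are available; the 5% false-positive rate and the placeholder Gaussian log-likelihoods are illustrative assumptions, not values from the paper.

```python
import numpy as np

def choose_threshold(val_log_probs: np.ndarray, target_fpr: float = 0.05) -> float:
    """Pick tau so that roughly `target_fpr` of in-distribution validation
    examples fall below it (i.e. would be wrongly flagged as OOD)."""
    return float(np.quantile(val_log_probs, target_fpr))

def flag_ood(log_probs: np.ndarray, tau: float) -> np.ndarray:
    """Inputs with likelihood lower than tau are identified as OOD."""
    return log_probs < tau

# Placeholder log-likelihoods standing in for a trained flow's outputs.
rng = np.random.default_rng(0)
val_lp = rng.normal(loc=-3500.0, scale=50.0, size=10_000)  # in-distribution validation
ood_lp = rng.normal(loc=-3300.0, scale=60.0, size=1_000)   # OOD scoring *higher*, as in Figure 1(a)
tau = choose_threshold(val_lp, target_fpr=0.05)
# The failure mode the paper documents: OOD data with likelihood above tau is never flagged.
print(f"tau = {tau:.1f}; flagged {flag_ood(ood_lp, tau).mean():.1%} of OOD inputs")
```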
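The Experiment Setup row refers to checkerboard and horizontal coupling masks. Below is a hedged sketch of the two mask types, assuming a binary per-pixel mask convention (1 = pass-through pixels that the st-network conditions on); the shapes and the 0/1 convention are illustrative, not the authors' exact code.

```python
import torch

def checkerboard_mask(h: int, w: int) -> torch.Tensor:
    """Alternating 0/1 pattern: neighboring pixels fall on opposite sides
    of the coupling split, exposing local pixel correlations."""
    ij = torch.arange(h).unsqueeze(1) + torch.arange(w).unsqueeze(0)
    return (ij % 2).float()

def horizontal_mask(h: int, w: int) -> torch.Tensor:
    """Top half of rows is passed through and conditions on the bottom half."""
    mask = torch.zeros(h, w)
    mask[: h // 2] = 1.0
    return mask
```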
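The bottleneck modification quoted in the Experiment Setup row (a pair of fully-connected layers projecting to dimension l and back, inserted after the middle layer of the st-network) can be sketched as follows. This is a minimal PyTorch sketch under assumed layer widths and a fully-connected st-network operating on flattened inputs; all class names and hyperparameters are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class BottleneckSTNetwork(nn.Module):
    """st-network with a low-dimensional bottleneck after its middle layer."""
    def __init__(self, dim: int, hidden: int = 256, latent_l: int = 16):
        super().__init__()
        self.pre = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        # The bottleneck: project down to l dimensions and back up. If l is
        # small, the network cannot simply copy its input to its output,
        # suppressing the local-pixel-correlation shortcut.
        self.bottleneck = nn.Sequential(
            nn.Linear(hidden, latent_l), nn.ReLU(),
            nn.Linear(latent_l, hidden), nn.ReLU(),
        )
        # One head producing both the log-scale s and the translation t.
        self.post = nn.Linear(hidden, 2 * dim)

    def forward(self, x_masked: torch.Tensor):
        h = self.bottleneck(self.pre(x_masked))
        s, t = self.post(h).chunk(2, dim=-1)
        return torch.tanh(s), t  # tanh keeps the scale numerically tame

class AffineCoupling(nn.Module):
    """RealNVP-style coupling: masked dims parameterize an affine map of the rest."""
    def __init__(self, dim: int, mask: torch.Tensor, hidden: int = 256, latent_l: int = 16):
        super().__init__()
        self.register_buffer("mask", mask)  # 1 = pass-through dims
        self.st = BottleneckSTNetwork(dim, hidden, latent_l)

    def forward(self, x: torch.Tensor):
        x_masked = x * self.mask
        s, t = self.st(x_masked)
        y = x_masked + (1 - self.mask) * (x * torch.exp(s) + t)
        log_det = ((1 - self.mask) * s).sum(dim=-1)  # log|det J| of the affine map
        return y, log_det
```

For a 28×28 image flattened to 784 dimensions, `AffineCoupling(784, checkerboard_mask(28, 28).flatten())` applies one such layer; a full flow stacks several couplings with alternating masks, and shrinking `latent_l` is what biases the model toward semantic structure in the experiments quoted above.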