A Geometric Explanation of the Likelihood OOD Detection Paradox

Authors: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul Krishnan, Gabriel Loaiza-Ganem

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 5 (Experiments: Setup): We compare datasets within two classes: (i) 28×28 greyscale images, including FMNIST, MNIST, Omniglot (Lake et al., 2015), and EMNIST (Cohen et al., 2017); and (ii) RGB images resized to 32×32×3, comprising SVHN, CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009), Tiny ImageNet (Le & Yang, 2015), and a simplified, cropped version of CelebA (Kist, 2021). We give experimental details on model training in Appendix D.1 and Appendix D.2.
Researcher Affiliation | Collaboration | Layer 6 AI, University of Toronto, Vector Institute. Correspondence to: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Gabriel Loaiza-Ganem <{hamid, brendan, jesse, anthony, gabriel}@layer6.ai>, Rahul G. Krishnan <rahulgk@cs.toronto.edu>.
Pseudocode | Yes | Algorithm 1: dual-threshold OOD detection; returns True if x is deemed OOD, and False if deemed in-distribution. (A sketch of such a check follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
Open Datasets | Yes | We compare datasets within two classes: (i) 28×28 greyscale images, including FMNIST (Xiao et al., 2017), MNIST (LeCun et al., 1998), Omniglot (Lake et al., 2015), and EMNIST (Cohen et al., 2017); and (ii) RGB images resized to 32×32×3, comprising SVHN (Netzer et al., 2011), CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009), Tiny ImageNet (Le & Yang, 2015), and a simplified, cropped version of CelebA (Kist, 2021). (A loading sketch for these datasets follows the table.)
Dataset Splits | No | The paper refers to training and test data in various contexts (e.g., 'A-train', 'A-test'), but does not report validation split percentages, sample counts, or an explicit methodology for partitioning a validation set.
Hardware Specification | Yes | We used an NVIDIA Tesla V100 SXM2 with 7 hours of GPU time to train each of the models.
Software Dependencies | No | The paper mentions software components such as the diffusers library and the Adam/AdamW optimizers, but does not give version numbers for any programming languages, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | We trained both Glow (Kingma & Dhariwal, 2018) and RQ-NSFs (Durkan et al., 2019) on our datasets, with the hyperparameters detailed in Table 3. (A generic training-loop sketch follows the table.)
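
The Pseudocode row refers to the paper's Algorithm 1. The sketch below is one minimal reading of a dual-threshold check: a point is deemed in-distribution only when both its log-likelihood under the model and its estimated local intrinsic dimension (LID) clear their thresholds. The callables and threshold names (`log_prob`, `estimate_lid`, `psi`, `lam`) are placeholders, not the paper's actual interface.

```python
from typing import Callable

import torch


def dual_threshold_ood(
    x: torch.Tensor,
    log_prob: Callable[[torch.Tensor], float],      # model log-density, e.g. a flow's log_prob
    estimate_lid: Callable[[torch.Tensor], float],  # placeholder LID estimator
    psi: float,  # log-likelihood threshold
    lam: float,  # LID threshold
) -> bool:
    """Return True if x is deemed OOD, False if deemed in-distribution.

    x counts as in-distribution only when both its log-likelihood and its
    estimated local intrinsic dimension are sufficiently large, so a
    high-likelihood but low-LID input is still rejected as OOD.
    """
    in_distribution = log_prob(x) >= psi and estimate_lid(x) >= lam
    return not in_distribution
```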
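
All of the datasets in the Open Datasets row are public, and most ship with torchvision; the loader below is a sketch under that assumption. The paper does not specify its data pipeline, and Tiny ImageNet plus the cropped CelebA variant of Kist (2021) require separate downloads.

```python
from torchvision import datasets
import torchvision.transforms as T

root = "data"

# 28x28 greyscale datasets; Omniglot ships at 105x105, so resize it down.
grey = T.ToTensor()
fmnist = datasets.FashionMNIST(root, train=True, download=True, transform=grey)
mnist = datasets.MNIST(root, train=True, download=True, transform=grey)
emnist = datasets.EMNIST(root, split="letters", train=True, download=True,
                         transform=grey)  # split choice is a placeholder
omniglot = datasets.Omniglot(root, download=True,
                             transform=T.Compose([T.Resize((28, 28)), grey]))

# RGB datasets resized to 32x32x3.
rgb = T.Compose([T.Resize((32, 32)), T.ToTensor()])
svhn = datasets.SVHN(root, split="train", download=True, transform=rgb)
cifar10 = datasets.CIFAR10(root, train=True, download=True, transform=rgb)
cifar100 = datasets.CIFAR100(root, train=True, download=True, transform=rgb)
# Tiny ImageNet (Le & Yang, 2015) and the cropped CelebA (Kist, 2021)
# are not bundled with torchvision and must be fetched separately.
```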
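
For the Experiment Setup row, the actual hyperparameters are in the paper's Table 3 and the linked repository. The loop below is only a generic maximum-likelihood training sketch, assuming a flow object with an nflows-style `log_prob` method (as RQ-NSF implementations typically expose); the hyperparameter values are placeholders.

```python
import torch
from torch.utils.data import DataLoader


def train_flow(flow, dataset, epochs=100, lr=1e-3, batch_size=128, device="cuda"):
    """Maximum-likelihood training of a normalizing flow (sketch).

    `flow` is assumed to expose log_prob(x) returning per-sample
    log-densities; hyperparameter values here are illustrative, not
    those from the paper's Table 3.
    """
    flow = flow.to(device)
    opt = torch.optim.Adam(flow.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(epochs):
        for x, _ in loader:  # labels are unused for density estimation
            x = x.to(device)
            loss = -flow.log_prob(x).mean()  # negative log-likelihood
            opt.zero_grad()
            loss.backward()
            opt.step()
    return flow
```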