POEM: Out-of-Distribution Detection with Posterior Sampling

Authors: Yifei Ming, Ying Fan, Yixuan Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments comparing POEM with competitive OOD detection methods and various sampling strategies. We show that POEM establishes state-of-the-art results on common benchmarks. We also provide theoretical insights on why outlier mining with high boundary scores benefits sample efficiency.
Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison, USA. Correspondence to: Yifei Ming <ming5@wisc.edu>, Ying Fan <yfan87@wisc.edu>, Yixuan Li <sharonli@cs.wisc.edu>.
Pseudocode | Yes | We present the pseudo-code of POEM in Algorithm 2.
Open Source Code | Yes | Code is publicly available at: https://github.com/deeplearning-wisc/poem.
Open Datasets | Yes | We use CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) as in-distribution datasets. A downsampled version of ImageNet (ImageNet-RC) (Chrabaszcz et al., 2017) is used as the auxiliary outlier dataset. For OOD test sets, we use a suite of diverse image datasets including SVHN (Netzer et al., 2011), Textures (Cimpoi et al., 2014), Places365 (Zhou et al., 2017), LSUN-crop, LSUN-resize (Yu et al., 2015), and iSUN (Xu et al., 2015).
Dataset Splits | No | The paper mentions setting a score threshold such that 'a high fraction of ID data (e.g., 95%) is correctly classified', which implies a validation step for threshold selection. However, it does not explicitly define a separate validation split as part of the data partitioning strategy (e.g., an 80/10/10 split or similar). A hedged sketch of this thresholding step follows the table.
Hardware Specification | Yes | We run all the experiments on NVIDIA GeForce RTX-2080Ti GPU.
Software Dependencies | Yes | Our implementations are based on Ubuntu Linux 20.04 with Python 3.8.
Experiment Setup | Yes | The pool of outliers consists of randomly selected 400,000 samples from ImageNet-RC, and only 50,000 samples (same size as the ID training set) are selected for training based on the boundary score. ... We use DenseNet-101 as the backbone for all methods and train the model using stochastic gradient descent with Nesterov momentum (Duchi et al., 2011). We set the momentum to be 0.9 and the weight decay coefficient to be 10^-4. The batch size is 64 for both in-distribution and outlier training data. Models are trained for 100 epochs. ... For margin hyperparameters, we use the default as in Liu et al. (2020): m_in = 7, m_out = 25 and β = 0.1. A hedged sketch of this training configuration follows the table.
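As noted in the Dataset Splits row, the paper selects a score threshold so that a high fraction (e.g., 95%) of ID data is correctly classified as ID. Below is a minimal sketch of that thresholding step, assuming higher scores indicate ID and using synthetic score arrays; the names (`threshold_at_tpr`, `id_scores`, `ood_scores`) are illustrative and not taken from the POEM codebase.

```python
# Hypothetical sketch of the 95%-TPR thresholding step (not the authors' code).
# Assumption: a higher score means "more ID-like".
import numpy as np

def threshold_at_tpr(id_scores: np.ndarray, tpr: float = 0.95) -> float:
    """Choose the threshold so that `tpr` of ID samples score at or above it."""
    return float(np.quantile(id_scores, 1.0 - tpr))

def fpr_at_threshold(ood_scores: np.ndarray, threshold: float) -> float:
    """Fraction of OOD samples wrongly accepted as ID at the chosen threshold."""
    return float(np.mean(ood_scores >= threshold))

# Synthetic example: ID and OOD scores drawn from overlapping Gaussians.
rng = np.random.default_rng(0)
id_scores = rng.normal(loc=2.0, scale=1.0, size=10_000)
ood_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)
tau = threshold_at_tpr(id_scores, tpr=0.95)
print(f"threshold = {tau:.3f}, FPR at 95% TPR = {fpr_at_threshold(ood_scores, tau):.3f}")
```

The resulting false-positive rate at 95% TPR (FPR95) is the standard metric such a threshold supports, alongside threshold-free metrics such as AUROC.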
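The Experiment Setup row quotes SGD with Nesterov momentum 0.9, weight decay 10^-4, batch size 64, and 100 epochs. The PyTorch sketch below wires up those hyperparameters; the model, data, loss, and learning rate are placeholders or assumptions, not the authors' released code (the full POEM objective also includes an energy-regularized outlier term and the DenseNet-101 backbone).

```python
# Minimal sketch of the quoted optimization setup; placeholders stand in for the
# authors' DenseNet-101 backbone, CIFAR/outlier data, and the full POEM objective.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder backbone

optimizer = optim.SGD(
    model.parameters(),
    lr=0.1,               # assumed initial learning rate; not stated in the quote above
    momentum=0.9,         # Nesterov momentum 0.9, as quoted
    weight_decay=1e-4,    # weight decay coefficient 10^-4, as quoted
    nesterov=True,
)

# Placeholder ID batches; the paper uses CIFAR-10/100 plus mined ImageNet-RC outliers,
# each with batch size 64.
id_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=64,
    shuffle=True,
)

criterion = nn.CrossEntropyLoss()  # the full POEM loss adds an energy/outlier term
for epoch in range(100):           # models are trained for 100 epochs, as quoted
    for x, y in id_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```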