Plugin estimators for selective classification with out-of-distribution detection
Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our approach yields competitive SC and OOD detection trade-offs compared to common baselines. ... Experiments on benchmark image classification datasets (§ 5) show that our plug-in approach yields competitive classification and OOD detection performance at any desired abstention rate, compared to the heuristic approach of Xia and Bouganis (2022), and other common baselines. (A generic abstention-rate thresholding sketch appears after this table.) |
| Researcher Affiliation | Industry | Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar Google Research {hnarasimhan, adityakmenon, wittawat, sanjivk}@google.com |
| Pseudocode | Yes | Algorithm 1 Loss-based SCOD using an unlabeled mixture of ID and OOD data |
| Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use CIFAR-100 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009) as the in-distribution (ID) datasets, and SVHN (Netzer et al., 2011), Places365 (Zhou et al., 2017), LSUN (Yu et al., 2015) (original and resized), Texture (Cimpoi et al., 2014), CelebA (Liu et al., 2015), 300K Random Images (Hendrycks et al., 2019), Open Images (Krasin et al., 2017), Open Images-O (Wang et al., 2022a), iNaturalist-O (Huang and Li, 2021) and Colorectal (Kather et al., 2016) as the OOD datasets. |
| Dataset Splits | Yes | For the CIFAR-100 experiments where we use a wild sample containing a mix of ID and OOD examples, we split the original CIFAR-100 training set into two halves, use one half as the inlier sample and the other half to construct the wild sample. ... We hold out 5% of the original ID test set and use it as the strictly inlier sample needed to estimate πmix for Algorithm 1. ... For the pre-trained ImageNet experiments, we sample an equal number of examples from the ImageNet validation sample and the OOD dataset. (See the split sketch after this table.) |
| Hardware Specification | No | The paper mentions the models used (ResNet-56, BiT ResNet-101) but does not specify the hardware (e.g., GPU, CPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper mentions using SGD for optimization but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | We use SGD with momentum as the optimization algorithm for all models. For the annealing schedule, the specified learning rate (LR) is the initial rate, which is then decayed by a factor of ten after each epoch in a specified list. For CIFAR, these epochs are 15, 96, 192 and 224. ... Table J.1 also provides details of the learning rate (LR) schedule and other hyper-parameters used in our experiments. (See the schedule sketch after this table.) |
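
The CIFAR-100 split quoted in the Dataset Splits row can be mirrored roughly as follows. This is a minimal sketch assuming torchvision is available; SVHN is used here only as an example OOD source, and the seed, data root, and names such as `wild_sample` and `strict_inlier` are illustrative choices rather than details taken from the paper.

```python
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train = datasets.CIFAR100(root="data", train=True, download=True, transform=transform)
test = datasets.CIFAR100(root="data", train=False, download=True, transform=transform)
ood = datasets.SVHN(root="data", split="test", download=True, transform=transform)  # example OOD source

gen = torch.Generator().manual_seed(0)

# Split the original CIFAR-100 training set into two halves: one half is the
# inlier (ID) sample, the other is combined with OOD data to form the
# unlabeled "wild" sample used by Algorithm 1.
inlier_half, wild_id_half = random_split(train, [25_000, 25_000], generator=gen)
wild_sample = ConcatDataset([wild_id_half, ood])

# Hold out 5% of the ID test set as the strictly inlier sample used to
# estimate the mixing proportion pi_mix; keep the remainder for evaluation.
n_holdout = int(0.05 * len(test))
strict_inlier, eval_test = random_split(test, [n_holdout, len(test) - n_holdout], generator=gen)
```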
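
The optimizer and step-decay schedule described in the Experiment Setup row map naturally onto PyTorch's `MultiStepLR`. In the sketch below, only the tenfold decay and the CIFAR milestone epochs (15, 96, 192, 224) come from the quoted text; the initial LR, momentum, weight decay, epoch count, and the stand-in ResNet are assumptions (the paper's actual values are in its Table J.1).

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # stand-in; the paper trains ResNet-56 on CIFAR-100

# SGD with momentum; the LR is decayed by a factor of ten at the listed epochs.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[15, 96, 192, 224], gamma=0.1)

for epoch in range(250):
    # ... one training epoch over the ID (and wild) data ...
    scheduler.step()
```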
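
Finally, the Research Type row quotes the claim of competitive performance "at any desired abstention rate". The snippet below is a generic illustration of how an abstention budget can be enforced by thresholding a per-example rejection score calibrated on held-out data; it is not the paper's plug-in estimator, and the random scores and helper names are purely illustrative.

```python
import numpy as np

def calibrate_threshold(val_scores: np.ndarray, abstain_rate: float) -> float:
    # Choose the threshold so that roughly `abstain_rate` of held-out examples
    # (those with the highest rejection scores) would be abstained on.
    return float(np.quantile(val_scores, 1.0 - abstain_rate))

def predict_or_abstain(scores: np.ndarray, preds: np.ndarray, threshold: float) -> np.ndarray:
    # Keep the model's prediction where the rejection score is below the
    # threshold; return -1 (abstain) elsewhere.
    return np.where(scores <= threshold, preds, -1)

rng = np.random.default_rng(0)
val_scores = rng.random(1000)                # e.g. 1 - max softmax prob, or a combined SC/OOD score
threshold = calibrate_threshold(val_scores, abstain_rate=0.2)

test_scores = rng.random(200)
test_preds = rng.integers(0, 100, size=200)  # dummy class predictions
decisions = predict_or_abstain(test_scores, test_preds, threshold)
```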