Plugin estimators for selective classification with out-of-distribution detection
Authors: Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our approach yields competitive SC and OOD detection trade-offs compared to common baselines. ... Experiments on benchmark image classification datasets (§ 5) show that our plug-in approach yields competitive classification and OOD detection performance at any desired abstention rate, compared to the heuristic approach of Xia and Bouganis (2022), and other common baselines. (A generic abstention-rate thresholding sketch appears after this table.) |
| Researcher Affiliation | Industry | Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar Google Research {hnarasimhan, adityakmenon, wittawat, sanjivk}@google.com |
| Pseudocode | Yes | Algorithm 1 Loss-based SCOD using an unlabeled mixture of ID and OOD data |
| Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use CIFAR-100 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009) as the in-distribution (ID) datasets, and SVHN (Netzer et al., 2011), Places365 (Zhou et al., 2017), LSUN (Yu et al., 2015) (original and resized), Texture (Cimpoi et al., 2014), CelebA (Liu et al., 2015), 300K Random Images (Hendrycks et al., 2019), Open Images (Krasin et al., 2017), Open Images-O (Wang et al., 2022a), iNaturalist-O (Huang and Li, 2021) and Colorectal (Kather et al., 2016) as the OOD datasets. |
| Dataset Splits | Yes | For the CIFAR-100 experiments where we use a wild sample containing a mix of ID and OOD examples, we split the original CIFAR-100 training set into two halves, use one half as the inlier sample and the other half to construct the wild sample. ... We hold out 5% of the original ID test set and use it as the strictly inlier sample needed to estimate πmix for Algorithm 1. ... For the pre-trained ImageNet experiments, we sample an equal number of examples from the ImageNet validation sample and the OOD dataset. (See the split sketch after this table.) |
| Hardware Specification | No | The paper mentions the models used (ResNet-56, BiT ResNet-101) but does not specify the hardware (e.g., GPU, CPU models, memory) on which the experiments were run. |
| Software Dependencies | No | The paper mentions using SGD for optimization but does not provide specific version numbers for any software dependencies, libraries, or frameworks used (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | We use SGD with momentum as the optimization algorithm for all models. For the annealing schedule, the specified learning rate (LR) is the initial rate, which is then decayed by a factor of ten after each epoch in a specified list. For CIFAR, these epochs are 15, 96, 192 and 224. ... Table J.1 also provides details of the learning rate (LR) schedule and other hyper-parameters used in our experiments. (See the schedule sketch after this table.) |
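
The CIFAR-100 split quoted in the Dataset Splits row can be mirrored roughly as follows. This is a minimal sketch assuming torchvision is available; SVHN is used here only as an example OOD source, and the seed, data root, and names such as `wild_sample` and `strict_inlier` are illustrative choices rather than details taken from the paper.

```python
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train = datasets.CIFAR100(root="data", train=True, download=True, transform=transform)
test = datasets.CIFAR100(root="data", train=False, download=True, transform=transform)
ood = datasets.SVHN(root="data", split="test", download=True, transform=transform)  # example OOD source

gen = torch.Generator().manual_seed(0)

# Split the original CIFAR-100 training set into two halves: one half is the
# inlier (ID) sample, the other is combined with OOD data to form the
# unlabeled "wild" sample used by Algorithm 1.
inlier_half, wild_id_half = random_split(train, [25_000, 25_000], generator=gen)
wild_sample = ConcatDataset([wild_id_half, ood])

# Hold out 5% of the ID test set as the strictly inlier sample used to
# estimate the mixing proportion pi_mix; keep the remainder for evaluation.
n_holdout = int(0.05 * len(test))
strict_inlier, eval_test = random_split(test, [n_holdout, len(test) - n_holdout], generator=gen)
```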
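
The optimizer and step-decay schedule described in the Experiment Setup row map naturally onto PyTorch's `MultiStepLR`. In the sketch below, only the tenfold decay and the CIFAR milestone epochs (15, 96, 192, 224) come from the quoted text; the initial LR, momentum, weight decay, epoch count, and the stand-in ResNet are assumptions (the paper's actual values are in its Table J.1).

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # stand-in; the paper trains ResNet-56 on CIFAR-100

# SGD with momentum; the LR is decayed by a factor of ten at the listed epochs.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[15, 96, 192, 224], gamma=0.1)

for epoch in range(250):
    # ... one training epoch over the ID (and wild) data ...
    scheduler.step()
```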
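
Finally, the Research Type row quotes the claim of competitive performance "at any desired abstention rate". The snippet below is a generic illustration of how an abstention budget can be enforced by thresholding a per-example rejection score calibrated on held-out data; it is not the paper's plug-in estimator, and the random scores and helper names are purely illustrative.

```python
import numpy as np

def calibrate_threshold(val_scores: np.ndarray, abstain_rate: float) -> float:
    # Choose the threshold so that roughly `abstain_rate` of held-out examples
    # (those with the highest rejection scores) would be abstained on.
    return float(np.quantile(val_scores, 1.0 - abstain_rate))

def predict_or_abstain(scores: np.ndarray, preds: np.ndarray, threshold: float) -> np.ndarray:
    # Keep the model's prediction where the rejection score is below the
    # threshold; return -1 (abstain) elsewhere.
    return np.where(scores <= threshold, preds, -1)

rng = np.random.default_rng(0)
val_scores = rng.random(1000)                # e.g. 1 - max softmax prob, or a combined SC/OOD score
threshold = calibrate_threshold(val_scores, abstain_rate=0.2)

test_scores = rng.random(200)
test_preds = rng.integers(0, 100, size=200)  # dummy class predictions
decisions = predict_or_abstain(test_scores, test_preds, threshold)
```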