When and How Does In-Distribution Label Help Out-of-Distribution Detection?

Authors: Xuefeng Du, Yiyou Sun, Yixuan Li

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Lastly, we present empirical results on both simulated and real datasets, validating theoretical guarantees and reinforcing our insights. |
| Researcher Affiliation | Academia | Xuefeng Du, Yiyou Sun, Yixuan Li (Department of Computer Sciences, UW-Madison). Correspondence to: Yixuan Li <sharonli@cs.wisc.edu>. |
| Pseudocode | No | The paper presents mathematical formulations and derivations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/deeplearning-wisc/id_label. |
| Open Datasets | Yes | For ID datasets, we use CIFAR10 and CIFAR100 (Krizhevsky et al., 2009). For far OOD test datasets, we use a suite of natural image datasets including TEXTURES (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), PLACES365 (Zhou et al., 2017), and LSUN (Yu et al., 2015). |
| Dataset Splits | No | The paper mentions using 75% of the OOD dataset for linear probing and the remainder for testing, which describes a train/test split for the linear-probing phase. However, it does not explicitly state a separate validation split with specific percentages or counts for the primary neural network training. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for its experiments, such as particular GPU or CPU models. |
| Software Dependencies | No | The paper describes the optimizers and training configurations but does not list specific software components with version numbers (e.g., 'PyTorch 1.x' or 'Python 3.x'). |
| Experiment Setup | Yes | For CIFAR10, we set ϕ_u = 0.5, ϕ_l = 0.25 with 200 training epochs, and we evaluate using features extracted from the layer preceding the projection. For CIFAR100, we set ϕ_u = 3, ϕ_l = 0.0225 with 200 training epochs and assess based on the projection layer's features. We use SGD with momentum 0.9 as the optimizer, with cosine annealing (lr = 0.03), weight decay 5e-4, and batch size 512. For linear probing, we train a linear layer on the features extracted from the model pretrained by contrastive learning. We use SGD for 50 epochs with momentum 0.9, with the learning rate decayed by a factor of 0.2 at epoch 30 (initial learning rate 5), and batch size 512. |
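The quoted Experiment Setup row (and the linear-probing protocol referenced under Dataset Splits) maps onto standard PyTorch optimizer and scheduler objects. The sketch below is a minimal illustration of that configuration under the stated hyperparameters; the `encoder` backbone, feature dimensions, and the empty training loop are placeholder assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the "Experiment Setup" row above.
PRETRAIN_EPOCHS = 200
PRETRAIN_LR = 0.03
WEIGHT_DECAY = 5e-4
BATCH_SIZE = 512

PROBE_EPOCHS = 50
PROBE_LR = 5.0
PROBE_MILESTONE = 30   # decay the probe learning rate at epoch 30 ...
PROBE_GAMMA = 0.2      # ... by a factor of 0.2

# --- Contrastive pretraining optimizer / scheduler -------------------------
# `encoder` is a stand-in backbone; the paper's actual architecture and
# contrastive loss live in the released repository.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))

pretrain_opt = torch.optim.SGD(
    encoder.parameters(),
    lr=PRETRAIN_LR,
    momentum=0.9,
    weight_decay=WEIGHT_DECAY,
)
pretrain_sched = torch.optim.lr_scheduler.CosineAnnealingLR(
    pretrain_opt, T_max=PRETRAIN_EPOCHS
)

# --- Linear probing on features from the pretrained model ------------------
feature_dim, num_classes = 512, 10  # assumed dimensions for CIFAR10
probe = nn.Linear(feature_dim, num_classes)

probe_opt = torch.optim.SGD(probe.parameters(), lr=PROBE_LR, momentum=0.9)
probe_sched = torch.optim.lr_scheduler.MultiStepLR(
    probe_opt, milestones=[PROBE_MILESTONE], gamma=PROBE_GAMMA
)

for epoch in range(PROBE_EPOCHS):
    # ... one pass over the extracted features with batch size 512 (elided) ...
    probe_sched.step()
```

The 75%/25% OOD split mentioned under Dataset Splits could be produced with `torch.utils.data.random_split` before probing, though the summary above does not pin down the exact splitting procedure used by the authors.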