When and How Does In-Distribution Label Help Out-of-Distribution Detection?
Authors: Xuefeng Du, Yiyou Sun, Yixuan Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we present empirical results on both simulated and real datasets, validating theoretical guarantees and reinforcing our insights. |
| Researcher Affiliation | Academia | Xuefeng Du, Yiyou Sun, Yixuan Li — Department of Computer Sciences, UW-Madison. Correspondence to: Yixuan Li <sharonli@cs.wisc.edu>. |
| Pseudocode | No | The paper presents mathematical formulations and derivations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/deeplearning-wisc/id_label. |
| Open Datasets | Yes | For ID datasets, we use CIFAR10 and CIFAR100 (Krizhevsky et al., 2009). For far OOD test datasets, we use a suite of natural image datasets including TEXTURES (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), PLACES365 (Zhou et al., 2017), and LSUN (Yu et al., 2015). |
| Dataset Splits | No | The paper uses 75% of the OOD dataset for linear probing and the remainder for testing, which is a train/test split for the linear probing phase only. It does not explicitly state a separate validation split, with percentages or counts, for the primary neural network training. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for its experiments, such as particular GPU or CPU models. |
| Software Dependencies | No | The paper describes the optimizers and training configurations, but it does not list specific software components with version numbers (e.g., 'PyTorch 1.x' or 'Python 3.x'). |
| Experiment Setup | Yes | For CIFAR10, we set ϕu = 0.5, ϕl = 0.25 with training epoch 200, and we evaluate using features extracted from the layer preceding the projection. For CIFAR100, we set ϕu = 3, ϕl = 0.0225 with 200 training epochs and assess based on the projection layer's features. We use SGD with momentum 0.9 as an optimizer with cosine annealing (lr=0.03), weight decay 5e-4, and batch size 512. For linear probing, we train a linear layer on the extracted features from the pretrained model by contrastive learning. We use SGD for 50 epochs with momentum 0.9 as an optimizer, with the learning rate decayed by 0.2 at epoch 30 (the initial learning rate is 5), and batch size 512. |
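
To make the quoted configuration easier to reuse, below is a minimal PyTorch sketch of the two optimizer/scheduler setups it describes (contrastive pretraining and linear probing). This is not the authors' released code (see the GitHub link above); the function names are hypothetical, and the model, contrastive loss, data loaders, and batch size of 512 are assumed to be supplied elsewhere.

```python
# Minimal sketch of the optimizer/scheduler settings quoted in the
# "Experiment Setup" row. Function names are hypothetical; the model,
# contrastive loss, and data loaders are out of scope here.
import torch.nn as nn
import torch.optim as optim


def pretraining_optim(model: nn.Module, epochs: int = 200):
    """Contrastive pretraining: SGD, momentum 0.9, lr 0.03,
    weight decay 5e-4, cosine annealing over the training epochs."""
    opt = optim.SGD(model.parameters(), lr=0.03, momentum=0.9, weight_decay=5e-4)
    sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


def linear_probe_optim(head: nn.Linear):
    """Linear probing on frozen features: SGD, momentum 0.9, initial lr 5,
    decayed by a factor of 0.2 at epoch 30 (50 epochs total)."""
    opt = optim.SGD(head.parameters(), lr=5.0, momentum=0.9)
    sched = optim.lr_scheduler.MultiStepLR(opt, milestones=[30], gamma=0.2)
    return opt, sched
```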