Scaling Out-of-Distribution Detection for Real-World Settings
Authors: Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joseph Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To make future work in real-world settings possible, we create new benchmarks for three large-scale settings. To test ImageNet multiclass anomaly detectors, we introduce the Species dataset containing over 700,000 images and over a thousand anomalous species. We leverage ImageNet-21K to evaluate PASCAL VOC and COCO multilabel anomaly detectors. Third, we introduce a new benchmark for anomaly segmentation with road anomalies. We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work. (A hedged sketch of the maximum-logit score follows this table.) |
| Researcher Affiliation | Academia | Dan Hendrycks*1, Steven Basart*2, Mantas Mazeika3, Andy Zou1, Joe Kwon4, Mohammadreza Mostajabi5, Jacob Steinhardt1, Dawn Song1 (*equal contribution). 1UC Berkeley, 2UChicago, 3UIUC, 4Yale University, 5TTIC. |
| Pseudocode | No | The paper describes methods in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our experiments and the Species and CAOS datasets are available at github.com/hendrycks/anomaly-seg. |
| Open Datasets | Yes | The code for our experiments and the Species and CAOS datasets are available at github.com/hendrycks/anomaly-seg. Our new baseline combined with Species and CAOS benchmarks pave the way for future research on large-scale OOD detection. ... We use PASCAL VOC (Everingham et al., 2009) and MS-COCO (Lin et al., 2014) as in-distribution data. |
| Dataset Splits | Yes | To obtain representations for anomaly detection, we use models trained on ImageNet-21K-P, a cleaned version of ImageNet-21K with a train/val split (Ridnik et al., 2021a). We evaluate a TResNet-M, ViT-B-16, and Mixer-B-16 (Ridnik et al., 2021b; Dosovitskiy et al., 2021b; Tolstikhin et al., 2021), and the validation split is used for obtaining in-distribution scores. ... We generate a validation set from the fourth town. ... The original data consists of 7,000 images for training and 1,000 for validation. ... This yields 6,280 training pairs, 910 validation pairs without anomalies, and 810 testing pairs with anomalous objects. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, or cloud instance types used for experiments. It only vaguely mentions 'limited compute'. |
| Software Dependencies | No | The paper mentions various software components and frameworks like 'Adam optimizer', 'PSPNet', 'ResNet-101', 'Unreal Engine', and 'CARLA simulation environment' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We train each model for 50 epochs using the Adam optimizer (Kingma & Ba, 2014) with hyperparameter values 10^-4 and 10^-5 for β1 and β2 respectively. For data augmentation we use standard resizing, random crops, and random flips to obtain images of size 256×256×3. ... For all of the baselines except the autoencoder, we train a PSPNet (Zhao et al., 2017) decoder with a ResNet-101 encoder (He et al., 2015) for 20 epochs. We train both the encoder and decoder using SGD with momentum of 0.9, a learning rate of 2×10^-2, and learning rate decay of 10^-4. For AE, we use a 4-layer U-Net (Ronneberger et al., 2015) with a spatial latent code as in Baur et al. (2019). The U-Net also uses batch norm and is trained for 10 epochs. (A hedged training-setup sketch follows this table.) |
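The abstract quoted above credits the gains to a maximum-logit detector. Below is a minimal sketch of that scoring rule, not the authors' exact implementation: the function names are ours, and the MSP comparison follows the standard maximum-softmax-probability baseline.

```python
import torch

def maxlogit_score(logits: torch.Tensor) -> torch.Tensor:
    """Anomaly score = negative maximum unnormalized logit.

    Works on classification logits of shape (B, C) and, per pixel, on
    segmentation logits of shape (B, C, H, W), since the max is taken
    over the class dimension either way. Higher score = more anomalous.
    """
    return -logits.max(dim=1).values

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability (MSP) baseline, for comparison."""
    return -logits.softmax(dim=1).max(dim=1).values
```

Because softmax normalization discards the scale of the logits, MaxLogit can separate anomalies that MSP maps to similar confidence values; the authors' reference implementation is in the linked repository.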
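The segmentation training recipe quoted in the Experiment Setup row can be sketched as follows, with assumptions labeled loudly: torchvision ships no PSPNet, so an FCN with a ResNet-101 backbone stands in for the PSPNet decoder plus ResNet-101 encoder; `NUM_CLASSES` and the void label are placeholders; and the quoted "learning rate decay of 10^-4" is read here as L2 weight decay (a decaying LR schedule is another plausible reading).

```python
import torch
from torchvision.models.segmentation import fcn_resnet101

NUM_CLASSES = 13  # placeholder: set to the benchmark's in-distribution class count

# Stand-in network (assumption): torchvision has no PSPNet, so an
# FCN-ResNet-101 plays the role of the PSPNet decoder + ResNet-101 encoder.
model = fcn_resnet101(weights=None, num_classes=NUM_CLASSES)

# SGD with momentum 0.9 and learning rate 2e-2, as quoted. "Learning rate
# decay of 1e-4" is interpreted here as weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=2e-2,
                            momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # assumed void label

def train_epoch(loader, device="cuda"):
    """One epoch of the 20-epoch schedule described above."""
    model.to(device).train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images)["out"], masks)
        loss.backward()
        optimizer.step()
```

The sketch keeps to the SGD segmentation recipe; the multilabel models in the same row are trained with Adam instead, but the quoted β values (10^-4, 10^-5) are far from Adam's usual defaults (0.9, 0.999), so we do not reproduce that configuration here.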