Scaling Out-of-Distribution Detection for Real-World Settings
Authors: Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joseph Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, Dawn Song
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To make future work in real-world settings possible, we create new benchmarks for three large-scale settings. To test ImageNet multiclass anomaly detectors, we introduce the Species dataset containing over 700,000 images and over a thousand anomalous species. We leverage ImageNet-21K to evaluate PASCAL VOC and COCO multilabel anomaly detectors. Third, we introduce a new benchmark for anomaly segmentation with road anomalies. We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work. (A hedged sketch of the maximum-logit score follows this table.) |
| Researcher Affiliation | Academia | Dan Hendrycks*1, Steven Basart*2, Mantas Mazeika3, Andy Zou1, Joe Kwon4, Mohammadreza Mostajabi5, Jacob Steinhardt1, Dawn Song1 (*equal contribution). 1UC Berkeley, 2UChicago, 3UIUC, 4Yale University, 5TTIC. |
| Pseudocode | No | The paper describes methods in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our experiments and the Species and CAOS datasets are available at github.com/hendrycks/anomaly-seg. |
| Open Datasets | Yes | The code for our experiments and the Species and CAOS datasets are available at github.com/hendrycks/anomaly-seg. Our new baseline combined with Species and CAOS benchmarks pave the way for future research on large-scale OOD detection. ... We use PASCAL VOC (Everingham et al., 2009) and MS-COCO (Lin et al., 2014) as in-distribution data. |
| Dataset Splits | Yes | To obtain representations for anomaly detection, we use models trained on ImageNet-21K-P, a cleaned version of ImageNet-21K with a train/val split (Ridnik et al., 2021a). We evaluate a TResNet-M, ViT-B-16, and Mixer-B-16 (Ridnik et al., 2021b; Dosovitskiy et al., 2021b; Tolstikhin et al., 2021), and the validation split is used for obtaining in-distribution scores. ... We generate a validation set from the fourth town. ... The original data consists of 7,000 images for training and 1,000 for validation. ... This yields 6,280 training pairs, 910 validation pairs without anomalies, and 810 testing pairs with anomalous objects. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, or cloud instance types used for experiments. It only vaguely mentions 'limited compute'. |
| Software Dependencies | No | The paper mentions various software components and frameworks like 'Adam optimizer', 'PSPNet', 'ResNet-101', 'Unreal Engine', and 'CARLA simulation environment' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We train each model for 50 epochs using the Adam optimizer (Kingma & Ba, 2014) with hyperparameter values 10^-4 and 10^-5 for β1 and β2 respectively. For data augmentation we use standard resizing, random crops, and random flips to obtain images of size 256×256×3. ... For all of the baselines except the autoencoder, we train a PSPNet (Zhao et al., 2017) decoder with a ResNet-101 encoder (He et al., 2015) for 20 epochs. We train both the encoder and decoder using SGD with momentum of 0.9, a learning rate of 2×10^-2, and learning rate decay of 10^-4. For AE, we use a 4-layer U-Net (Ronneberger et al., 2015) with a spatial latent code as in Baur et al. (2019). The U-Net also uses batch norm and is trained for 10 epochs. (A hedged training-setup sketch follows this table.) |
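The abstract quoted above credits the gains to a maximum-logit detector. Below is a minimal sketch of that scoring rule, not the authors' exact implementation: the function names are ours, and the MSP comparison follows the standard maximum-softmax-probability baseline.

```python
import torch

def maxlogit_score(logits: torch.Tensor) -> torch.Tensor:
    """Anomaly score = negative maximum unnormalized logit.

    Works on classification logits of shape (B, C) and, per pixel, on
    segmentation logits of shape (B, C, H, W), since the max is taken
    over the class dimension either way. Higher score = more anomalous.
    """
    return -logits.max(dim=1).values

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability (MSP) baseline, for comparison."""
    return -logits.softmax(dim=1).max(dim=1).values
```

Because softmax normalization discards the scale of the logits, MaxLogit can separate anomalies that MSP maps to similar confidence values; the authors' reference implementation is in the linked repository.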
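The segmentation training recipe quoted in the Experiment Setup row can be sketched as follows, with assumptions labeled loudly: torchvision ships no PSPNet, so an FCN with a ResNet-101 backbone stands in for the PSPNet decoder plus ResNet-101 encoder; `NUM_CLASSES` and the void label are placeholders; and the quoted "learning rate decay of 10^-4" is read here as L2 weight decay (a decaying LR schedule is another plausible reading).

```python
import torch
from torchvision.models.segmentation import fcn_resnet101

NUM_CLASSES = 13  # placeholder: set to the benchmark's in-distribution class count

# Stand-in network (assumption): torchvision has no PSPNet, so an
# FCN-ResNet-101 plays the role of the PSPNet decoder + ResNet-101 encoder.
model = fcn_resnet101(weights=None, num_classes=NUM_CLASSES)

# SGD with momentum 0.9 and learning rate 2e-2, as quoted. "Learning rate
# decay of 1e-4" is interpreted here as weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=2e-2,
                            momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # assumed void label

def train_epoch(loader, device="cuda"):
    """One epoch of the 20-epoch schedule described above."""
    model.to(device).train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images)["out"], masks)
        loss.backward()
        optimizer.step()
```

The sketch keeps to the SGD segmentation recipe; the multilabel models in the same row are trained with Adam instead, but the quoted β values (10^-4, 10^-5) are far from Adam's usual defaults (0.9, 0.999), so we do not reproduce that configuration here.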