Extremely Simple Activation Shaping for Out-of-Distribution Detection
Authors: Andrija Djurisic, Nebojsa Bozanic, Arjun Ashok, Rosanne Liu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that such a simple treatment enhances the in-distribution and out-of-distribution distinction so as to allow state-of-the-art OOD detection on ImageNet, and does not noticeably deteriorate the in-distribution accuracy. Video, animation and code can be found at: https://andrijazz.github.io/ash. In the rest of the paper we develop and evaluate ASH via the following contributions: When evaluated across a suite of vision tasks including 3 ID datasets and 10 OOD datasets (Table 1), ASH immediately improves OOD detection performance across the board, establishing a new state of the art (SOTA), while providing the optimal ID-OOD trade-off and supplying a new Pareto frontier (Figure 2). We present extensive ablation studies on different design choices, including placements, pruning strength, and shaping treatments of ASH, while demonstrating how ASH can be readily combined with other methods, revealing the unexpected effectiveness and flexibility of such a simple operation (Section 5). |
| Researcher Affiliation | Collaboration | Andrija Djurisic (1), Nebojsa Bozanic (1,2), Arjun Ashok (1), Rosanne Liu (1,3). (1) ML Collective; (2) Faculty of Technical Sciences, University of Novi Sad; (3) Google Research, Brain Team. Correspondence to andrija@mlcollective.org. |
| Pseudocode | Yes | Algorithm 1 ASH-P: Activation Shaping with Pruning Algorithm 2 ASH-B: Activation Shaping by Binarizing Algorithm 3 ASH-S: Activation Shaping with Scaling |
| Open Source Code | Yes | Video, animation and code can be found at: https://andrijazz.github.io/ash. Code to reproduce results is submitted alongside this appendix. |
| Open Datasets | Yes | For CIFAR-10 and CIFAR-100 experiments, we used the 6 OOD datasets adopted in DICE (Sun & Li, 2022): SVHN (Netzer et al., 2011), LSUN-Crop (Yu et al., 2015), LSUN-Resize (Yu et al., 2015), iSUN (Xu et al., 2015), Places365 (Zhou et al., 2017) and Textures (Cimpoi et al., 2014), while the ID dataset is the respective CIFAR. For ImageNet experiments, we inherit the exact setup from ReAct (Sun et al., 2021), where the ID dataset is ImageNet-1k, and OOD datasets include iNaturalist (Van Horn et al., 2018), SUN (Xiao et al., 2010), Places365 (Zhou et al., 2017), and Textures (Cimpoi et al., 2014). |
| Dataset Splits | Yes | For CIFAR-10 and CIFAR-100 experiments, we used the 6 OOD datasets adopted in DICE (Sun & Li, 2022): SVHN (Netzer et al., 2011), LSUN-Crop (Yu et al., 2015), LSUN-Resize (Yu et al., 2015), iSUN (Xu et al., 2015), Places365 (Zhou et al., 2017) and Textures (Cimpoi et al., 2014), while the ID dataset is the respective CIFAR. For ImageNet experiments, we inherit the exact setup from ReAct (Sun et al., 2021), where the ID dataset is ImageNet-1k, and OOD datasets include iNaturalist (Van Horn et al., 2018), SUN (Xiao et al., 2010), Places365 (Zhou et al., 2017), and Textures (Cimpoi et al., 2014). For ImageNet, the best performing ASH versions are ASH-B with p = 65, and ASH-S with p = 90. For CIFAR-10 and CIFAR-100, the best performing ASH versions are ASH-S with p = 95 and p = 90, and comparably, ASH-B with p = 95 and p = 85. When experimenting with a wider range and more granular pruning levels, as shown in Figure 3, we observe that ASH-S and ASH-P do preserve accuracy all the way, until a rather high value of pruning (e.g. ID accuracy dropped to 64.976% at 99% pruning). |
| Hardware Specification | Yes | All experiments are done with NVIDIA GTX 1080 Ti GPUs. |
| Software Dependencies | No | The paper states that "Code to reproduce results is submitted alongside this appendix" but does not specify particular software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The p parameter: ASH algorithms come with only one parameter, p, the pruning percentage. In experiments we vary p from 60 to 90 and have observed relatively steady performance (see Figure 2). When studying its effect on ID accuracy degradation, we cover the entire range from 0 to 100 (see Figure 3). The SOTA performances are given by surprisingly high values of p. For ImageNet, the best performing ASH versions are ASH-B with p = 65, and ASH-S with p = 90. For CIFAR-10 and CIFAR-100, the best performing ASH versions are ASH-S with p = 95 and p = 90, and comparably, ASH-B with p = 95 and p = 85. See Section F in Appendix for full details on the parameter choice. For the reimplementation of DICE with MobileNet, we used a DICE pruning threshold of 70%. For reimplementing DICE + ReAct on MobileNet, since DICE and ReAct each come with their own hyperparameter, we tried a grid search, where DICE pruning thresholds include {10%, 15%, 70%} and ReAct clipping thresholds {1.0, 1.5, 1.33}. ReAct + ASH is implemented with ASH-S @ p = 90 and a ReAct clipping threshold of 1.0 for all of CIFAR-10, CIFAR-100, and ImageNet. |
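The three shaping variants listed in the Pseudocode row (ASH-P, ASH-B, ASH-S) are simple enough to sketch directly. Below is a minimal NumPy sketch, written from the paper's algorithm descriptions rather than the released code: each function takes a flattened activation vector from a single sample, keeps only the values above the p-th percentile, and then either leaves them as-is (pruning), sets them all to a shared constant preserving the pre-pruning sum (binarizing), or rescales them by exp(s1/s2), the ratio of the sums before and after pruning (scaling). Function and variable names here are illustrative, not the authors' identifiers.

```python
import numpy as np

def ash_p(x: np.ndarray, percentile: float = 90) -> np.ndarray:
    """ASH-P: zero out activations below the p-th percentile."""
    t = np.percentile(x, percentile)
    return np.where(x >= t, x, 0.0)

def ash_b(x: np.ndarray, percentile: float = 65) -> np.ndarray:
    """ASH-B: binarize survivors to a constant that preserves the original sum."""
    s = x.sum()                      # sum before pruning
    t = np.percentile(x, percentile)
    mask = x >= t
    k = mask.sum()                   # number of surviving activations
    return np.where(mask, s / k, 0.0)

def ash_s(x: np.ndarray, percentile: float = 90) -> np.ndarray:
    """ASH-S: prune, then scale survivors by exp(s1 / s2)."""
    s1 = x.sum()                     # sum before pruning
    t = np.percentile(x, percentile)
    pruned = np.where(x >= t, x, 0.0)
    s2 = pruned.sum()                # sum after pruning
    return pruned * np.exp(s1 / s2)
```

In the paper the operation is applied to the penultimate-layer feature map of a pretrained network at inference time, with no retraining; the sketch above captures only the per-sample shaping step, and a batched implementation would compute the percentile per sample along the feature axis.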