Mitigating Neural Network Overconfidence with Logit Normalization
Authors: Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, Yixuan Li
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of LogitNorm, reducing the average FPR95 by up to 42.30% on common benchmarks. (A hedged FPR95 sketch follows the table.) |
| Researcher Affiliation | Academia | Nanyang Technological University, Singapore; Nanjing University, Nanjing, Jiangsu, China; Chongqing University, Chongqing, China; University of Wisconsin-Madison, Wisconsin, United States. |
| Pseudocode | No | The paper does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code and data are publicly available at https://github.com/hongxin001/logitnorm_ood. |
| Open Datasets | Yes | In this work, we use the CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) datasets as in-distribution datasets, which are common benchmarks for OOD detection. ... For the OOD detection evaluation, we use six common benchmarks as OOD test datasets D^test_out: Textures (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), Places365 (Zhou et al., 2017), LSUN-Crop (Yu et al., 2015), LSUN-Resize (Yu et al., 2015), and iSUN (Xu et al., 2015). |
| Dataset Splits | Yes | Specifically, we use the standard split with 50,000 training images and 10,000 test images. ... For hyperparameter tuning, we use Gaussian noises as the validation set. (See the Gaussian-noise validation sketch below the table.) |
| Hardware Specification | Yes | We conduct all the experiments on NVIDIA GeForce RTX 3090 and implement all methods with default parameters using PyTorch (Paszke et al., 2019). |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019)' but does not specify a precise version number for PyTorch or any other software library. |
| Experiment Setup | Yes | The network is trained for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, a dropout rate of 0.3, and a batch size of 128. We set the initial learning rate as 0.1 and reduce it by a factor of 10 at 80 and 140 epochs. The hyperparameter τ is selected from the range {0.001, 0.005, 0.01, ..., 0.05}. We set 0.04 for CIFAR-10 by default. (See the training-setup sketch below.) |
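
The FPR95 figure quoted in the "Research Type" row is the false positive rate on OOD data at the threshold where 95% of in-distribution (ID) samples are correctly retained. The sketch below is a minimal illustration of that metric, not the authors' evaluation code; the function name `fpr_at_95_tpr`, the synthetic scores, and the convention that higher scores mean "more ID-like" are assumptions for illustration.

```python
import numpy as np

def fpr_at_95_tpr(scores_id, scores_ood):
    """FPR95: fraction of OOD samples scored above the threshold that keeps
    95% of in-distribution samples classified as ID.
    Assumes higher score = more ID-like (e.g., maximum softmax probability)."""
    # Threshold at the 5th percentile of ID scores, so 95% of ID samples pass it.
    threshold = np.percentile(scores_id, 5)
    # OOD samples scoring above the threshold are false positives.
    return float(np.mean(scores_ood >= threshold))

# Toy usage with synthetic scores (illustrative only).
rng = np.random.default_rng(0)
scores_id = rng.normal(0.9, 0.05, size=10_000)
scores_ood = rng.normal(0.6, 0.15, size=10_000)
print(f"FPR95 = {fpr_at_95_tpr(scores_id, scores_ood):.4f}")
```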
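
The "Dataset Splits" row notes that Gaussian noise serves as the validation set for hyperparameter tuning. Below is a minimal sketch of how such a noise set could be generated, assuming CIFAR-sized 3×32×32 images in [0, 1]; the noise mean/std, set size, and helper name are assumptions, not details taken from the paper.

```python
import torch

def make_gaussian_noise_set(num_images=10_000, mean=0.5, std=0.25, seed=0):
    """Generate a Gaussian-noise validation OOD set (hypothetical parameters)."""
    g = torch.Generator().manual_seed(seed)
    noise = torch.randn(num_images, 3, 32, 32, generator=g) * std + mean
    return noise.clamp(0.0, 1.0)

val_ood = make_gaussian_noise_set()
print(val_ood.shape)  # torch.Size([10000, 3, 32, 32])
```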
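
The "Experiment Setup" row describes training with LogitNorm, whose key idea is to compute cross-entropy on L2-normalized logits scaled by the temperature τ. The following is a minimal sketch of that loss together with the optimizer and learning-rate schedule quoted above; `model` and `train_loader` are placeholders (the backbone network is not shown), and the official repository linked in the "Open Source Code" row remains the authoritative implementation.

```python
import torch
import torch.nn.functional as F

class LogitNormLoss(torch.nn.Module):
    """Cross-entropy on L2-normalized logits divided by temperature tau
    (tau = 0.04 for CIFAR-10 by default, per the paper)."""
    def __init__(self, tau=0.04, eps=1e-7):
        super().__init__()
        self.tau = tau
        self.eps = eps

    def forward(self, logits, targets):
        norms = logits.norm(p=2, dim=-1, keepdim=True) + self.eps
        return F.cross_entropy(logits / (norms * self.tau), targets)

def train(model, train_loader, device="cuda", epochs=200):
    """Training loop sketch following the quoted setup: SGD with momentum 0.9,
    weight decay 5e-4, lr 0.1 reduced 10x at epochs 80 and 140."""
    criterion = LogitNormLoss(tau=0.04)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80, 140], gamma=0.1)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```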