Mitigating Neural Network Overconfidence with Logit Normalization

Authors: Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, Yixuan Li

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of LogitNorm, reducing the average FPR95 by up to 42.30% on common benchmarks. (A sketch of how FPR95 is computed appears after the table.)
Researcher Affiliation | Academia | (1) Nanyang Technological University, Singapore; (2) Nanjing University, Nanjing, Jiangsu, China; (3) Chongqing University, Chongqing, China; (4) University of Wisconsin-Madison, Wisconsin, United States.
Pseudocode | No | The paper does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. (A hedged sketch of the logit-normalized loss is given after the table.)
Open Source Code | Yes | Code and data are publicly available at https://github.com/hongxin001/logitnorm_ood.
Open Datasets | Yes | In this work, we use the CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) datasets as in-distribution datasets, which are common benchmarks for OOD detection. ... For the OOD detection evaluation, we use six common benchmarks as OOD test datasets D_out^test: Textures (Cimpoi et al., 2014), SVHN (Netzer et al., 2011), Places365 (Zhou et al., 2017), LSUN-Crop (Yu et al., 2015), LSUN-Resize (Yu et al., 2015), and iSUN (Xu et al., 2015).
Dataset Splits | Yes | Specifically, we use the standard split with 50,000 training images and 10,000 test images. ... For hyperparameter tuning, we use Gaussian noises as the validation set.
Hardware Specification | Yes | We conduct all the experiments on NVIDIA GeForce RTX 3090 and implement all methods with default parameters using PyTorch (Paszke et al., 2019).
Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019)' but does not specify a precise version number for PyTorch or any other software library.
Experiment Setup | Yes | The network is trained for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, a dropout rate of 0.3, and a batch size of 128. We set the initial learning rate as 0.1 and reduce it by a factor of 10 at 80 and 140 epochs. The hyperparameter τ is selected from the range {0.001, 0.005, 0.01, ..., 0.05}. We set 0.04 for CIFAR-10 by default. (A hedged sketch of this training schedule follows the table.)
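
Since the paper ships no pseudocode block, the following is a minimal PyTorch sketch of a logit-normalized cross-entropy loss in the spirit of the method: logits are L2-normalized and rescaled by the temperature τ before the usual softmax cross-entropy. The function name, the stabilizing epsilon, and the default τ = 0.04 (the CIFAR-10 value quoted above) are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def logitnorm_loss(logits: torch.Tensor, targets: torch.Tensor, tau: float = 0.04) -> torch.Tensor:
    # L2-normalize each logit vector, then rescale by the temperature tau
    # before applying the standard softmax cross-entropy.
    norms = torch.norm(logits, p=2, dim=-1, keepdim=True) + 1e-7  # eps avoids division by zero
    normalized_logits = logits / (norms * tau)
    return F.cross_entropy(normalized_logits, targets)
```

With a small τ such as 0.04, every normalized logit vector has a fixed norm of 1/τ, which caps how confident the softmax output can become during training and is the mechanism the paper uses to counter overconfidence.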
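
The Experiment Setup row fully specifies the optimization schedule, so it can be transcribed into a short training-loop sketch. Only the optimizer settings, learning-rate milestones, epoch count, batch size, and τ come from the quoted setup; the backbone and data loader below are toy placeholders (the excerpt does not name the architecture), and the loss reuses the logitnorm_loss sketch above.

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins so the loop runs end to end; the paper trains on CIFAR-10/100
# with a real backbone (not specified in this excerpt) rather than this linear model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=128,  # batch size from the quoted setup
    shuffle=True,
)

# SGD with momentum 0.9 and weight decay 0.0005; lr 0.1 reduced by 10x at epochs 80 and 140.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[80, 140], gamma=0.1)

for epoch in range(200):  # 200 training epochs
    for images, labels in train_loader:
        loss = logitnorm_loss(model(images), labels, tau=0.04)  # sketch defined above
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```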
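
For reference, the FPR95 figure quoted in the Research Type row is the false positive rate on out-of-distribution data at the score threshold where 95% of in-distribution data is still accepted. A minimal NumPy sketch, assuming larger scores indicate "in-distribution" (the score convention and function name are illustrative, not taken from the paper's code):

```python
import numpy as np

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    # Threshold chosen so that 95% of in-distribution scores lie at or above it.
    threshold = np.percentile(id_scores, 5)
    # FPR95 = fraction of OOD samples that still score above that threshold.
    return float(np.mean(ood_scores >= threshold))
```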