A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition

Authors: Junjiao Tian, Dylan Yung, Yen-Chang Hsu, Zsolt Kira

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Collaboration | Junjiao Tian (Georgia Institute of Technology, jtian73@gatech.edu); Dylan Yung (Georgia Institute of Technology, dyung6@gatech.edu); Yen-Chang Hsu (Samsung Research America, yenchang.hsu@samsung.com); Zsolt Kira (Georgia Institute of Technology, zkira@gatech.edu)
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code available at https://github.com/GT-RIPL/Geometric-Sensitivity-Decomposition.git.
Open Datasets | Yes | Following prior works [9, 8, 5], we will use CIFAR10 and CIFAR100 as the in-distribution training and testing datasets, and apply the image corruption library provided by [1] to benchmark calibration performance under distribution shift (a data-loading sketch follows the table).
Dataset Splits | Yes | The first step is calibrating the model on the IND validation set (note: the method does not rely on OOD validation data), similar to temperature calibration [4]. However, instead of tuning a temperature parameter as shown in Fig. 1a, the authors simply tune the offset parameter β on the validation set in one of two ways: 1) grid search minimizing Expected Calibration Error (see Sec. 4), or 2) SGD optimization minimizing Negative Log Likelihood [4] (a tuning sketch follows the table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments are provided in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | The new model can be trained with the same procedure as the vanilla network, with no additional hyperparameter tuning, architecture changes, or extended training time. The authors regularize α so that the instance-independent component Cφ stays small: they penalize ‖α − 1‖₂², because α = cos Cφ, i.e., if α → 1 then Cφ → 0. They found empirically that a larger relaxation angle Cφ deteriorates performance, because the angular similarity already correlates well with the difficulty of the data [11], so a large relaxation need not be encouraged; Sec. 4.3 verifies this empirically. The offset parameter β is tuned on the validation set in one of two ways: 1) grid search minimizing Expected Calibration Error (see Sec. 4), or 2) SGD optimization minimizing Negative Log Likelihood [4]. Because these are post-training procedures, both methods are very efficient; the tuned parameter is denoted β. For β Optimized, the paper states: "optimize β on the validation set via gradient descent to minimize NLL for 10 epochs". It also states that c is a hyperparameter which can be calculated as in Eq. 10: the non-linear function grows exponentially close to the calibrated affine mapping of Eq. 8, at a rate dictated by 1 − e^(−c‖x‖₂), as shown in Fig. 1c, so e^(−c‖x‖₂) can be viewed as an error term quantifying how close the non-linear function is to the calibrated affine function of Eq. 8. Let µx and σx denote the mean and standard deviation of the distribution of the norm of IND sample embeddings, computed on the validation set. The heuristic is that, evaluated at one standard deviation below the mean, ‖x‖₂ = µx − σx, the approximation error is e^(−c(µx − σx)) = 0.1. Although the error threshold is a hyperparameter, a threshold of 0.1 led to state-of-the-art results across all models: c = −ln(1 − error)/(µx − σx) = −ln(0.9)/(µx − σx). (Sketches of the α penalty and of the c computation follow the table.)
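
For the Open Datasets row above, a minimal data-setup sketch: CIFAR10 as the in-distribution set via torchvision, plus the CIFAR-10-C corruption files of [1]. The file paths are placeholders, and the layout (five severities of 10,000 images stacked per corruption file) follows the public CIFAR-10-C release; this is an assumed reconstruction, not the authors' code.

```python
# Data-setup sketch: IND CIFAR10 plus the CIFAR-10-C corrupted test set.
import numpy as np
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()

# In-distribution train/test sets (torchvision downloads CIFAR10 if absent).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)

# CIFAR-10-C: each corruption file stacks 5 severities x 10,000 test images.
images = np.load("./CIFAR-10-C/gaussian_noise.npy")  # (50000, 32, 32, 3) uint8
labels = np.load("./CIFAR-10-C/labels.npy")          # (50000,)
severity = 3                                         # 1..5
lo, hi = (severity - 1) * 10000, severity * 10000
corrupted_images, corrupted_labels = images[lo:hi], labels[lo:hi]
```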
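
For the Dataset Splits row, a sketch of the two post-hoc ways of tuning the offset β on the IND validation set. The ECE computation is the standard equal-width-bin estimator; `recalibrated_probs` and `logit_fn` are assumed wrappers around the paper's β-dependent logit mapping (Eq. 8) and stand in for the authors' actual model code.

```python
import numpy as np
import torch
import torch.nn.functional as F

def ece(probs, labels, n_bins=15):
    """Expected Calibration Error over equal-width confidence bins."""
    conf = probs.max(axis=1)
    acc = (probs.argmax(axis=1) == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    err = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            err += mask.mean() * abs(acc[mask].mean() - conf[mask].mean())
    return err

# 1) Grid search: pick the beta whose recalibrated validation probabilities
#    minimize ECE. `recalibrated_probs(b)` -> (N, K) softmax outputs (numpy).
def tune_beta_grid(recalibrated_probs, labels, betas):
    return min(betas, key=lambda b: ece(recalibrated_probs(b), labels))

# 2) SGD: "optimize beta on the validation set via gradient descent to
#    minimize NLL for 10 epochs". `logit_fn(b)` -> (N, K) logits (torch).
def tune_beta_sgd(logit_fn, labels, epochs=10, lr=0.01):
    beta = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([beta], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(logit_fn(beta), labels)  # NLL of the softmax
        loss.backward()
        opt.step()
    return beta.detach()
```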
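
The α regularizer quoted in the Experiment Setup row reduces to a squared penalty pulling α toward 1 (and hence Cφ toward 0, since α = cos Cφ). A minimal sketch, assuming α is a learnable tensor of the model and `lam` is a hypothetical penalty weight not specified in the quote:

```python
import torch

def alpha_penalty(alpha: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    # ||alpha - 1||_2^2 keeps the instance-independent relaxation C_phi small.
    return lam * torch.sum((alpha - 1.0) ** 2)

# e.g. total training loss: loss = task_loss + alpha_penalty(model.alpha)
```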
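
Finally, the Eq. 10 heuristic for c, implemented directly from the formula quoted above (error threshold 0.1, so c = −ln(0.9)/(µx − σx)); `val_embeddings` is an assumed name for the penultimate-layer features of the IND validation set:

```python
import numpy as np

def compute_c(val_embeddings, error=0.1):
    # Mean/std of the embedding-norm distribution on the IND validation set.
    norms = np.linalg.norm(val_embeddings, axis=1)
    mu, sigma = norms.mean(), norms.std()
    # Quoted heuristic: c = -ln(1 - error)/(mu - sigma) = -ln(0.9)/(mu - sigma)
    return -np.log(1.0 - error) / (mu - sigma)
```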