Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks

Authors: Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Richard Hartley, Byron Boots

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show the effectiveness of the proposed method across a wide range of datasets and classifiers. Our method outperforms state-of-the-art post-hoc calibration methods, namely temperature scaling and Dirichlet calibration, in several evaluation metrics for the task. ... We evaluate the performance of intra order-preserving (OP), order-invariant intra order-preserving (OI), and diagonal intra order-preserving (DIAG) families in calibrating the output of various image classification deep networks and compare their results with the previous post-hoc calibration techniques. ... Table 1 summarizes the results of our calibration methods and other baselines in terms of ECE and presents the average relative error with respect to the uncalibrated model." (Sketches of the ECE metric and the temperature-scaling baseline appear after this table.)
Researcher Affiliation | Collaboration | Amir Rahimi (ANU, ACRV) amir.rahimi@anu.edu.au; Amirreza Shaban (Georgia Tech) ashaban@uw.edu; Ching-An Cheng (Microsoft Research) chinganc@microsoft.com; Richard Hartley (Google Research, ANU, ACRV) richard.hartley@anu.edu.au; Byron Boots (University of Washington) bboots@cs.washington.edu
Pseudocode | No | The paper includes a flow graph (Figure 3) but does not contain explicit pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper refers to the "official implementation" of a baseline method [14] and to a proposed architecture from another paper [31], but does not link to its own source code for the intra order-preserving functions.
Open Datasets | Yes | "We use six different datasets: CIFAR-{10,100} [13], SVHN [24], CARS [12], BIRDS [32], and ImageNet [4]."
Dataset Splits | Yes | "We follow the experiment protocol in [14, 16] and use cross validation on the calibration dataset to find the best hyperparameters and architectures for all the methods."
Hardware Specification | No | The paper does not report hardware details such as the GPU or CPU models used to run its experiments.
Software Dependencies | No | The paper implicitly relies on deep learning frameworks but does not name specific software dependencies or their version numbers.
Experiment Setup | No | The paper states that "cross validation on the calibration dataset to find the best hyperparameters and architectures" was used and mentions the negative log-likelihood (NLL) loss with a regularization weight λ, but it does not report concrete values for these hyperparameters or other training settings such as learning rate or batch size. (A sketch of such an objective appears below this table.)
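
Since ECE is the headline metric in the table above, the following is a minimal sketch of the standard equal-width-bin Expected Calibration Error. The bin count and the binning scheme are assumptions of this sketch; the paper may use a different variant.

```python
# Minimal ECE sketch (assumption: 15 equal-width confidence bins).
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, K) softmax outputs; labels: (N,) integer class labels."""
    confidences = probs.max(axis=1)            # top-1 confidence per sample
    predictions = probs.argmax(axis=1)         # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap         # weight gap by bin population
    return ece
```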
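
Temperature scaling, one of the baselines named above, divides all logits by a single scalar T > 0 fit on the calibration set by minimizing NLL. The sketch below is illustrative only; the bounded scipy optimizer and the search range are choices of this sketch, not details from the paper.

```python
# Illustrative temperature-scaling fit (assumptions: scipy's bounded
# scalar minimizer and the (0.05, 20.0) search range).
import numpy as np
from scipy.optimize import minimize_scalar

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def fit_temperature(logits, labels):
    """logits: (N, K) uncalibrated logits; labels: (N,) integer labels."""
    def nll(t):
        logp = log_softmax(logits / t)
        return -logp[np.arange(len(labels)), labels].mean()
    res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
    return res.x
```

Applying softmax to logits / T with the fitted T and passing the result to expected_calibration_error above gives the post-calibration ECE.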
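
The "Experiment Setup" row notes an NLL loss with a regularization weight λ but no concrete values. A minimal sketch of such an objective follows; since the excerpt does not specify the regularizer's form, the L2 term here is purely a placeholder assumption.

```python
# Hedged sketch of an NLL-plus-regularization calibration objective.
# The L2 regularizer is a placeholder; the paper's actual regularizer
# and lambda value are not given in this excerpt.
import numpy as np

def calibration_loss(logits, labels, params, lam):
    """logits: (N, K) outputs of the calibration map; params: its weights."""
    logp = logits - logits.max(axis=1, keepdims=True)
    logp = logp - np.log(np.exp(logp).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels].mean()
    reg = sum((p ** 2).sum() for p in params)   # placeholder L2 term
    return nll + lam * reg
```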