Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks

Authors: Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Richard Hartley, Byron Boots

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show the effectiveness of the proposed method across a wide range of datasets and classifiers. Our method outperforms state-of-the-art post-hoc calibration methods, namely temperature scaling and Dirichlet calibration, in several evaluation metrics for the task. ... We evaluate the performance of intra order-preserving (OP), order-invariant intra order-preserving (OI), and diagonal intra order-preserving (DIAG) families in calibrating the output of various image classification deep networks and compare their results with the previous post-hoc calibration techniques. ... Table 1 summarizes the results of our calibration methods and other baselines in terms of ECE and presents the average relative error with respect to the uncalibrated model." (Sketches of the ECE metric and the temperature-scaling baseline appear after this table.)
Researcher Affiliation | Collaboration | Amir Rahimi (ANU, ACRV) amir.rahimi@anu.edu.au; Amirreza Shaban (Georgia Tech) ashaban@uw.edu; Ching-An Cheng (Microsoft Research) chinganc@microsoft.com; Richard Hartley (Google Research, ANU, ACRV) richard.hartley@anu.edu.au; Byron Boots (University of Washington) bboots@cs.washington.edu
Pseudocode | No | The paper includes a flow graph (Figure 3) but does not contain explicit pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper refers to the "official implementation" of a baseline method [14] and to a proposed architecture from another paper [31], but does not link to its own source code for the intra order-preserving functions.
Open Datasets | Yes | "We use six different datasets: CIFAR-{10,100} [13], SVHN [24], CARS [12], BIRDS [32], and ImageNet [4]."
Dataset Splits | Yes | "We follow the experiment protocol in [14, 16] and use cross validation on the calibration dataset to find the best hyperparameters and architectures for all the methods."
Hardware Specification | No | The paper does not report hardware details such as the GPU or CPU models used to run its experiments.
Software Dependencies | No | The paper implicitly relies on deep learning frameworks but does not name specific software dependencies or their version numbers.
Experiment Setup | No | The paper states that "cross validation on the calibration dataset to find the best hyperparameters and architectures" was used and mentions the negative log-likelihood (NLL) loss with a regularization weight λ, but it does not report concrete values for these hyperparameters or other training settings such as learning rate or batch size. (A sketch of such an objective appears below this table.)
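
Since ECE is the headline metric in the table above, the following is a minimal sketch of the standard equal-width-bin Expected Calibration Error. The bin count and the binning scheme are assumptions of this sketch; the paper may use a different variant.

```python
# Minimal ECE sketch (assumption: 15 equal-width confidence bins).
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, K) softmax outputs; labels: (N,) integer class labels."""
    confidences = probs.max(axis=1)            # top-1 confidence per sample
    predictions = probs.argmax(axis=1)         # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap         # weight gap by bin population
    return ece
```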
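
Temperature scaling, one of the baselines named above, divides all logits by a single scalar T > 0 fit on the calibration set by minimizing NLL. The sketch below is illustrative only; the bounded scipy optimizer and the search range are choices of this sketch, not details from the paper.

```python
# Illustrative temperature-scaling fit (assumptions: scipy's bounded
# scalar minimizer and the (0.05, 20.0) search range).
import numpy as np
from scipy.optimize import minimize_scalar

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def fit_temperature(logits, labels):
    """logits: (N, K) uncalibrated logits; labels: (N,) integer labels."""
    def nll(t):
        logp = log_softmax(logits / t)
        return -logp[np.arange(len(labels)), labels].mean()
    res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
    return res.x
```

Applying softmax to logits / T with the fitted T and passing the result to expected_calibration_error above gives the post-calibration ECE.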
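
The "Experiment Setup" row notes an NLL loss with a regularization weight λ but no concrete values. A minimal sketch of such an objective follows; since the excerpt does not specify the regularizer's form, the L2 term here is purely a placeholder assumption.

```python
# Hedged sketch of an NLL-plus-regularization calibration objective.
# The L2 regularizer is a placeholder; the paper's actual regularizer
# and lambda value are not given in this excerpt.
import numpy as np

def calibration_loss(logits, labels, params, lam):
    """logits: (N, K) outputs of the calibration map; params: its weights."""
    logp = logits - logits.max(axis=1, keepdims=True)
    logp = logp - np.log(np.exp(logp).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(labels)), labels].mean()
    reg = sum((p ** 2).sum() for p in params)   # placeholder L2 term
    return nll + lam * reg
```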