On Calibration of Modern Neural Networks

Authors: Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling, a single-parameter variant of Platt scaling, is surprisingly effective at calibrating predictions.
Researcher Affiliation | Academia | Cornell University. Correspondence to: Chuan Guo <cg563@cornell.edu>, Geoff Pleiss <geoff@cs.cornell.edu>, Yu Sun <ys646@cornell.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | For image classification we use 6 datasets: 1. Caltech-UCSD Birds (Welinder et al., 2010):... 2. Stanford Cars (Krause et al., 2013):... 3. ImageNet 2012 (Deng et al., 2009):... 4. CIFAR-10/CIFAR-100 (Krizhevsky & Hinton, 2009):... 5. Street View House Numbers (SVHN) (Netzer et al., 2011):... For document classification we experiment with 4 datasets: 1. 20 News:... 2. Reuters:... 3. Stanford Sentiment Treebank (SST) (Socher et al., 2013):...
Dataset Splits | Yes | 1. Caltech-UCSD Birds (Welinder et al., 2010): 200 bird species. 5994/2897/2897 images for train/validation/test sets. 2. Stanford Cars (Krause et al., 2013):... 8041/4020/4020 images for train/validation/test. 3. ImageNet 2012 (Deng et al., 2009):... 1.3 million/25,000/25,000 images for train/validation/test. 4. CIFAR-10/CIFAR-100 (Krizhevsky & Hinton, 2009):... 45,000/5,000/10,000 images for train/validation/test. 5. Street View House Numbers (SVHN) (Netzer et al., 2011):... 604,388/6,000/26,032 images for train/validation/test. 20 News: ...9034/2259/7528 documents for train/validation/test.
Hardware Specification | No | The paper does not provide specific hardware details such as the exact GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper mentions 'Torch7' and the 'authors' code' but does not provide specific version numbers for software dependencies needed for replication.
Experiment Setup | No | We use the data preprocessing, training procedures, and hyperparameters as described in each paper. These networks obtain competitive accuracy using the optimization hyperparameters suggested by the original paper. On SST, we train Tree LSTMs (Long Short-Term Memory) (Tai et al., 2015) using the default settings in the authors' code.
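The temperature-scaling recipe quoted above divides a network's logits by a single scalar T (learned on the validation set by minimizing negative log-likelihood) before the softmax. Since the paper releases no code, the sketch below is only an illustration of the idea in NumPy; the function names and the simple grid search over T (standing in for the authors' NLL optimization) are assumptions, not their implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels under temperature T.
    probs = softmax(logits / T)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    # Temperature scaling has a single parameter, so a coarse grid
    # search over T on the validation set suffices for illustration.
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return grid[int(np.argmin(losses))]

# Toy example: artificially inflated logits mimic an overconfident
# network, so the fitted temperature should come out greater than 1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
logits = rng.normal(size=(500, 10))
logits[np.arange(500), labels] += 2.0  # make predictions mostly correct
logits *= 3.0                          # inflate confidence
T = fit_temperature(logits, labels)
```

Note that dividing all logits by the same T > 1 softens the softmax without changing the arg-max, so calibration improves while accuracy is untouched; this is why the assessment calls it a "simple and straightforward recipe".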