On Calibration of Modern Neural Networks
Authors: Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling, a single-parameter variant of Platt Scaling, is surprisingly effective at calibrating predictions. |
| Researcher Affiliation | Academia | 1Cornell University. Correspondence to: Chuan Guo <cg563@cornell.edu>, Geoff Pleiss <geoff@cs.cornell.edu>, Yu Sun <ys646@cornell.edu>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | For image classification we use 6 datasets: 1. Caltech-UCSD Birds (Welinder et al., 2010):... 2. Stanford Cars (Krause et al., 2013):... 3. ImageNet 2012 (Deng et al., 2009):... 4. CIFAR-10/CIFAR-100 (Krizhevsky & Hinton, 2009):... 5. Street View House Numbers (SVHN) (Netzer et al., 2011):... For document classification we experiment with 4 datasets: 1. 20 News:... 2. Reuters:... 3. Stanford Sentiment Treebank (SST) (Socher et al., 2013):... |
| Dataset Splits | Yes | 1. Caltech-UCSD Birds (Welinder et al., 2010): 200 bird species. 5994/2897/2897 images for train/validation/test sets. 2. Stanford Cars (Krause et al., 2013):... 8041/4020/4020 images for train/validation/test. 3. ImageNet 2012 (Deng et al., 2009):... 1.3 million/25,000/25,000 images for train/validation/test. 4. CIFAR-10/CIFAR-100 (Krizhevsky & Hinton, 2009):... 45,000/5,000/10,000 images for train/validation/test. 5. Street View House Numbers (SVHN) (Netzer et al., 2011):... 604,388/6,000/26,032 images for train/validation/test. 20 News: ...9034/2259/7528 documents for train/validation/test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Torch7' and the authors' code but does not provide specific version numbers for software dependencies needed for replication. |
| Experiment Setup | No | We use the data preprocessing, training procedures, and hyperparameters as described in each paper. These networks obtain competitive accuracy using the optimization hyperparameters suggested by the original paper. On SST, we train Tree-LSTMs (Long Short-Term Memory) (Tai et al., 2015) using the default settings in the authors' code. |
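Since the paper releases no code, the temperature-scaling recipe it recommends can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it fits a single scalar temperature T on held-out validation logits by minimizing negative log-likelihood, then divides all logits by T before the softmax. All function names here are assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(T, logits, labels):
    """Negative log-likelihood of the true labels under logits / T."""
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    """Fit the single scalar T > 0 on validation logits (temperature scaling)."""
    res = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                          method="bounded")
    return res.x

# Toy usage with synthetic validation logits (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
logits = 5.0 * np.eye(3)[labels] + rng.normal(scale=1.0, size=(500, 3))
T = fit_temperature(logits, labels)
calibrated_probs = softmax(logits / T)
```

Because T rescales every logit by the same factor, the argmax (and hence accuracy) is unchanged; only the confidence of each prediction moves, which is why the paper describes the method as a post-processing step.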