Calibrating Deep Neural Networks using Focal Loss
Authors: Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip Torr, Puneet Dokania
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases. |
| Researcher Affiliation | Collaboration | Jishnu Mukhoti (University of Oxford; Five AI Ltd.), Viveka Kulharia (University of Oxford), Amartya Sanyal (University of Oxford; The Alan Turing Institute), Stuart Golodetz (Five AI Ltd.), Philip H. S. Torr (University of Oxford; Five AI Ltd.), Puneet K. Dokania (University of Oxford; Five AI Ltd.) |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/torrvision/focal_calibration. |
| Open Datasets | Yes | We conduct image and document classification experiments to test the performance of focal loss. For the former, we use CIFAR-10/100 [13] and Tiny-ImageNet [6], and train ResNet-50, ResNet-110 [8], Wide-ResNet-26-10 [42] and DenseNet-121 [10] models, and for the latter, we use 20 Newsgroups [17] and Stanford Sentiment Treebank (SST) [32] datasets and train Global Pooling CNN [18] and Tree-LSTM [33] models. |
| Dataset Splits | Yes | Temperature Scaling: In order to compute the optimal temperature, we use two different methods: (a) learning the temperature by minimising val set NLL, and (b) performing grid search over temperatures between 0 and 10, with a step of 0.1, and finding the one that minimises val set ECE. ... For fair comparison, we chose 3 intermediate models for each loss function with the best val set ECE, NLL and accuracy... (see the grid-search sketch after the table) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1'). It mentions 'PyTorch' only in a reference, without specifying the version used for the work's own experiments. |
| Experiment Setup | Yes | For the analysis, we train a ResNet-50 network on CIFAR-10 with state-of-the-art performance settings [31]. We use Stochastic Gradient Descent (SGD) with a mini-batch of size 128, momentum of 0.9, and learning rate schedule of {0.1, 0.01, 0.001} for the first 150, next 100, and last 100 epochs, respectively. (see the training-loop sketch after the table) |
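
For reference, the loss named in the paper's title is the focal loss of Lin et al. (2017), FL(p_t) = -(1 - p_t)^γ log(p_t), which down-weights the contribution of already well-classified examples relative to cross-entropy (γ = 0 recovers standard cross-entropy). Below is a minimal PyTorch sketch; the function name and the γ value shown are illustrative assumptions, not code from the authors' repository.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=3.0):
    """Multi-class focal loss on raw logits, averaged over the batch.

    FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the softmax
    probability assigned to the true class. gamma=3.0 is an illustrative
    focusing-parameter choice, not necessarily the paper's setting.
    """
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()
```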
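
The grid-search variant of temperature scaling quoted in the Dataset Splits row can be sketched as follows: sweep the temperature T over (0, 10] in steps of 0.1 on held-out validation logits and keep the T that minimises ECE (T = 0 is excluded since it would divide the logits by zero). The function names and the 15-bin equal-width ECE below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    confidences, predictions = probs.max(dim=1)
    ece = torch.zeros(1)
    bin_edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = (predictions[in_bin] == labels[in_bin]).float().mean()
            conf = confidences[in_bin].mean()
            ece += in_bin.float().mean() * (acc - conf).abs()
    return ece.item()

def grid_search_temperature(val_logits, val_labels, step=0.1, max_t=10.0):
    """Return the temperature in (0, max_t] minimising validation-set ECE."""
    best_t, best_ece = 1.0, float("inf")
    for t in torch.arange(step, max_t + step, step):
        probs = F.softmax(val_logits / t, dim=1)
        ece = expected_calibration_error(probs, val_labels)
        if ece < best_ece:
            best_t, best_ece = t.item(), ece
    return best_t, best_ece
```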
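
The Experiment Setup row maps directly onto a standard PyTorch optimiser/scheduler configuration. This sketch assumes the quoted step schedule is implemented with `MultiStepLR` (drops at epochs 150 and 250, i.e. 150 + 100 + 100 = 350 epochs total); the model and training loop body are placeholders, not the authors' code.

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for e.g. a ResNet-50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 250], gamma=0.1)

for epoch in range(350):
    # ... one pass over the training set with mini-batches of size 128 ...
    scheduler.step()  # lr: 0.1 -> 0.01 at epoch 150, -> 0.001 at epoch 250
```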