Soft Calibration Objectives for Neural Networks
Authors: Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose differentiable losses to improve calibration based on a soft (continuous) version of the binning operation underlying popular calibration-error estimators. When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy. For instance, we observe an 82% reduction in ECE (70% relative to the post-hoc rescaled ECE) in exchange for a 0.7% relative decrease in accuracy relative to the cross-entropy baseline on CIFAR-100. Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of using a cross-entropy loss and post-hoc recalibration methods. (A minimal sketch of the soft-binning idea appears after this table.) |
| Researcher Affiliation | Industry | All authors are at Google Research: Archit Karandikar (archk@google.com), Nicholas Cain (nicholascain@google.com), Dustin Tran (trandustin@google.com), Balaji Lakshminarayanan (balajiln@google.com), Jonathon Shlens (shlens@google.com), Michael C. Mozer (mcmozer@google.com), Becca Roelofs (rolfs@google.com). |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available on GitHub: https://github.com/google/uncertainty-baselines/tree/main/experimental/caltrain |
| Open Datasets | Yes | We compare our Soft Calibration Objectives to recently proposed calibration-incentivizing training objectives MMCE, focal loss, and AvUC on the CIFAR-10, CIFAR-100, and ImageNet datasets. |
| Dataset Splits | No | The paper mentions tuning hyperparameters, which typically implies a validation set, but it does not explicitly state the train/validation/test splits (e.g., percentages, counts, or predefined split citations) used in its own experiments, so the data partitioning cannot be reproduced. It mentions that related work uses a validation set, but does not specify one for its own setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not list the software dependencies (frameworks, libraries, or version numbers) required to reproduce the experiments. |
| Experiment Setup | Yes | We use the Wide-ResNet-28-10 architecture [Zagoruyko and Komodakis, 2017] trained for 200 epochs on CIFAR-100 and CIFAR-10. For ImageNet, we use the ResNet-50 [He et al., 2015] architecture trained for 90 epochs. All our experiments use the SGD-with-momentum optimizer with momentum fixed to 0.9 and learning rate fixed to 0.1. The loss function we use in our experiments is PL + β·SL + λ·L2, where PL and SL denote the primary and secondary losses respectively and L2 denotes the weight normalization term with the ℓ2 norm. We tune the β and λ parameters along with the parameters κ and T relevant to the secondary losses SB-ECE_{lb,p}(M, T, D̂, θ) and S-AvUC(κ, T, D̂, θ). We tune these hyperparameters sequentially. We fix the learning rate schedule and the number of bins M to keep the search space manageable. Appendix G has more details of our hyperparameter search. (A hedged sketch of this combined objective appears after this table.) |
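
The core idea quoted in the Research Type row is to replace the hard binning step of the standard ECE estimator with a soft (differentiable) assignment of each confidence to the bins, so a calibration penalty can be trained through. The snippet below is a minimal sketch of that soft-binning idea, assuming JAX; the bin layout, the squared-distance soft assignment, and the name `soft_binned_ece` are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of a soft-binned ECE surrogate, assuming JAX.
# Bin centers, the squared-distance soft assignment, and the temperature
# default are illustrative, not the released caltrain code.
import jax
import jax.numpy as jnp


def soft_binned_ece(confidences, correct, num_bins=15, temperature=0.01):
    """Differentiable surrogate for the binned expected calibration error.

    confidences: (N,) top-label predicted probabilities in [0, 1].
    correct:     (N,) 1.0 where the top label matches the target, else 0.0.
    """
    # Evenly spaced bin centers on [0, 1].
    centers = jnp.linspace(0.5 / num_bins, 1.0 - 0.5 / num_bins, num_bins)
    # Soft membership of each example in each bin: a softmax over negative
    # squared distances to the bin centers, sharpened by the temperature.
    logits = -((confidences[:, None] - centers[None, :]) ** 2) / temperature
    membership = jax.nn.softmax(logits, axis=-1)              # (N, M)
    bin_mass = membership.sum(axis=0) + 1e-12                 # (M,)
    bin_conf = (membership * confidences[:, None]).sum(axis=0) / bin_mass
    bin_acc = (membership * correct[:, None]).sum(axis=0) / bin_mass
    # Mass-weighted average of per-bin |accuracy - confidence| (the p=1 case).
    weights = bin_mass / confidences.shape[0]
    return jnp.sum(weights * jnp.abs(bin_acc - bin_conf))
```

Because the bin memberships are smooth functions of the confidences, gradients flow from the calibration penalty back into the model parameters, which is what lets such a term act as a training loss rather than only a post-hoc evaluation metric.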
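The Experiment Setup row describes the full training objective as PL + β·SL + λ·L2. The sketch below, again assuming JAX, shows one way such a combined loss could be wired up, reusing the `soft_binned_ece` surrogate from the previous sketch as the secondary loss; `apply_fn`, `beta`, and `lam` are placeholder names, and the default values are not the tuned hyperparameters from the paper.

```python
# Hedged sketch of the combined objective PL + beta * SL + lam * L2,
# assuming JAX. apply_fn, beta, and lam are placeholders; the defaults
# are not the tuned hyperparameters reported in the paper.
import jax
import jax.numpy as jnp


def total_loss(params, apply_fn, images, labels,
               beta=1.0, lam=5e-4, num_bins=15, temperature=0.01):
    logits = apply_fn(params, images)                          # (N, K)
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Primary loss PL: standard softmax cross-entropy.
    pl = -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=-1))
    # Secondary loss SL: the soft-binned ECE surrogate sketched above.
    probs = jnp.exp(log_probs)
    confidences = probs.max(axis=-1)
    correct = (probs.argmax(axis=-1) == labels).astype(jnp.float32)
    sl = soft_binned_ece(confidences, correct, num_bins, temperature)
    # L2 term: squared l2 norm over all weight tensors.
    l2 = sum(jnp.sum(p ** 2) for p in jax.tree_util.tree_leaves(params))
    return pl + beta * sl + lam * l2
```

Per the quoted setup, gradients of such a loss would be fed to SGD with momentum 0.9 and learning rate 0.1; the β, λ, κ, and T values used in the paper come from the sequential hyperparameter search described in its Appendix G.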