Soft Calibration Objectives for Neural Networks
Authors: Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose differentiable losses to improve calibration based on a soft (continuous) version of the binning operation underlying popular calibration-error estimators. When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy. For instance, we observe an 82% reduction in ECE (70% relative to the post-hoc rescaled ECE) in exchange for a 0.7% relative decrease in accuracy relative to the cross-entropy baseline on CIFAR-100. Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of using a cross-entropy loss and post-hoc recalibration methods. (A minimal sketch of the soft-binning idea appears after this table.) |
| Researcher Affiliation | Industry | All authors are at Google Research: Archit Karandikar (archk@google.com), Nicholas Cain (nicholascain@google.com), Dustin Tran (trandustin@google.com), Balaji Lakshminarayanan (balajiln@google.com), Jonathon Shlens (shlens@google.com), Michael C. Mozer (mcmozer@google.com), Becca Roelofs (rolfs@google.com). |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available on GitHub: https://github.com/google/uncertainty-baselines/tree/main/experimental/caltrain |
| Open Datasets | Yes | We compare our Soft Calibration Objectives to recently proposed calibration-incentivizing training objectives MMCE, focal loss, and AvUC on the CIFAR-10, CIFAR-100, and ImageNet datasets. |
| Dataset Splits | No | The paper mentions tuning hyperparameters, which typically implies a validation set, but it does not explicitly state the train/validation/test splits (e.g., percentages, counts, or predefined split citations) used in its own experiments, so the data partitioning cannot be reproduced. It mentions that related work uses a validation set, but does not specify one for its own setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not list the software dependencies (frameworks, libraries, or version numbers) required to reproduce the experiments. |
| Experiment Setup | Yes | We use the Wide-ResNet-28-10 architecture [Zagoruyko and Komodakis, 2017] trained for 200 epochs on CIFAR-100 and CIFAR-10. For ImageNet, we use the ResNet-50 [He et al., 2015] architecture trained for 90 epochs. All our experiments use the SGD-with-momentum optimizer with momentum fixed to 0.9 and learning rate fixed to 0.1. The loss function we use in our experiments is PL + β·SL + λ·L2, where PL and SL denote the primary and secondary losses respectively and L2 denotes the weight normalization term with the ℓ2 norm. We tune the β and λ parameters along with the parameters κ and T relevant to the secondary losses SB-ECE_{lb,p}(M, T, D̂, θ) and S-AvUC(κ, T, D̂, θ). We tune these hyperparameters sequentially. We fix the learning rate schedule and the number of bins M to keep the search space manageable. Appendix G has more details of our hyperparameter search. (A hedged sketch of this combined objective appears after this table.) |
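
The core idea quoted in the Research Type row is to replace the hard binning step of the standard ECE estimator with a soft (differentiable) assignment of each confidence to the bins, so a calibration penalty can be trained through. The snippet below is a minimal sketch of that soft-binning idea, assuming JAX; the bin layout, the squared-distance soft assignment, and the name `soft_binned_ece` are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of a soft-binned ECE surrogate, assuming JAX.
# Bin centers, the squared-distance soft assignment, and the temperature
# default are illustrative, not the released caltrain code.
import jax
import jax.numpy as jnp


def soft_binned_ece(confidences, correct, num_bins=15, temperature=0.01):
    """Differentiable surrogate for the binned expected calibration error.

    confidences: (N,) top-label predicted probabilities in [0, 1].
    correct:     (N,) 1.0 where the top label matches the target, else 0.0.
    """
    # Evenly spaced bin centers on [0, 1].
    centers = jnp.linspace(0.5 / num_bins, 1.0 - 0.5 / num_bins, num_bins)
    # Soft membership of each example in each bin: a softmax over negative
    # squared distances to the bin centers, sharpened by the temperature.
    logits = -((confidences[:, None] - centers[None, :]) ** 2) / temperature
    membership = jax.nn.softmax(logits, axis=-1)              # (N, M)
    bin_mass = membership.sum(axis=0) + 1e-12                 # (M,)
    bin_conf = (membership * confidences[:, None]).sum(axis=0) / bin_mass
    bin_acc = (membership * correct[:, None]).sum(axis=0) / bin_mass
    # Mass-weighted average of per-bin |accuracy - confidence| (the p=1 case).
    weights = bin_mass / confidences.shape[0]
    return jnp.sum(weights * jnp.abs(bin_acc - bin_conf))
```

Because the bin memberships are smooth functions of the confidences, gradients flow from the calibration penalty back into the model parameters, which is what lets such a term act as a training loss rather than only a post-hoc evaluation metric.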
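The Experiment Setup row describes the full training objective as PL + β·SL + λ·L2. The sketch below, again assuming JAX, shows one way such a combined loss could be wired up, reusing the `soft_binned_ece` surrogate from the previous sketch as the secondary loss; `apply_fn`, `beta`, and `lam` are placeholder names, and the default values are not the tuned hyperparameters from the paper.

```python
# Hedged sketch of the combined objective PL + beta * SL + lam * L2,
# assuming JAX. apply_fn, beta, and lam are placeholders; the defaults
# are not the tuned hyperparameters reported in the paper.
import jax
import jax.numpy as jnp


def total_loss(params, apply_fn, images, labels,
               beta=1.0, lam=5e-4, num_bins=15, temperature=0.01):
    logits = apply_fn(params, images)                          # (N, K)
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Primary loss PL: standard softmax cross-entropy.
    pl = -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=-1))
    # Secondary loss SL: the soft-binned ECE surrogate sketched above.
    probs = jnp.exp(log_probs)
    confidences = probs.max(axis=-1)
    correct = (probs.argmax(axis=-1) == labels).astype(jnp.float32)
    sl = soft_binned_ece(confidences, correct, num_bins, temperature)
    # L2 term: squared l2 norm over all weight tensors.
    l2 = sum(jnp.sum(p ** 2) for p in jax.tree_util.tree_leaves(params))
    return pl + beta * sl + lam * l2
```

Per the quoted setup, gradients of such a loss would be fed to SGD with momentum 0.9 and learning rate 0.1; the β, λ, κ, and T values used in the paper come from the sequential hyperparameter search described in its Appendix G.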