Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

Authors: Jarosław Błasiok, Preetum Nakkiran

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We include several experiments demonstrating our method on public datasets in various domains, from deep learning to meteorology. The sample sizes range from several hundred to 50K, to show how our method behaves for different data sizes. In each setting, we compare the classical binned reliability diagram to the smooth diagram generated by our Python package.
Researcher Affiliation | Collaboration | Jarosław Błasiok (Columbia University); Preetum Nakkiran (Apple)
Pseudocode | Yes | Algorithm 1: Efficient estimation of smECE_σ at fixed scale σ
Open Source Code | Yes | We also release a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: pip install relplot. Code at: https://github.com/apple/ml-calibration.
Open Datasets | Yes | ResNet32 (He et al., 2016) on the ImageNet validation set (Deng et al., 2009).
Dataset Splits | Yes | ImageNet is an image classification task with 1000 classes and has a validation set of 50,000 samples.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU models, CPU types, or cloud instance specifications, used for running the experiments.
Software Dependencies | No | The paper mentions 'pip install relplot' for its Python package but does not specify exact version numbers for Python or any other software dependencies.
Experiment Setup | Yes | Our diagrams include kernel density estimates of the predictions (at the same kernel bandwidth σ used to compute the Smooth ECE). For binned diagrams, the number of bins is chosen to be optimal for the regression test MSE loss, optimized via cross-validation.
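The Pseudocode and Experiment Setup rows both refer to kernel smoothing at a bandwidth σ. The sketch below illustrates the quantity being estimated, assuming (our reading, not a quote from the paper) that smECE_σ = ∫ |r̂(t)| δ̂(t) dt over [0, 1], where r̂ is the Nadaraya-Watson kernel regression of the residual y - f and δ̂ is the kernel density of the predictions, both using a Gaussian kernel reflected at 0 and 1. Because r̂(t)δ̂(t) = (1/n) Σ_i K_σ(t, f_i)(y_i - f_i), the integral can be approximated directly on a grid. The function names, grid size, number of reflections, and the bisection search for a σ satisfying smECE_σ = σ are all illustrative assumptions. This is not Algorithm 1 and does not reproduce the relplot API; it costs O(grid × n) memory and time, which an efficient implementation would avoid.

```python
import numpy as np


def reflected_gaussian_kernel(t, f, sigma, n_reflections=2):
    """Gaussian kernel on [0, 1] reflected at both endpoints (method of images).

    Returns a (len(t), len(f)) matrix; each column integrates to ~1 over [0, 1].
    """
    t = np.asarray(t, dtype=float)[:, None]
    f = np.asarray(f, dtype=float)[None, :]
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    k = np.zeros((t.shape[0], f.shape[1]))
    for j in range(-n_reflections, n_reflections + 1):
        # Mirror images of each prediction about 0 and 1: f + 2j and -f + 2j.
        for image in (f + 2 * j, -f + 2 * j):
            k += norm * np.exp(-0.5 * ((t - image) / sigma) ** 2)
    return k


def _grid(grid_size):
    # Midpoints of grid_size equal cells covering [0, 1].
    return (np.arange(grid_size) + 0.5) / grid_size


def prediction_density(f, sigma, grid_size=1000):
    """Kernel density estimate delta_hat(t) of the predictions, on the grid."""
    return reflected_gaussian_kernel(_grid(grid_size), f, sigma).mean(axis=1)


def smooth_ece_fixed_sigma(f, y, sigma, grid_size=1000):
    """Riemann-sum approximation of the integral of |r_hat(t)| * delta_hat(t)."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    k = reflected_gaussian_kernel(_grid(grid_size), f, sigma)
    # r_hat(t) * delta_hat(t) simplifies to (1/n) * sum_i K(t, f_i) * (y_i - f_i).
    weighted_residual = k @ (y - f) / len(f)
    return np.abs(weighted_residual).sum() / grid_size


def smooth_ece_auto_sigma(f, y, lo=1e-3, hi=1.0, iters=30):
    """Bisection for sigma with smECE_sigma == sigma (assumed selection rule).

    Returns (sigma, smECE at that sigma).
    """
    def gap(s):
        return smooth_ece_fixed_sigma(f, y, s) - s

    if gap(lo) <= 0:
        return lo, smooth_ece_fixed_sigma(f, y, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sigma, smooth_ece_fixed_sigma(f, y, sigma)
```

For example, `smooth_ece_auto_sigma(predictions, labels)` returns a pair (σ, smECE_σ), and the same σ can be passed to `prediction_density` to draw the prediction density alongside the diagram, in the spirit of the setup described above. Reflecting the kernel keeps its mass inside [0, 1], so predictions near 0 or 1 are not under-weighted.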
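For the binned baseline, the Experiment Setup row says the number of bins is chosen to minimize regression test MSE via cross-validation. A minimal sketch of that selection follows, assuming equal-width bins, 5 random folds, candidate bin counts 1 to 50, and a fallback to the training mean for empty bins; none of these details come from the paper, and the function names are hypothetical. Each bin count defines a piecewise-constant regressor of y on f, and the count with the lowest held-out squared error is kept before computing the binned ECE.

```python
import numpy as np


def bin_indices(f, n_bins):
    # Equal-width bins on [0, 1]; a prediction of exactly 1.0 goes in the last bin.
    return np.minimum((np.asarray(f, dtype=float) * n_bins).astype(int), n_bins - 1)


def binned_regressor(f_train, y_train, n_bins):
    """Per-bin mean label, i.e. the binned estimate of E[y | f]."""
    idx = bin_indices(f_train, n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=y_train, minlength=n_bins)
    # Assumption: an empty bin predicts the overall training mean.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.mean(y_train))


def choose_n_bins(f, y, candidates=range(1, 51), n_folds=5, seed=0):
    """Pick the bin count with the lowest cross-validated squared error."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(f)), n_folds)
    best_bins, best_mse = None, np.inf
    for n_bins in candidates:
        mse = 0.0
        for k, test in enumerate(folds):
            train = np.concatenate([fold for j, fold in enumerate(folds) if j != k])
            table = binned_regressor(f[train], y[train], n_bins)
            pred = table[bin_indices(f[test], n_bins)]
            mse += np.mean((y[test] - pred) ** 2) / n_folds
        if mse < best_mse:
            best_bins, best_mse = n_bins, mse
    return best_bins


def binned_ece(f, y, n_bins):
    """Sum over bins of (n_b / n) * |mean(y_b) - mean(f_b)|."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = bin_indices(f, n_bins)
    per_bin_gap = np.bincount(idx, weights=y - f, minlength=n_bins)
    return np.abs(per_bin_gap).sum() / len(f)
```

Usage: `binned_ece(f, y, choose_n_bins(f, y))`. The term (n_b / n)|mean(y_b) - mean(f_b)| equals |Σ_{i in b}(y_i - f_i)| / n, which is why `binned_ece` only needs one weighted `np.bincount`.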