Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
Authors: Jarosław Błasiok, Preetum Nakkiran
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We include several experiments demonstrating our method on public datasets in various domains, from deep learning to meteorology. The sample sizes range from several hundred to 50K, to show how our method behaves for different data sizes. In each setting, we compare the classical binned reliability diagram to the smooth diagram generated by our Python package. |
| Researcher Affiliation | Collaboration | Jarosław Błasiok (Columbia University); Preetum Nakkiran (Apple) |
| Pseudocode | Yes | Algorithm 1: Efficient estimation of smECE_σ, at fixed scale σ |
| Open Source Code | Yes | We also release a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot`. Code at: https://github.com/apple/ml-calibration. |
| Open Datasets | Yes | ResNet32 (He et al., 2016) on the ImageNet validation set (Deng et al., 2009). |
| Dataset Splits | Yes | ImageNet is an image classification task with 1000 classes, and has a validation set of 50,000 samples. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'pip install relplot' for their Python package but does not specify exact version numbers for Python or any other software dependencies. |
| Experiment Setup | Yes | Our diagrams include kernel density estimates of the predictions (at the same kernel bandwidth σ used to compute the Smooth ECE). For binned diagrams, the number of bins is chosen to be optimal for the regression test MSE loss, optimized via cross-validation. |
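
The binned baseline in the Experiment Setup row picks its bin count by cross-validating a regression MSE. The following is a minimal sketch of one way that selection could work, not the authors' code and not the `relplot` API. It assumes equal-width bins on [0, 1], k-fold cross-validation, and a fallback to the training mean for empty bins; the function names (`binned_regression_mse`, `choose_num_bins`, `binned_ece`) and the candidate range are hypothetical.

```python
import numpy as np


def _bin_index(f, n_bins):
    # Equal-width bins on [0, 1]; a prediction of exactly 1.0 goes in the last bin.
    return np.minimum((np.asarray(f) * n_bins).astype(int), n_bins - 1)


def binned_regression_mse(f_train, y_train, f_test, y_test, n_bins):
    """Fit per-bin means of y on the training fold, return MSE on the test fold."""
    idx_train = _bin_index(f_train, n_bins)
    idx_test = _bin_index(f_test, n_bins)
    counts = np.bincount(idx_train, minlength=n_bins)
    sums = np.bincount(idx_train, weights=y_train, minlength=n_bins)
    # Assumption: empty bins fall back to the overall training mean.
    bin_means = np.where(counts > 0, sums / np.maximum(counts, 1), np.mean(y_train))
    return float(np.mean((y_test - bin_means[idx_test]) ** 2))


def choose_num_bins(f, y, candidates=range(1, 51), n_folds=5, seed=0):
    """Return (best_n_bins, cv_mse) minimizing the k-fold cross-validated MSE."""
    f, y = np.asarray(f, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(f)), n_folds)
    best_bins, best_mse = None, np.inf
    for n_bins in candidates:
        fold_mses = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            fold_mses.append(
                binned_regression_mse(f[train], y[train], f[test], y[test], n_bins)
            )
        mse = float(np.mean(fold_mses))
        if mse < best_mse:
            best_bins, best_mse = n_bins, mse
    return best_bins, best_mse


def binned_ece(f, y, n_bins):
    """Classical binned ECE: sample-weighted mean of |mean(y) - mean(f)| per bin."""
    f, y = np.asarray(f, dtype=float), np.asarray(y, dtype=float)
    idx = _bin_index(f, n_bins)
    sum_f = np.bincount(idx, weights=f, minlength=n_bins)
    sum_y = np.bincount(idx, weights=y, minlength=n_bins)
    return float(np.abs(sum_y - sum_f).sum() / len(f))
```

Given predicted probabilities `f` and binary outcomes `y`, calling `n_bins, _ = choose_num_bins(f, y)` and then `binned_ece(f, y, n_bins)` gives a binned estimate whose resolution is tuned to the data. The smooth diagrams described in the table replace this bin-count choice with a kernel bandwidth σ.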