How Flawed Is ECE? An Analysis via Logit Smoothing

Authors: Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice. Lastly, in Section 6, we verify empirically that LS-ECE is continuous even when ECE is not, and also show that for the standard image classification benchmarks of CIFAR-10, CIFAR-100, and ImageNet, both ECE and LS-ECE produce near-identical results across various models, indicating that the theoretical pathologies of ECE may not pose an issue in practice."
Researcher Affiliation | Academia | (1) Department of Computer Science, Duke University; (2) Johns Hopkins University; (3) New York University; (4) Princeton University.
Pseudocode | Yes | "Figure 1. Implementation of the LS-ECE_{π,ξ}(h) estimator in 10 lines of PyTorch (Paszke et al., 2019) using broadcast semantics." (A hedged sketch of such an estimator is given after this table.)
Open Source Code | Yes | "All of the code used to generate the plots in this section can be found at: https://github.com/2014mchidamb/how-flawed-is-ece."
Open Datasets | Yes | "Lastly, in Section 6, we verify empirically that LS-ECE is continuous even when ECE is not, and also show that for the standard image classification benchmarks of CIFAR-10, CIFAR-100, and ImageNet, both ECE and LS-ECE produce near-identical results across various models, indicating that the theoretical pathologies of ECE may not pose an issue in practice."
Dataset Splits | No | The paper mentions using "ImageNet-1K validation data" but does not specify exact percentages or absolute sample counts for the splits, nor does it cite a specific predefined split with authors and year.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework in Figure 1 but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | "We let σ be the inverse of the number of bins used for ECE_{BIN,π}(g). ... We estimate LS-ECE_{π,ξ}(h) via 10000 independent samples drawn from the distribution of h(X)+ξ ... We consider using uniform noise in Appendix D ... For our CIFAR experiments, we use pretrained versions (due to Yaofo Chen) ..." (A usage example mirroring this setup follows the sketch below.)
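
The paper's Figure 1 is not reproduced in this report, so the following is a minimal sketch of how an LS-ECE estimator of the kind described in the Pseudocode row could be written in PyTorch with broadcast semantics. It assumes the smoothing noise ξ is Gaussian with standard deviation σ, that h(X) is taken to be the model's top-class confidence, and that the conditional accuracy is estimated by Nadaraya-Watson kernel regression with a matching Gaussian kernel; the function name `ls_ece_sketch` and these modeling choices are assumptions on our part, not the authors' exact implementation.

```python
import torch


def ls_ece_sketch(conf: torch.Tensor, correct: torch.Tensor,
                  sigma: float = 0.1, num_samples: int = 10_000) -> torch.Tensor:
    """Hedged sketch of a logit-smoothed ECE estimator.

    Args:
        conf:        shape (n,), model confidences h(x_i) for the predicted class.
        correct:     shape (n,), 1.0 where the prediction matches the label, else 0.0.
        sigma:       std of the Gaussian smoothing noise xi.
        num_samples: number of draws from the distribution of h(X) + xi.
    """
    n = conf.numel()

    # Sample t_j ~ h(X) + xi by resampling data points and adding Gaussian noise.
    idx = torch.randint(0, n, (num_samples,))
    t = conf[idx] + sigma * torch.randn(num_samples)

    # Nadaraya-Watson estimate of P(correct | h(X) + xi = t_j), using a Gaussian
    # kernel whose bandwidth matches the smoothing noise; the pairwise distances
    # are formed via broadcasting, giving a (num_samples, n) weight matrix.
    sq_dist = (t.unsqueeze(1) - conf.unsqueeze(0)) ** 2
    weights = torch.exp(-sq_dist / (2 * sigma ** 2))
    cond_acc = (weights * correct.unsqueeze(0)).sum(dim=1) / weights.sum(dim=1)

    # LS-ECE is approximated as the average of |P(correct | t_j) - t_j| over samples.
    return (cond_acc - t).abs().mean()
```
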
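Under the setup reported in the Experiment Setup row (σ equal to the inverse of the number of ECE bins, and 10,000 samples from h(X)+ξ), a hypothetical invocation of the sketch might look like the following; the random tensors and the bin count of 15 are illustrative stand-ins, not values or model outputs from the paper.

```python
# Stand-in data: random "softmax outputs" and labels, not a model from the paper.
torch.manual_seed(0)
probs = torch.randn(5000, 10).softmax(dim=-1)
labels = torch.randint(0, 10, (5000,))

conf, preds = probs.max(dim=-1)
correct = (preds == labels).float()

num_bins = 15  # illustrative bin count; the paper sets sigma = 1 / (number of bins)
print(ls_ece_sketch(conf, correct, sigma=1.0 / num_bins, num_samples=10_000))
```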