How Flawed Is ECE? An Analysis via Logit Smoothing
Authors: Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice. Lastly, in Section 6, we verify empirically that LS-ECE is continuous even when ECE is not, and also show that for the standard image classification benchmarks of CIFAR-10, CIFAR-100, and ImageNet, both ECE and LS-ECE produce near-identical results across various models, indicating that the theoretical pathologies of ECE may not pose an issue in practice. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Duke University 2Johns Hopkins University 3New York University 4Princeton University. |
| Pseudocode | Yes | Figure 1. Implementation of the estimator $\widehat{\mathrm{LS\text{-}ECE}}_{\pi,\xi}(h)$ in 10 lines of PyTorch (Paszke et al., 2019) using broadcast semantics. |
| Open Source Code | Yes | All of the code used to generate the plots in this section can be found at: https://github.com/2014mchidamb/how-flawed-is-ece. |
| Open Datasets | Yes | Lastly, in Section 6, we verify empirically that LS-ECE is continuous even when ECE is not, and also show that for the standard image classification benchmarks of CIFAR-10, CIFAR-100, and ImageNet, both ECE and LS-ECE produce near-identical results across various models, indicating that the theoretical pathologies of ECE may not pose an issue in practice. |
| Dataset Splits | No | The paper mentions using 'ImageNet-1K validation data' but does not specify the exact percentages or absolute sample counts for the splits, nor does it cite a specific predefined split with authors and year. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' as a tool for implementation in Figure 1 but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We let σ be the inverse of the number of bins used for $\mathrm{ECE}_{\mathrm{BIN},\pi}(g)$. ... We estimate $\widehat{\mathrm{LS\text{-}ECE}}_{\pi,\xi}(h)$ via 10000 independent samples drawn from the distribution of h(X)+ξ... We consider using uniform noise in Appendix D... For our CIFAR experiments, we use pretrained versions (due to Yaofo Chen)... (a hedged sketch of this estimation procedure follows the table) |
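The setup row above describes estimating the smoothed calibration error by adding Gaussian noise ξ (with standard deviation σ equal to the inverse of the ECE bin count) to the model confidences and drawing 10000 samples from the resulting distribution. The authors' actual 10-line PyTorch implementation is in the linked repository; the snippet below is only a minimal sketch of how such an estimator could look, assuming a Gaussian-kernel-regression form of the smoothing. The function name `ls_ece_sketch` and its arguments are hypothetical and are not taken from the paper's code.

```python
import torch

def ls_ece_sketch(confidences, correct, sigma, num_samples=10000):
    """Hypothetical LS-ECE-style estimator (not the authors' released code).

    confidences: (n,) tensor of model confidences h(x_i) in [0, 1]
    correct:     (n,) tensor of 0/1 indicators that the top prediction was right
    sigma:       std of the Gaussian smoothing noise xi (the paper sets this
                 to the inverse of the number of ECE bins)
    """
    correct = correct.float()
    n = confidences.shape[0]

    # Draw samples from the distribution of h(X) + xi by resampling the
    # empirical confidences and adding Gaussian noise.
    idx = torch.randint(0, n, (num_samples,))
    t = confidences[idx] + sigma * torch.randn(num_samples)

    # Gaussian-kernel regression estimate of E[correct | h(X) + xi = t],
    # evaluated at each sampled t via an (n x num_samples) broadcast.
    diff = (t.unsqueeze(0) - confidences.unsqueeze(1)) / sigma
    weights = torch.exp(-0.5 * diff ** 2)   # unnormalized kernel weights
    weights = weights / weights.sum(dim=0)  # normalize over the n data points
    smoothed_acc = (weights * correct.unsqueeze(1)).sum(dim=0)

    # Mean absolute gap between smoothed accuracy and the smoothed confidence.
    return (smoothed_acc - t).abs().mean()

# Example: labels drawn as Bernoulli(confidence) are perfectly calibrated,
# so the estimate should be close to zero.
conf = torch.rand(5000)
corr = (torch.rand(5000) < conf).float()
print(ls_ece_sketch(conf, corr, sigma=1 / 15))
```

Comparing the output of such an estimator against a standard binned ECE on the same confidences is the kind of check the paper reports for CIFAR-10, CIFAR-100, and ImageNet, where the two quantities were found to be nearly identical.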