Variable-Based Calibration for Machine Learning Classifiers

Authors: Markelle Kelly, Padhraic Smyth

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. In this paper, we systematically investigate variable-based calibration for classification models, from both theoretical and empirical perspectives.
Researcher Affiliation | Academia | University of California, Irvine; kmarke@uci.edu, smyth@ics.uci.edu
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the main text of the paper.
Open Source Code | Yes | Our code is available online at https://github.com/markellekelly/variable-wise-calibration.
Open Datasets | Yes | https://www.kaggle.com/sulianova/cardiovascular-disease-dataset (Cardiovascular Disease dataset), https://archive.ics.uci.edu/ml/datasets/adult (Adult Income dataset), https://www.yelp.com/dataset (Yelp review dataset), and https://archive.ics.uci.edu/ml/datasets/bank+marketing (Bank Marketing dataset). Also CIFAR-10H, a 10-class image dataset including labels and reaction times from human annotators (Peterson et al. 2019).
Dataset Splits | Yes | This dataset consists of 70,000 records of patient data (49,000 train, 6,000 validation, 15,000 test), with a binary prediction task of determining the presence of cardiovascular disease. (A split sketch follows the table.)
Hardware Specification | No | No hardware details (GPU or CPU models, processor types, or memory specifications) used to run the experiments are given. The paper states that 'further details regarding datasets, models, and calibration methods are in Appendix B', but this does not cover hardware.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8 or PyTorch 1.9). It mentions models such as BERT and DenseNet but does not describe the software environment needed for reproduction.
Experiment Setup | Yes | The datasets are split into training, calibration, and test sets. Each calibration method is trained on the same calibration set, and all metrics and figures are produced from the final test set. The ECE and VECE are computed with an equal-support binning scheme, with B = 10. We alter the method to train decision trees for y with only v as input, with a minimum leaf size of one-tenth of the total calibration set size. We then perform beta calibration at each leaf. (Sketches of the binning metric and the leaf-wise calibration follow the table.)
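To make the reported 49,000/6,000/15,000 split concrete, here is a minimal sketch in Python, assuming the Kaggle Cardiovascular Disease file; the file name, separator, and random seed are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of the reported 70,000-record split:
# 49,000 train / 6,000 validation (calibration) / 15,000 test.
import pandas as pd
from sklearn.model_selection import train_test_split

# The Kaggle file is assumed to be cardio_train.csv; it commonly ships
# semicolon-separated. Adjust the path and separator as needed.
df = pd.read_csv("cardio_train.csv", sep=";")

# Carve out the 15,000-record test set first, then split the remaining
# 55,000 records into 49,000 train and 6,000 validation records.
train_val, test = train_test_split(df, test_size=15_000, random_state=0)
train, val = train_test_split(train_val, test_size=6_000, random_state=0)

assert len(train) == 49_000 and len(val) == 6_000 and len(test) == 15_000
```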
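The equal-support binning scheme behind the reported ECE and VECE (B = 10) can be sketched as follows. This is an illustration of the metric as described in the setup above, not the authors' implementation; conf denotes the model's confidence in its predicted class, correct a 0/1 correctness indicator, and v the variable of interest (e.g., patient age).

```python
import numpy as np

def binned_gap(sort_key, conf, correct, n_bins=10):
    """Support-weighted |accuracy - mean confidence| over equal-support bins
    of sort_key (each bin holds roughly len(conf) / n_bins points)."""
    order = np.argsort(sort_key)
    n = len(conf)
    return sum(
        (len(b) / n) * abs(correct[b].mean() - conf[b].mean())
        for b in np.array_split(order, n_bins)
    )

def ece(conf, correct, n_bins=10):
    # Standard ECE: bins are formed over the model's confidence itself.
    return binned_gap(conf, conf, correct, n_bins)

def vece(v, conf, correct, n_bins=10):
    # Variable-based ECE: bins are formed over the variable v instead.
    return binned_gap(v, conf, correct, n_bins)

# Example usage (hypothetical numpy arrays): vece(age, confidences, correct)
```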
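The tree-based variable-wise calibration step described above (a decision tree for y with only v as input, minimum leaf size one-tenth of the calibration set, beta calibration at each leaf) could look roughly like the sketch below. Beta calibration is realized here as logistic regression on the features [ln p, -ln(1 - p)], following Kull et al. (2017); the authors' exact implementation may differ, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def _beta_features(p, eps=1e-12):
    # Beta calibration fits a logistic model on [ln p, -ln(1 - p)].
    p = np.clip(p, eps, 1 - eps)
    return np.column_stack([np.log(p), -np.log1p(-p)])

def fit_leafwise_beta(v_cal, p_cal, y_cal):
    """Grow a tree for y from v alone, then fit one beta map per leaf.
    Assumes numpy arrays and that every leaf contains both classes."""
    X = np.asarray(v_cal).reshape(-1, 1)
    tree = DecisionTreeClassifier(min_samples_leaf=len(y_cal) // 10)
    tree.fit(X, y_cal)
    leaves = tree.apply(X)
    maps = {
        leaf: LogisticRegression().fit(_beta_features(p_cal[leaves == leaf]),
                                       y_cal[leaves == leaf])
        for leaf in np.unique(leaves)
    }
    return tree, maps

def apply_leafwise_beta(tree, maps, v, p):
    """Recalibrate probabilities p with the beta map of each point's leaf."""
    leaves = tree.apply(np.asarray(v).reshape(-1, 1))
    out = np.empty_like(p, dtype=float)
    for leaf, model in maps.items():
        mask = leaves == leaf
        if mask.any():
            out[mask] = model.predict_proba(_beta_features(p[mask]))[:, 1]
    return out
```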