Variable-Based Calibration for Machine Learning Classifiers

Authors: Markelle Kelly, Padhraic Smyth

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. In this paper, we systematically investigate variable-based calibration for classification models, from both theoretical and empirical perspectives.
Researcher Affiliation | Academia | University of California, Irvine; kmarke@uci.edu, smyth@ics.uci.edu
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the main text of the paper.
Open Source Code | Yes | Our code is available online at https://github.com/markellekelly/variable-wise-calibration.
Open Datasets | Yes | https://www.kaggle.com/sulianova/cardiovascular-disease-dataset (Cardiovascular Disease dataset), https://archive.ics.uci.edu/ml/datasets/adult (Adult Income dataset), https://www.yelp.com/dataset (Yelp review dataset), and https://archive.ics.uci.edu/ml/datasets/bank+marketing (Bank Marketing dataset). Also CIFAR-10H, a 10-class image dataset including labels and reaction times from human annotators (Peterson et al. 2019).
Dataset Splits | Yes | This dataset consists of 70,000 records of patient data (49,000 train, 6,000 validation, 15,000 test), with a binary prediction task of determining the presence of cardiovascular disease. (A split sketch follows the table.)
Hardware Specification | No | No hardware details (GPU or CPU models, processor types, or memory specifications) used to run the experiments are given. The paper states that 'further details regarding datasets, models, and calibration methods are in Appendix B', but this does not cover hardware.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8 or PyTorch 1.9). It mentions models such as BERT and DenseNet but does not describe the software environment needed for reproduction.
Experiment Setup | Yes | The datasets are split into training, calibration, and test sets. Each calibration method is trained on the same calibration set, and all metrics and figures are produced from the final test set. The ECE and VECE are computed with an equal-support binning scheme, with B = 10. We alter the method to train decision trees for y with only v as input, with a minimum leaf size of one-tenth of the total calibration set size. We then perform beta calibration at each leaf. (Sketches of the binning metric and the leaf-wise calibration follow the table.)
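To make the reported 49,000/6,000/15,000 split concrete, here is a minimal sketch in Python, assuming the Kaggle Cardiovascular Disease file; the file name, separator, and random seed are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code) of the reported 70,000-record split:
# 49,000 train / 6,000 validation (calibration) / 15,000 test.
import pandas as pd
from sklearn.model_selection import train_test_split

# The Kaggle file is assumed to be cardio_train.csv; it commonly ships
# semicolon-separated. Adjust the path and separator as needed.
df = pd.read_csv("cardio_train.csv", sep=";")

# Carve out the 15,000-record test set first, then split the remaining
# 55,000 records into 49,000 train and 6,000 validation records.
train_val, test = train_test_split(df, test_size=15_000, random_state=0)
train, val = train_test_split(train_val, test_size=6_000, random_state=0)

assert len(train) == 49_000 and len(val) == 6_000 and len(test) == 15_000
```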
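The equal-support binning scheme behind the reported ECE and VECE (B = 10) can be sketched as follows. This is an illustration of the metric as described in the setup above, not the authors' implementation; conf denotes the model's confidence in its predicted class, correct a 0/1 correctness indicator, and v the variable of interest (e.g., patient age).

```python
import numpy as np

def binned_gap(sort_key, conf, correct, n_bins=10):
    """Support-weighted |accuracy - mean confidence| over equal-support bins
    of sort_key (each bin holds roughly len(conf) / n_bins points)."""
    order = np.argsort(sort_key)
    n = len(conf)
    return sum(
        (len(b) / n) * abs(correct[b].mean() - conf[b].mean())
        for b in np.array_split(order, n_bins)
    )

def ece(conf, correct, n_bins=10):
    # Standard ECE: bins are formed over the model's confidence itself.
    return binned_gap(conf, conf, correct, n_bins)

def vece(v, conf, correct, n_bins=10):
    # Variable-based ECE: bins are formed over the variable v instead.
    return binned_gap(v, conf, correct, n_bins)

# Example usage (hypothetical numpy arrays): vece(age, confidences, correct)
```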
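The tree-based variable-wise calibration step described above (a decision tree for y with only v as input, minimum leaf size one-tenth of the calibration set, beta calibration at each leaf) could look roughly like the sketch below. Beta calibration is realized here as logistic regression on the features [ln p, -ln(1 - p)], following Kull et al. (2017); the authors' exact implementation may differ, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def _beta_features(p, eps=1e-12):
    # Beta calibration fits a logistic model on [ln p, -ln(1 - p)].
    p = np.clip(p, eps, 1 - eps)
    return np.column_stack([np.log(p), -np.log1p(-p)])

def fit_leafwise_beta(v_cal, p_cal, y_cal):
    """Grow a tree for y from v alone, then fit one beta map per leaf.
    Assumes numpy arrays and that every leaf contains both classes."""
    X = np.asarray(v_cal).reshape(-1, 1)
    tree = DecisionTreeClassifier(min_samples_leaf=len(y_cal) // 10)
    tree.fit(X, y_cal)
    leaves = tree.apply(X)
    maps = {
        leaf: LogisticRegression().fit(_beta_features(p_cal[leaves == leaf]),
                                       y_cal[leaves == leaf])
        for leaf in np.unique(leaves)
    }
    return tree, maps

def apply_leafwise_beta(tree, maps, v, p):
    """Recalibrate probabilities p with the beta map of each point's leaf."""
    leaves = tree.apply(np.asarray(v).reshape(-1, 1))
    out = np.empty_like(p, dtype=float)
    for leaf, model in maps.items():
        mask = leaves == leaf
        if mask.any():
            out[mask] = model.predict_proba(_beta_features(p[mask]))[:, 1]
    return out
```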