Conformal Prediction using Conditional Histograms

Authors: Matteo Sesia, Yaniv Romano

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments with simulated and real data demonstrate improved performance compared to state-of-the-art alternatives, including conformalized quantile regression and other distributional conformal prediction approaches.
Researcher Affiliation | Academia | Matteo Sesia, Department of Data Sciences and Operations, University of Southern California, USA (sesia@marshall.usc.edu); Yaniv Romano, Departments of Electrical and Computer Engineering and of Computer Science, Technion, Israel (yromano@technion.ac.il)
Pseudocode | Yes | Algorithm 1: CHR with split-conformal calibration
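To make the split-conformal calibration step concrete, here is a minimal Python sketch that turns per-sample histograms into calibrated prediction intervals. It is not Algorithm 1 from the paper: as a simplifying assumption, the nested intervals are grown greedily from the modal bin (adding whichever neighboring bin carries more mass), and the randomization used by CHR is omitted, so the resulting intervals are conservative. All function names are hypothetical and are not part of the chr package.

```python
import math

import numpy as np


def greedy_inclusion_levels(probs):
    """Hypothetical helper: grow a contiguous interval of bins from the modal bin.

    Returns `level`, where level[k] is the histogram mass already covered just
    before bin k joins the interval (0 for the modal bin). For any threshold t,
    the bins with level <= t form a contiguous interval, nested in t.
    """
    probs = np.asarray(probs, dtype=float)
    n_bins = probs.size
    lo = hi = int(np.argmax(probs))
    level = np.empty(n_bins)
    level[lo] = 0.0  # the modal bin is always included
    mass = probs[lo]
    while lo > 0 or hi < n_bins - 1:
        left = probs[lo - 1] if lo > 0 else -np.inf
        right = probs[hi + 1] if hi < n_bins - 1 else -np.inf
        if left >= right:  # extend toward the heavier neighbor (ties go left)
            lo -= 1
            k = lo
        else:
            hi += 1
            k = hi
        level[k] = mass  # mass covered before bin k is added
        mass += probs[k]
    return level


def calibrate_threshold(probs_cal, bins_cal, alpha):
    """Split-conformal threshold from calibration histograms and observed bins."""
    # Conformity score: level at which the observed bin enters the interval.
    scores = np.array(
        [greedy_inclusion_levels(p)[b] for p, b in zip(probs_cal, bins_cal)]
    )
    n = scores.size
    k = math.ceil((1 - alpha) * (n + 1))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points: use the full range
    return np.sort(scores)[k - 1]


def predict_interval(probs_test, threshold):
    """Indices (first, last) of the bins in the calibrated interval."""
    level = greedy_inclusion_levels(probs_test)
    inside = np.flatnonzero(level <= threshold)  # never empty: modal level is 0
    return int(inside.min()), int(inside.max())
```

Here `probs_cal[i]` would be the estimated histogram π̂(x_i) for calibration sample i and `bins_cal[i]` the index of the bin containing y_i; a nominal coverage of 90% corresponds to `alpha=0.1`. Nestedness comes for free because every interval is a prefix of the same growth order; the paper's own construction differs and should be consulted for the exact algorithm.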
Open Source Code | Yes | A Python implementation of CHR is available online at https://github.com/msesia/chr, along with code to reproduce the following numerical experiments.
Open Datasets | Yes | We apply CHR to the following seven public-domain data sets also considered in [30]: physicochemical properties of protein tertiary structure (bio) [6], blog feedback (blog) [1], Facebook comment volume [2], variants one (fb1) and two (fb2), from the UCI Machine Learning Repository [15]; and medical expenditure panel survey number 19 (meps19) [3], number 20 (meps20) [4], and number 21 (meps21) [5], from [13].
Dataset Splits | Yes | For simplicity, we apply CHR and other benchmark methods by assigning equal numbers of samples to the training and calibration sets; this ensures all comparisons are fair, although different options may lead to even shorter intervals [32]. ... In each experiment, 2000 samples are used for training, 2000 for calibration, and the remaining ones for testing.
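The split reported above (2000 training samples, 2000 calibration samples, the rest held out), combined with the feature standardization listed under Experiment Setup, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the random permutation, the seed, and estimating the standardization on the training set only are illustrative choices not confirmed by the report, and `split_and_standardize` is a hypothetical name rather than part of the chr package.

```python
import numpy as np


def split_and_standardize(X, y, n_train=2000, n_cal=2000, seed=0):
    """Hypothetical helper: random train/calibration/test split plus scaling.

    Assumption: feature means and standard deviations are estimated on the
    training set only and then applied to all three sets.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr = idx[:n_train]
    ca = idx[n_train:n_train + n_cal]
    te = idx[n_train + n_cal:]  # all remaining samples are held out
    mu = X[tr].mean(axis=0)
    sd = X[tr].std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features unscaled
    Z = (X - mu) / sd  # zero mean, unit variance on the training set
    return (Z[tr], y[tr]), (Z[ca], y[ca]), (Z[te], y[te])
```

Assigning equal sizes to the training and calibration sets mirrors the choice described in the report; other ratios are a legitimate design option that may yield shorter intervals [32].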
Hardware Specification | No | The paper states: "M.S. thanks the center for Advanced Research Computing at the University of Southern California for providing computing resources.", but it does not specify any particular hardware (e.g., GPU/CPU models, memory, or cluster specifications) used for the experiments.
Software Dependencies | No | The paper mentions: "A Python implementation of CHR is available online at https://github.com/msesia/chr" and refers to "black-box quantile regression models" and "Bayesian additive regression trees", but it does not provide specific version numbers for Python or any of the libraries/frameworks used.
Experiment Setup | Yes | A Python implementation of CHR is available online at https://github.com/msesia/chr, along with code to reproduce the following numerical experiments. This software divides the domain of Y into a desired number of bins with equal sizes, depending on the range of values observed in the training data; we use 100 bins for the synthetic data and 1000 for the real data. Then, we estimate the conditional histograms π̂ using different black-box quantile regression models [27, 33], with a grid of quantiles ranging from 1% to 99%; see Supplementary Section S1.1. Our software also supports Bayesian additive regression trees [11] and could easily accommodate other alternatives. For simplicity, we apply CHR and other benchmark methods by assigning equal numbers of samples to the training and calibration sets; this ensures all comparisons are fair, although different options may lead to even shorter intervals [32]. ... In each experiment, 2000 samples are used for training, 2000 for calibration, and the remaining ones for testing. All features are standardized to have zero mean and unit variance. The nominal coverage rate is 90%.
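The binning and quantile-grid steps described above can be sketched as follows. This is an illustrative assumption, not the chr package's code: the bin edges simply span the minimum and maximum training response, and the conversion from a grid of predicted quantiles to a histogram treats each quantile as an equal share of probability mass, which is cruder than the procedure in Supplementary Section S1.1. Both function names are hypothetical.

```python
import numpy as np


def make_bins(y_train, n_bins):
    """Hypothetical helper: equal-width bin edges spanning the training range."""
    return np.linspace(np.min(y_train), np.max(y_train), n_bins + 1)


def quantiles_to_histogram(q_pred, edges):
    """Hypothetical helper: crude histogram from one sample's predicted quantiles.

    Assumption: each of the m predicted quantiles (e.g. levels 1%, ..., 99%)
    carries 1/m of the probability mass, so the histogram is the normalized
    count of quantile values falling in each bin. Counting ignores order, so
    crossing quantiles need no repair here.
    """
    q = np.asarray(q_pred, dtype=float)
    q = np.clip(q, edges[0], edges[-1])  # out-of-range values go to the end bins
    counts, _ = np.histogram(q, bins=edges)  # last bin includes its right edge
    return counts / counts.sum()
```

With the real-data settings above, one would call `make_bins(y_train, 1000)` and pass each sample's 99 predicted quantiles (levels 1% to 99%) to `quantiles_to_histogram`, producing the per-sample histograms that the calibration step consumes.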