Predictive Inference with Feature Conformal Prediction

Authors: Jiaye Teng, Chuan Wen, Dinghuai Zhang, Yoshua Bengio, Yang Gao, Yang Yuan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Apart from experiments on existing predictive inference benchmarks, we also demonstrate the state-of-the-art performance of the proposed methods on large-scale tasks such as ImageNet classification and Cityscapes image segmentation. We conduct experiments on synthetic and real-world datasets, mainly to show that Feature CP is (a) effective, i.e., it could return valid confidence bands with empirical coverage larger than 1 − α; (b) efficient, i.e., it could return shorter confidence bands than vanilla CP.
Researcher Affiliation | Academia | 1 Institute for Interdisciplinary Information Sciences, Tsinghua University; 2 Mila - Quebec AI Institute; 3 Shanghai Artificial Intelligence Laboratory; 4 Shanghai Qi Zhi Institute
Pseudocode | Yes | Algorithm 1 (Conformal Prediction), Algorithm 2 (Non-conformity Score), Algorithm 3 (Feature Conformal Prediction)
Open Source Code | Yes | The code is available at https://github.com/AlvinWen428/FeatureCP.
Open Datasets | Yes | We consider both synthetic datasets and real-world datasets, including (a) realistic unidimensional target datasets: five datasets from the UCI machine learning repository (Asuncion, 2007): physicochemical properties of protein tertiary structure (bio), bike sharing (bike), community and crimes (community), and Facebook comment volume variants one and two (facebook 1/2); five datasets from other sources: blog feedback (blog) (Buza, 2014), Tennessee's student-teacher achievement ratio (star) (Achilles et al., 2008), and medical expenditure panel survey (meps 19-21) (Cohen et al., 2009); (c) real-world semantic segmentation dataset: Cityscapes (Cordts et al., 2016), where we transform the original pixel-wise classification problem into a high-dimensional pixel-wise regression problem. We also extend Feature CP to classification problems and test on the ImageNet (Deng et al., 2009) dataset.
Dataset Splits | Yes | In the unidimensional and synthetic dimensional target regression experiments, we randomly divide the dataset into training, calibration, and test sets with the proportion 2 : 2 : 1. As for the semantic segmentation experiment, because the labels of the pre-divided test set are not accessible, we re-split the training, calibration, and test sets randomly on the original training set of Cityscapes. During calibration, to get the best value for the number of steps M, we take a subset (one-fifth) of the calibration set as the additional validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. It only mentions 'large neural networks' without further hardware specifications.
Software Dependencies | No | The paper mentions 'PyTorch' as part of the model architecture description ('follows the official implementation of PyTorch') but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | In the unidimensional and synthetic dimensional target regression experiments, we randomly divide the dataset into training, calibration, and test sets with the proportion 2 : 2 : 1. As for the semantic segmentation experiment, because the labels of the pre-divided test set are not accessible, we re-split the training, calibration, and test sets randomly on the original training set of Cityscapes. We remove the class 0 (unlabeled) from the labels during calibration and testing, and use the weighted mean square error as the training objective, where the class weights are adopted from Paszke et al. (2016). In estimating the band length, we deploy Band Estimation based on the score calculated by Algorithm 2. In our experiment, we choose the number of steps M in Algorithm 2 via cross-validation.
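As context for the rows above, the vanilla split conformal prediction baseline (the Algorithm 1 procedure that Feature CP is compared against) can be sketched as follows. The synthetic data generator, the least-squares "model", and the miscoverage level alpha = 0.1 are illustrative assumptions for this sketch, not the paper's actual datasets or networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic regression data: y = 2x + noise (not the paper's datasets)
def make_data(n):
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_data(400)
x_cal, y_cal = make_data(400)
x_test, y_test = make_data(200)

# Stand-in "model": least-squares slope fitted on the training split only
slope = np.dot(x_train, y_train) / np.dot(x_train, x_train)

def predict(x):
    return slope * x

# Vanilla split CP: non-conformity score = |y - f(x)| on the held-out calibration set
alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))
n = len(scores)

# Finite-sample-corrected quantile level ceil((n + 1)(1 - alpha)) / n
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Confidence band [f(x) - q, f(x) + q]; empirical coverage should exceed 1 - alpha
covered = np.mean(np.abs(y_test - predict(x_test)) <= q)
```

Feature CP differs in that the non-conformity score is computed in the network's feature space (Algorithm 2) rather than directly on the outputs, and band lengths in output space are then obtained via Band Estimation.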
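The reported 2 : 2 : 1 train/calibration/test split, with one-fifth of the calibration set held out as a validation set for tuning the number of steps M, could be implemented along these lines. The dataset size and index bookkeeping here are assumptions for illustration, not taken from the released code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical dataset size
idx = rng.permutation(n)

# 2 : 2 : 1 proportions -> 40% train, 40% calibration, 20% test
n_train = int(0.4 * n)
n_cal = int(0.4 * n)
train_idx = idx[:n_train]
cal_idx = idx[n_train:n_train + n_cal]
test_idx = idx[n_train + n_cal:]

# One-fifth of the calibration set becomes the additional validation set
# used to pick the number of steps M during calibration
n_val = len(cal_idx) // 5
val_idx = cal_idx[:n_val]
cal_idx = cal_idx[n_val:]
```

For Cityscapes the same three-way split would be drawn from the original training set, since the official test labels are not accessible.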