Conformal Prediction with Missing Values

Authors: Margaux Zaffran, Aymeric Dieuleveut, Julie Josse, Yaniv Romano

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using synthetic and data from critical care, we corroborate our theory and report improved performance of our methods.
Researcher Affiliation | Collaboration | (1) Électricité de France R&D, Palaiseau, France; (2) PreMeDICaL project team, INRIA Sophia-Antipolis, Montpellier, France; (3) CMAP, École polytechnique, Institut Polytechnique de Paris, Palaiseau, France; (4) Departments of Electrical Engineering and of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel.
Pseudocode | Yes | Algorithm 1: CP-MDA-Exact (with CQR); Algorithm 2: CP-MDA-Nested (with CQR); Algorithm 3: SCP on impute-then-predict; Algorithm 4: CP-MDA-Exact.
Open Source Code | Yes | The code to reproduce our experiments is available on GitHub.
Open Datasets | Yes | We consider 6 benchmark real data sets for regression: meps_19, meps_20, meps_21 (MEPS), bio, bike and concrete (Dua & Graff, 2017)... MEPS: Medical Expenditure Panel Survey, https://meps.ahrq.gov/mepsweb/data_stats/data_overview.jsp
Dataset Splits | Yes | "Split CP (Papadopoulos et al., 2002; Lei et al., 2018) achieves Eq. (1) by keeping a hold-out set, the calibration set, used to evaluate the performance of a fixed predictive model." and "The calibration size is fixed to 1000 and the test set contains 2000 points." Table 1 also specifies 'Tr size' and 'Cal size' values.
Hardware Specification | No | The paper mentions training models such as neural networks and using Scikit-learn for iterative regression, but it does not specify any hardware used for the experiments (no GPU or CPU models, and no cloud computing resources).
Software Dependencies | No | The paper mentions 'iterative regression (iterative ridge implemented in Scikit-learn, Pedregosa et al. (2011))' and a 'Neural Network (NN)' optimized with Adam (Kingma & Ba, 2014), but it does not provide version numbers for Scikit-learn, Python, the deep learning framework (if any), or other software dependencies.
Experiment Setup | Yes | The network is composed of three fully connected layers with a hidden dimension of 64 and ReLU activation functions. We use the pinball loss to estimate the conditional quantiles, with a dropout regularization of rate 0.1. The network is optimized using Adam (Kingma & Ba, 2014) with a learning rate equal to 0.0005. We tune the optimal number of epochs by cross-validation, minimizing the loss function on the hold-out data points; the maximal number of epochs is set to 2000.
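The "Dataset Splits" row quotes the standard split-conformal recipe the paper builds on, with 1000 calibration and 2000 test points. A minimal NumPy sketch of that calibration step, here for Conformalized Quantile Regression (CQR, the score used in the paper's Algorithms 1-3), is shown below; the function name, toy data, and constant quantile predictions are illustrative placeholders, not the authors' implementation:

```python
import numpy as np

def cqr_intervals(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha=0.1):
    """Split-conformal calibration with the CQR conformity score.

    q_lo_cal / q_hi_cal: lower/upper quantile predictions on the calibration set,
    y_cal: calibration labels; *_test: quantile predictions on the test set.
    Returns conformalized lower/upper interval bounds on the test set.
    """
    n = len(y_cal)
    # CQR score: signed distance of y to the nearest predicted quantile bound
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    # Finite-sample corrected (1 - alpha) empirical quantile of the scores
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(scores, min(level, 1.0), method="higher")
    return q_lo_test - q_hat, q_hi_test + q_hat

# Toy usage with the paper's split sizes: 1000 calibration, 2000 test points
rng = np.random.default_rng(0)
y_cal = rng.normal(size=1000)
y_test = rng.normal(size=2000)
lo, hi = cqr_intervals(-1.5 * np.ones(1000), 1.5 * np.ones(1000), y_cal,
                       -1.5 * np.ones(2000), 1.5 * np.ones(2000), alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))  # close to 0.90 by design
```

The CP-MDA algorithms of the paper wrap this step with mask-conditional handling of missing values; the sketch only illustrates the shared split-conformal backbone.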
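The "Experiment Setup" row says the quantile network is trained with the pinball loss. As a self-contained reference (NumPy, toy one-element arrays chosen here for illustration), the loss can be written as:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for a target quantile level tau in (0, 1).

    Under-prediction is penalized with weight tau and over-prediction with
    weight (1 - tau), so the minimizer is the tau-th conditional quantile.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# At a high quantile (tau = 0.95), under-predicting by 1 costs 0.95,
# while over-predicting by 1 costs only 0.05.
under = pinball_loss(np.array([1.0]), np.array([0.0]), tau=0.95)  # 0.95
over = pinball_loss(np.array([0.0]), np.array([1.0]), tau=0.95)   # 0.05
```

This asymmetry is what lets the three-layer network described above output the conditional quantile pair that CQR then conformalizes.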