Conformal Prediction via Regression-as-Classification
Authors: Etash Kumar Guha, Shlok Natarajan, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Eugene Ndiaye
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on many benchmarks show that this simple approach gives surprisingly good results on many practical problems. We investigate the empirical behavior of our R2CCP (Regression-to-Classification Conformal Prediction) method, which we have explained in detail in Algorithm 1. We have three sets of experiments. |
| Researcher Affiliation | Collaboration | Etash Guha, RIKEN Center for AI Project and SambaNova Systems (etash.guha@sambanovasystems.com); Shlok Natarajan, Salesforce (shloknatarajan@salesforce.com); Thomas Möllenhoff, RIKEN Center for AI Project (thomas.moellenhoff@riken.jp); Mohammad Emtiyaz Khan, RIKEN Center for AI Project (emtiyaz.khan@riken.jp); Eugene Ndiaye, Apple (e_ndiaye@apple.com) |
| Pseudocode | Yes | Algorithm 1 Regression to Classification Conformal Prediction (R2CCP). |
| Open Source Code | Yes | All code to run our method can be installed via pip install r2ccp. |
| Open Datasets | Yes | Specifically, these are several datasets from the UCI Machine Learning repository (Bio, Blog, Concrete, Community, Energy, Forest, Stock, Cancer, Solar, Parkinsons, Pendulum) (Nottingham et al., 2023) and the Medical Expenditure Panel Survey numbers 19–21 (MEPS-19–21) (Cohen et al., 2009). |
| Dataset Splits | No | The paper's Algorithm 1 (step 4) states 'Randomly split the dataset Dn in training Dtr and calibration Dcal', but it does not provide specific percentages or absolute counts for these splits, nor does it refer to predefined standard splits with their ratios. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions 'pip install r2ccp' and 'AdamW as an optimizer' but does not specify version numbers for Python, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software dependencies. |
| Experiment Setup | Yes | We do not tune the hyperparameters and keep values of K = 50, p = 0.5, and τ = 0.2 constant across all experiments. For all experiments, we report length, meaning the length of all the sets predicted, and coverage, the percent of instances where the true label is contained in the predicted intervals. ... Specifically, we discretize the range space into K = 50 points, weight the entropy term by τ = 0.2, use a 1000-dimensional hidden layer, use 4 layers, use weight decay of 1e-4, use p = 0.5, and use AdamW as an optimizer. For most of the experiments, we use learning rate 1e-4 and batch size 32. However, for certain datasets, namely the MEPS datasets, we used a larger batch size of 256 to improve training time and used a smaller learning rate to prevent training divergence. |
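The pipeline quoted in the table (discretize the label range into K = 50 bins, randomly split the data into training and calibration sets, fit a classifier over the bins, then keep the bins whose probability clears a calibrated threshold) can be sketched in plain NumPy. This is a hedged illustration, not the authors' R2CCP implementation: the synthetic data, the polynomial-feature softmax classifier, the 50/50 split ratio, and the use of the raw true-bin probability as the conformity score are all simplifying assumptions made here; the paper's actual score, entropy-regularized loss, and network differ, and `pip install r2ccp` provides the real implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (illustrative only, not from the paper).
n = 2000
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Step 1: discretize the label range into K bins (the paper uses K = 50).
K = 50
edges = np.linspace(y.min(), y.max(), K + 1)
centers = (edges[:-1] + edges[1:]) / 2
labels = np.clip(np.digitize(y, edges) - 1, 0, K - 1)

# Step 2: random train/calibration split (Algorithm 1, step 4; the paper
# does not fix a ratio, so 50/50 is an assumption).
idx = rng.permutation(n)
tr, cal = idx[: n // 2], idx[n // 2:]

# Step 3: fit a softmax classifier over the bins. A tiny polynomial-feature
# model stands in for the paper's network; conformal calibration below is
# valid regardless of how good this model is.
def features(x):
    return np.hstack([x ** d for d in range(6)])

Phi = features(X)
W = np.zeros((Phi.shape[1], K))
for _ in range(500):  # plain gradient descent on cross-entropy
    logits = Phi[tr] @ W
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    G = P.copy()
    G[np.arange(len(tr)), labels[tr]] -= 1
    W -= 0.5 * Phi[tr].T @ G / len(tr)

def probs(x):
    logits = features(x) @ W
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

# Step 4: conformity score = probability assigned to the bin containing the
# true y (a simplification; the paper derives a smoother score from the
# discretized distribution). Threshold at the alpha-quantile of calibration
# scores for 1 - alpha = 90% target coverage.
alpha = 0.1
cal_scores = probs(X[cal])[np.arange(len(cal)), labels[cal]]
q = np.quantile(cal_scores, alpha * (1 + 1 / len(cal)), method="lower")

def predict_set(x):
    """Return the bin centers kept in the conformal prediction set for x."""
    return centers[probs(x)[0] >= q]
```

The key property of this split-conformal construction is that the coverage guarantee does not depend on the classifier's quality: a poorly fit model simply produces larger prediction sets, while the fraction of points whose true bin clears the threshold stays near 1 - alpha by construction of the quantile.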