reproducibilityindex.ai

Non-Exchangeable Conformal Risk Control

Authors: António Farinhas, Chrysoula Zerva, Dennis Thomas Ulmer, Andre Martins

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments with both synthetic and real-world data show the usefulness of our method. [...] In this section, we turn to demonstrating the validity of our theoretical results in three different tasks using different nonincreasing losses: a multilabel classification problem using synthetic time series data, minimizing the false negative rate (Section 4.1); a problem involving monitoring electricity usage, minimizing the λ-insensitive absolute loss (Section 4.2); and an open-domain question answering (QA) task, where we control the best token-level F1-score (Section 4.3).
Researcher Affiliation	Collaboration	António Farinhas 1,2, Chrysoula Zerva 1,2, Dennis Ulmer 3,4, André F. T. Martins 1,2,5; 1 Instituto de Telecomunicações, 2 Instituto Superior Técnico, Universidade de Lisboa (Lisbon ELLIS Unit), 3 IT University of Copenhagen, 4 Pioneer Centre for Artificial Intelligence, 5 Unbabel
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is available at https://github.com/deep-spin/non-exchangeable-crc.
Open Datasets	Yes	We use the ELEC2 dataset (Harries, 1999). [...] We use the Natural Questions dataset (Kwiatkowski et al., 2019; Karpukhin et al., 2020).
Dataset Splits	Yes	After a warmup period of 200 time points, at each time step n = 200, ..., N − 1, we assign odd indices to the training set and even indices to the calibration set, and we let X_{n+1} be the test point. [...] We use the Natural Questions dataset (Kwiatkowski et al., 2019; Karpukhin et al., 2020), considering n = 2500 points for calibration and 1110 for evaluation.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper acknowledges the use of open-source software by citing works such as Van Rossum & Drake (2009) (Python), Oliphant (2006) (NumPy), Virtanen et al. (2020) (SciPy), van der Walt et al. (2011) (NumPy), Pedregosa et al. (2011) (Scikit-learn), and Paszke et al. (2019) (PyTorch). However, it does not specify the exact versions of these dependencies used in the experiments.
Experiment Setup	Yes	We compare standard CRC with non-exchangeable (non-X) CRC, for which we use weights w_i = 0.99^{n+1-i} and predict λ̂ following Eq. (10). In both cases, we minimize the false negative rate (FNR)... [...] For non-X CRC, we use weights w_i = 0.99^{n+1-i} and we also experiment with weighted least-squares regression, placing weights t_i = w_i on each data point (non-X CRC + WLS). [...] We experiment using λ ∈ [0, 1] with a step of 0.01. [...] For non-X CRC, we choose weights {w_i}_{i=1}^n by computing the dot product between the embedding representations of {X_i}_{i=1}^n and X_{n+1}, obtained using a sentence-transformer model... We use α = 0.3 and report results over 1000 trials.
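The weighting scheme and λ grid quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' implementation (that lives in the linked repository). It rests on several assumptions. The loss is bounded by B = 1. Eq. (10) is taken to have the weighted form λ̂ = inf{λ : Σ_i w̃_i L_i(λ) + w̃_{n+1} B ≤ α}, with w̃_i = w_i / (Σ_j w_j + 1) and the test point given w_{n+1} = 1. The loss is the FNR of the set {k : score_k ≥ 1 − λ}. The data are hypothetical synthetic scores rather than the paper's time series. The helper names decay_weights, fnr_losses and non_x_crc_lambda are hypothetical.

```python
import numpy as np


def decay_weights(n: int, rho: float = 0.99) -> np.ndarray:
    """Fixed weights w_i = rho^(n+1-i), i = 1..n (the most recent point weighs the most)."""
    i = np.arange(1, n + 1)
    return rho ** (n + 1 - i)


def fnr_losses(scores: np.ndarray, labels: np.ndarray, lambdas: np.ndarray) -> np.ndarray:
    """False negative rate L_i(lambda) of the set {k : score_k >= 1 - lambda}.

    Returns an (n, len(lambdas)) array; each row is nonincreasing in lambda.
    """
    in_set = scores[:, None, :] >= 1.0 - lambdas[None, :, None]  # (n, L, K)
    hits = (in_set & labels[:, None, :]).sum(axis=-1)            # (n, L)
    return 1.0 - hits / labels.sum(axis=-1, keepdims=True)


def non_x_crc_lambda(losses, weights, lambdas, alpha, B=1.0):
    """Smallest grid lambda with sum_i w~_i L_i(lambda) + w~_{n+1} B <= alpha.

    ASSUMED normalization: w~_i = w_i / (sum_j w_j + 1), test point weight w_{n+1} = 1.
    """
    total = weights.sum() + 1.0
    bound = (weights / total) @ losses + B / total  # (L,), nonincreasing in lambda
    feasible = np.flatnonzero(bound <= alpha)
    # Fall back to the largest lambda if no grid point satisfies the bound.
    return float(lambdas[feasible[0]]) if feasible.size else float(lambdas[-1])


# Usage on hypothetical synthetic data (not the paper's data).
rng = np.random.default_rng(0)
n, K = 500, 10
scores = rng.uniform(size=(n, K))
labels = rng.uniform(size=(n, K)) < scores
labels[np.arange(n), scores.argmax(axis=1)] = True  # at least one true label per row
lambdas = np.linspace(0.0, 1.0, 101)                 # lambda in [0, 1], step 0.01

losses = fnr_losses(scores, labels, lambdas)
lam_std = non_x_crc_lambda(losses, np.ones(n), lambdas, alpha=0.1)         # standard CRC
lam_nonx = non_x_crc_lambda(losses, decay_weights(n), lambdas, alpha=0.1)  # non-X CRC
print(lam_std, lam_nonx)
```

Setting every weight to 1 recovers the standard CRC rule, in which the empirical risk is scaled by n/(n+1) and a term B/(n+1) is added. The decaying weights shift mass toward recent calibration points. The cost is a larger B-term, because Σ_j w_j + 1 is much smaller than n + 1.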