Efficient Conformal Prediction via Cascaded Inference with Expanded Admission
Authors: Adam Fisch, Tal Schuster, Tommi S. Jaakkola, Regina Barzilay
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the empirical effectiveness of our approach for multiple applications in natural language processing and computational chemistry for drug discovery. We empirically validate our approach on information retrieval for fact verification, open-domain question answering, and in-silico screening for drug discovery. We empirically evaluate our method on three different tasks with standard, publicly available datasets. |
| Researcher Affiliation | Academia | Adam Fisch, Tal Schuster, Tommi Jaakkola, Regina Barzilay. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. {fisch,tals,tommi,regina}@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1: Cascaded inductive conformal prediction with distribution-free marginal coverage. (A minimal sketch of this cascade appears after the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/ajfisch/conformal-cascades. |
| Open Datasets | Yes | We use the open-domain setting of the Natural Questions dataset (Kwiatkowski et al., 2019). We use the FEVER dataset (Thorne et al., 2018). Using the ChEMBL database (Mayr et al., 2018). |
| Dataset Splits | Yes | For each task, we use a proper training, validation, and test set. We retain 6750/8757 questions from the validation set and 2895/3610 from the test set. We follow the dataset splits of the ERASER benchmark (DeYoung et al., 2020), which contain 97,957 claims for training, 6,122 claims for validation, and 6,111 claims for test. We split the ChEMBL dataset 60-20-20 over molecules: 60% of molecules go to a train set, 20% to a validation set, and 20% to a test set. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper states: 'In this work we do not measure wall-clock times as these are hardware-specific, and depend heavily on optimized implementations.' It also mentions 'even on a single CPU' for the RF model, but no specific CPU or GPU models are provided. |
| Software Dependencies | No | The paper mentions software like 'Gensim library', 'ALBERT-Base', 'chemprop repository', and 'Scikit library' but does not provide specific version numbers for these dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | We perform model selection specifically for CP on the validation set, and report final numbers on the test set. The QA and IR cascades use the Simes correction for MHT, while the DR cascade uses the Bonferroni correction (both combinations are sketched after the table). For each token, the model outputs independent scores for being the start or end of the answer span. We also follow Karpukhin et al. (2020) by using the output of the [CLS] token to get a passage selection score from the reader model. We collect 10 negative pairs for each positive one by randomly selecting other sentences from the same article as the correct evidence. We limit the number of negative samples (incorrect answers) to the top 64 incorrect predictions of the EXT model. The final prediction is based on an ensemble of 5 models trained with different random seeds. |
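
The cascade in Algorithm 1 filters a candidate set through progressively more expensive models, keeping only candidates whose conformal p-value survives a corrected per-stage threshold. Below is a minimal Python sketch of that idea; the names `stage_scorers` and `stage_calib_scores`, and the equal Bonferroni-style split of the miscoverage budget `epsilon` across stages, are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def conformal_pvalue(score, calibration_scores):
    """Inductive conformal p-value: fraction of calibration nonconformity
    scores at least as extreme as the candidate's score (higher = worse)."""
    calib = np.asarray(calibration_scores)
    return (np.sum(calib >= score) + 1) / (len(calib) + 1)

def cascaded_icp(candidates, stage_scorers, stage_calib_scores, epsilon):
    """Sketch of a conformal cascade in the spirit of Algorithm 1.

    stage_scorers[j] is a (hypothetical) nonconformity scorer for stage j,
    ordered cheapest to most expensive; stage_calib_scores[j] holds that
    stage's calibration scores. Splitting epsilon equally across the m
    stages is a Bonferroni-style correction, so the surviving set keeps
    marginal coverage at level 1 - epsilon.
    """
    m = len(stage_scorers)
    survivors = list(candidates)
    for scorer, calib in zip(stage_scorers, stage_calib_scores):
        survivors = [
            x for x in survivors
            if conformal_pvalue(scorer(x), calib) > epsilon / m
        ]
        if not survivors:  # early exit: nothing left to pass downstream
            break
    return survivors
```

Because cheap stages prune most candidates, the expensive final model only scores the few survivors, which is the source of the efficiency gain.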
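The Simes and Bonferroni corrections referenced in the experiment setup combine the m per-stage p-values of a candidate into a single value that is compared against epsilon. The sketch below uses the standard definitions of both tests; the function names are ours.

```python
import numpy as np

def bonferroni_pvalue(pvals):
    """Bonferroni-combined p-value over m stages: m * min(p), capped at 1."""
    return min(1.0, len(pvals) * min(pvals))

def simes_pvalue(pvals):
    """Simes-combined p-value: min over sorted p-values of m * p_(k) / k."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = len(p)
    return float(min(1.0, np.min(m * p / np.arange(1, m + 1))))

# A candidate stays in the prediction set while its combined p-value
# exceeds epsilon. Simes is never more conservative than Bonferroni:
print(bonferroni_pvalue([0.04, 0.05]))  # 2 * 0.04 = 0.08
print(simes_pvalue([0.04, 0.05]))       # min(2*0.04/1, 2*0.05/2) = 0.05
```

Simes yields tighter sets when the per-stage p-values are positively dependent, which is consistent with its use for the QA and IR cascades, while Bonferroni is valid under arbitrary dependence.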
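For the ChEMBL 60-20-20 split, the key detail is that the split is taken over molecules, so no molecule appears in more than one split. A hypothetical sketch of such a split (the function name and seed are ours):

```python
import random

def split_molecules(molecule_ids, seed=0):
    """60/20/20 train/validation/test split at the molecule level, so no
    molecule is shared across splits."""
    ids = sorted(set(molecule_ids))          # deduplicate, fix ordering
    random.Random(seed).shuffle(ids)         # seeded shuffle for repeatability
    n_train = int(0.6 * len(ids))
    n_val = int(0.2 * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```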