reproducibilityindex.ai

Learning to Reject with a Fixed Predictor: Application to Decontextualization

Authors: Christopher Mohri, Daniel Andor, Eunsol Choi, Michael Collins, Anqi Mao, Yutao Zhong

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For evaluation, we choose the decontextualization task, and provide a manually-labelled dataset of 2,000 examples. Our algorithm significantly outperforms the baselines considered, with a 25% improvement in coverage when halving the error rate, which is only 3% away from the theoretical limit.
Researcher Affiliation	Collaboration	Christopher Mohri¹, Daniel Andor², Eunsol Choi³, Michael Collins², Anqi Mao⁴, Yutao Zhong⁴; ¹Stanford University, ²Google, ³The University of Texas at Austin, ⁴Courant Institute
Pseudocode	No	The paper does not contain any explicitly labeled pseudocode or algorithm blocks. Methods are described in prose.
Open Source Code	No	The paper does not provide an explicit statement about releasing source code for their methodology or a link to a code repository.
Open Datasets	Yes	For our experiments, we labeled 2,000 decontextualizations of a fixed MT5 XXL model (Xue et al., 2020) ourselves... We randomly split our 2,000 annotation examples into 1,500 train/500 validation examples and perform 4-fold cross-validation... We provide additional empirical evaluation on two simpler image classification datasets: Fashion-MNIST (Xiao et al., 2017) and KMNIST (Clanuwat et al., 2018).
Dataset Splits	Yes	We randomly split our 2,000 annotation examples into 1,500 train/500 validation examples and perform 4-fold cross-validation.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory amounts.
Software Dependencies	Yes	We further fine-tune a T5X 1.1 XXL decontextualization model (Roberts et al., 2022)...
Experiment Setup	Yes	We perform a hyper-parameter search over {1e-4, 1e-3, 1e-2} for the learning rate, and {0, 0.05, ..., 0.2} for the dropout rate.
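
The split and search space quoted in the Dataset Splits and Experiment Setup rows can be sketched as follows. This is a minimal illustration and not the authors' code, which the paper does not release. It makes three assumptions: that 4-fold cross-validation rotates which block of 500 examples is held out for validation, that the learning rates are 1e-4, 1e-3 and 1e-2, and that the dropout grid is evenly spaced in steps of 0.05. The names `four_fold_splits` and `hyperparameter_grid` are hypothetical.

```python
import itertools
import random


def four_fold_splits(n_examples=2000, n_folds=4, seed=0):
    """Yield (train_indices, validation_indices) for each fold.

    Assumption: each fold holds out a different block of
    n_examples / n_folds shuffled examples (500 of 2,000 here).
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    fold_size = n_examples // n_folds
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation


# Assumption: learning rates read as 1e-4, 1e-3, 1e-2.
LEARNING_RATES = [1e-4, 1e-3, 1e-2]
# Assumption: "{0, 0.05, ..., 0.2}" read as a uniform grid with step 0.05.
DROPOUT_RATES = [0.0, 0.05, 0.10, 0.15, 0.20]


def hyperparameter_grid():
    """Return every (learning_rate, dropout_rate) pair in the search space."""
    return list(itertools.product(LEARNING_RATES, DROPOUT_RATES))


if __name__ == "__main__":
    for fold, (train, validation) in enumerate(four_fold_splits()):
        print(f"fold {fold}: {len(train)} train / {len(validation)} validation")
    print(f"{len(hyperparameter_grid())} hyper-parameter configurations")
```

Under these assumptions, every example is used for validation exactly once across the four folds, and the grid is the Cartesian product of the two value lists.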