Learning to Reject with a Fixed Predictor: Application to Decontextualization
Authors: Christopher Mohri, Daniel Andor, Eunsol Choi, Michael Collins, Anqi Mao, Yutao Zhong
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we choose the decontextualization task, and provide a manually-labelled dataset of 2,000 examples. Our algorithm significantly outperforms the baselines considered, with a 25% improvement in coverage when halving the error rate, which is only 3% away from the theoretical limit. |
| Researcher Affiliation | Collaboration | Christopher Mohri¹, Daniel Andor², Eunsol Choi³, Michael Collins², Anqi Mao⁴, Yutao Zhong⁴; ¹Stanford University, ²Google, ³The University of Texas at Austin, ⁴Courant Institute |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. Methods are described in prose. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for their methodology or a link to a code repository. |
| Open Datasets | Yes | For our experiments, we labeled 2,000 decontextualizations of a fixed mT5 XXL model (Xue et al., 2020) ourselves... We randomly split our 2,000 annotation examples into 1,500 train/500 validation examples and perform 4-fold cross-validation... We provide additional empirical evaluation on two simpler image classification datasets: Fashion-MNIST (Xiao et al., 2017) and KMNIST (Clanuwat et al., 2018). |
| Dataset Splits | Yes | We randomly split our 2,000 annotation examples into 1,500 train/500 validation examples and perform 4-fold cross-validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory amounts. |
| Software Dependencies | Yes | We further fine-tune a T5X 1.1 XXL decontextualization model (Roberts et al., 2022)... |
| Experiment Setup | Yes | We perform a hyper-parameter search over {1e-4, 1e-3, 1e-2} for the learning rate, and {0, 0.05, ..., 0.2} for the dropout rate. |
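
The Research Type row above quotes a 25% improvement in coverage when halving the error rate. For reference on how such coverage numbers are typically measured in selective prediction, the following is a minimal sketch that sweeps a confidence threshold and reports the largest coverage whose selective error stays under a target rate. This is a generic thresholding baseline with illustrative synthetic data, not the paper's algorithm or its dataset.

```python
import numpy as np

def coverage_at_error(confidences, correct, target_error):
    """Largest coverage whose selective error is at most target_error,
    obtained by accepting the most confident predictions first."""
    order = np.argsort(-np.asarray(confidences))    # most confident first
    correct = np.asarray(correct)[order]
    ks = np.arange(1, len(correct) + 1)
    selective_error = np.cumsum(1 - correct) / ks   # error among the k accepted
    feasible = ks[selective_error <= target_error]
    return feasible.max() / len(correct) if feasible.size else 0.0

# Illustrative use with synthetic scores (not the paper's annotations).
rng = np.random.default_rng(0)
conf = rng.random(2000)
acceptable = (rng.random(2000) < 0.85).astype(int)
print(coverage_at_error(conf, acceptable, target_error=0.10))
```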
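The Dataset Splits row reports 1,500 train / 500 validation examples with 4-fold cross-validation over the 2,000 annotations. A minimal sketch of that setup, assuming a standard scikit-learn `KFold` in which each fold yields a 1,500/500 split; the index array is a placeholder, not the released data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder indices standing in for the 2,000 annotated examples.
examples = np.arange(2000)

# Each of the 4 folds yields 1,500 train / 500 validation examples.
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(examples)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```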
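The Experiment Setup row lists a grid over learning rates {1e-4, 1e-3, 1e-2} and dropout rates {0, 0.05, ..., 0.2}. A minimal sketch of that grid search; `train_and_evaluate` is a hypothetical stand-in for fine-tuning and scoring one configuration, not code from the paper.

```python
import itertools

def train_and_evaluate(lr: float, dropout: float) -> float:
    """Hypothetical stand-in: fine-tune with (lr, dropout) and return a
    validation score. Replaced here by a dummy value for illustration."""
    return -abs(lr - 1e-3) - abs(dropout - 0.1)

learning_rates = [1e-4, 1e-3, 1e-2]
dropout_rates = [0.0, 0.05, 0.10, 0.15, 0.20]

best_lr, best_dropout = max(
    itertools.product(learning_rates, dropout_rates),
    key=lambda params: train_and_evaluate(*params),
)
print(f"best setting: learning rate {best_lr}, dropout {best_dropout}")
```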