reproducibilityindex.ai

Confident Adaptive Language Modeling

Authors: Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Tran, Yi Tay, Donald Metzler

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to ×3, while provably maintaining high performance.
Researcher Affiliation	Collaboration	Tal Schuster¹, Adam Fisch², Jai Gupta¹, Mostafa Dehghani¹, Dara Bahri¹, Vinh Q. Tran¹, Yi Tay¹, Donald Metzler¹ (¹Google Research, ²CSAIL, MIT)
Pseudocode	Yes	An Algorithm of the full procedure is provided in Appendix E.
Open Source Code	Yes	Code: https://github.com/google-research/t5x/tree/main/t5x/contrib/calm
Open Datasets	Yes	We empirically evaluate our methods on three popular text generation tasks that vary in their target generation length and extractive degrees against the input. CNN/DM [31] is a collection of news articles to be summarized in a few sentences. WMT15 EN-FR [13] contains English sentences (one per example) to be machine translated to French. Open-book SQUAD 1.1 [54] is a QA dataset with Wikipedia paragraphs paired with questions, where the target answer is a text span from the input.
Dataset Splits	Yes	For each task, we use the validation and test sets to evaluate our calibration method (§4) (for SQUAD we only use the validation set as the test answers are hidden). We run 50 random trials per target tolerance δ and consistency objective (textual or risk), where we partition the data to 80% calibration (S_cal) and 20% test (P_test).
Hardware Specification	Yes	Also, we compute an estimated speedup of the whole encoder-decoder model for generating the full sequence, based on TPUv3 benchmarking with 200 examples in Colab (see App. C for details).
Software Dependencies	No	The paper mentions using the 'T5x framework [55]' and 'JAX: composable transformations of Python+NumPy programs' but does not specify version numbers for these software dependencies or other key libraries.
Experiment Setup	Yes	We use the 8 layers T5 1.1 model that doesn't share input and output embeddings. We share all output embeddings for the softmax predictions, and the early-exit classifier across all decoder layers. Based on validation results, we set the temperature of our decaying threshold to τ = 4 for the softmax and classifier measures of CNN/DM and WMT. In other settings, we use τ = 0.
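
The Dataset Splits row describes 50 random trials per setting, each partitioning the data into 80% calibration and 20% test. A minimal sketch of that protocol follows, assuming a plain shuffle of example indices. The function names, the per-trial seeding, and the 1,000-example dataset size are illustrative assumptions, not details taken from the paper or its released code.

```python
import random


def split_calibration_test(num_examples, calibration_fraction=0.8, seed=0):
    """Randomly partition example indices into calibration and test subsets."""
    rng = random.Random(seed)  # hypothetical seeding scheme, for repeatability
    indices = list(range(num_examples))
    rng.shuffle(indices)
    cut = int(round(calibration_fraction * num_examples))
    return indices[:cut], indices[cut:]


def make_trials(num_examples, num_trials=50, calibration_fraction=0.8):
    """Build one independent random partition per trial."""
    return [
        split_calibration_test(num_examples, calibration_fraction, seed=trial)
        for trial in range(num_trials)
    ]


# Illustrative dataset size; the real evaluation sets differ per task.
trials = make_trials(num_examples=1000)
cal, test = trials[0]
print(len(trials), len(cal), len(test))  # prints: 50 800 200
```

Under this sketch, each (tolerance δ, consistency objective) setting would be calibrated on the first subset and evaluated on the second, with results aggregated over the 50 partitions.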