Fast Inference from Transformers via Speculative Decoding

Authors: Yaniv Leviathan, Matan Kalman, Yossi Matias

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs." (Abstract; Section 4, Experiments)
Researcher Affiliation | Industry | "Google Research, Mountain View, CA, USA. Correspondence to: Yaniv Leviathan <leviathan@google.com>."
Pseudocode | Yes | "Algorithm 1: Speculative Decoding Step" (a sketch of the step appears below the table)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | "We test a standard encoder-decoder T5 version 1.1 model (Raffel et al., 2020) on two tasks from the T5 paper: (1) English to German translation fine-tuned on WMT En-De, and (2) text summarization fine-tuned on CNN/DM." and "trained on lm1b (Chelba et al., 2013)." (a loading sketch appears below the table)
Dataset Splits | No | The paper mentions fine-tuning on WMT En-De, CNN/DM, and LM1B, but it does not provide explicit training, validation, and test splits (e.g., percentages or sample counts), nor does it cite predefined splits for reproducibility.
Hardware Specification | Yes | "We measure walltime improvements with a batch size of 1 on a single TPU-v4 for both argmax sampling (temp=0) and standard sampling (temp=1)."
Software Dependencies | No | The paper mentions using the T5X implementation and references the T5 version 1.1 model and BERT tokenization, but it does not provide version numbers for ancillary software dependencies such as programming languages, libraries (e.g., TensorFlow, PyTorch), or CUDA.
Experiment Setup | Yes | "We test a standard encoder-decoder T5 version 1.1 model (Raffel et al., 2020) on two tasks from the T5 paper: (1) English to German translation fine-tuned on WMT En-De, and (2) text summarization fine-tuned on CNN/DM. For both tasks, we use T5-XXL (11B) for Mp. For the approximation model Mq we test several existing configurations, namely T5-large (800M), T5-base (250M), and T5-small (77M) (Raffel et al., 2020). We use existing checkpoints for all models. We measure walltime improvements with a batch size of 1 on a single TPU-v4 for both argmax sampling (temp=0) and standard sampling (temp=1)." (an approximate reproduction sketch appears below the table)
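
The "Pseudocode" row refers to the paper's Algorithm 1 (a single speculative decoding step). Below is a minimal NumPy sketch of that step under stated assumptions: `p_model` and `q_model` are hypothetical stand-ins for the target model Mp and the approximation model Mq, each returning a probability vector over the vocabulary for a given token prefix, and `gamma` is the number of drafted tokens. It illustrates the accept/reject rule only; it is not the paper's T5X implementation.

import numpy as np

def speculative_decoding_step(prefix, p_model, q_model, gamma, rng):
    """One speculative decoding step (sketch of the paper's Algorithm 1).

    p_model(prefix) -> next-token distribution under the target model Mp.
    q_model(prefix) -> next-token distribution under the draft model Mq.
    Both are assumed to return 1-D probability vectors over the vocabulary.
    """
    # 1) Draft gamma tokens autoregressively from the cheap model Mq.
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(gamma):
        q = q_model(ctx)
        x = rng.choice(len(q), p=q)
        q_dists.append(q)
        drafted.append(x)
        ctx.append(x)

    # 2) Score all gamma+1 prefixes with the target model Mp.
    #    (In the real system this is a single parallel forward pass.)
    p_dists = [p_model(list(prefix) + drafted[:i]) for i in range(gamma + 1)]

    # 3) Accept each drafted token x_i with probability min(1, p(x_i) / q(x_i)).
    n_accepted = gamma
    for i, x in enumerate(drafted):
        if rng.random() > min(1.0, p_dists[i][x] / q_dists[i][x]):
            n_accepted = i
            break

    if n_accepted < gamma:
        # 4a) On the first rejection, resample from norm(max(0, p - q)).
        residual = np.maximum(p_dists[n_accepted] - q_dists[n_accepted], 0.0)
        residual /= residual.sum()
        extra = rng.choice(len(residual), p=residual)
    else:
        # 4b) All drafts accepted: sample one bonus token from Mp.
        extra = rng.choice(len(p_dists[gamma]), p=p_dists[gamma])

    return list(prefix) + drafted[:n_accepted] + [extra]

Looping this step until an end-of-sequence token appears yields the same output distribution as sampling directly from Mp, while typically emitting several tokens per call to Mp; that is the source of the 2X-3X walltime gain reported in the paper.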
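
The datasets named in the "Open Datasets" row (WMT En-De, CNN/DM, LM1B) are publicly available. The sketch below pulls them via the Hugging Face `datasets` library; the paper does not specify this loading path, and the exact Hub names and configurations (`wmt14`/`de-en`, `cnn_dailymail`/`3.0.0`, `lm1b`) are assumptions here.

from datasets import load_dataset

# English-German translation pairs (the WMT En-De task; WMT14 is assumed here).
wmt_en_de = load_dataset("wmt14", "de-en", split="test")

# CNN/DailyMail summarization ("3.0.0" is the commonly used configuration).
cnn_dm = load_dataset("cnn_dailymail", "3.0.0", split="test")

# One Billion Word benchmark (LM1B), used for the paper's GPT-like decoder experiments.
lm1b = load_dataset("lm1b", split="test")

print(wmt_en_de[0]["translation"])   # {'de': ..., 'en': ...}
print(cnn_dm[0]["article"][:200])    # first characters of a source article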
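
The "Experiment Setup" row pairs a T5-XXL target (Mp) with smaller T5 drafts (Mq) under T5X on a TPU-v4. That stack is not reproduced here, but Hugging Face transformers' assisted generation implements the same target/draft idea, so a rough approximation of the pairing might look like the sketch below. Assumptions: a transformers version with `assistant_model` support, the public `google/t5-v1_1-xxl` and `google/t5-v1_1-small` checkpoints (not the paper's WMT/CNN-DM fine-tuned ones), and greedy decoding as the counterpart of the temp=0 condition.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

device = "cuda" if torch.cuda.is_available() else "cpu"

# Target model Mp (the paper uses T5-XXL, 11B) and draft model Mq (here T5-small, 77M).
# These public T5 v1.1 checkpoints are not fine-tuned, so output quality is illustrative only.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
target = AutoModelForSeq2SeqLM.from_pretrained(
    "google/t5-v1_1-xxl", torch_dtype=torch.bfloat16
).to(device)
draft = AutoModelForSeq2SeqLM.from_pretrained(
    "google/t5-v1_1-small", torch_dtype=torch.bfloat16
).to(device)

inputs = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
).to(device)

# Batch size 1, greedy decoding (the paper's temp=0 setting); the draft model proposes
# tokens that the target model verifies, as in speculative decoding.
out = target.generate(**inputs, assistant_model=draft, do_sample=False, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))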