Blockwise Parallel Decoding for Deep Autoregressive Models

Authors: Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance.
Researcher Affiliation | Collaboration | Mitchell Stern, University of California, Berkeley (mitchell@berkeley.edu); Noam Shazeer, Google Brain (noam@google.com); Jakob Uszkoreit, Google Brain (usz@google.com)
Pseudocode | Yes | We propose the following blockwise parallel decoding algorithm (illustrated in Figure 1), which is guaranteed to produce the same prediction ŷ that would be found under greedy decoding but uses as few as m/k steps. As before, we start with an empty prediction ŷ and set j = 0. Then we repeat the following three substeps until the termination condition is met: Predict: Get the block predictions ŷ_{j+i} = argmax_{y_{j+i}} p_i(y_{j+i} | ŷ_{≤j}, x) for i = 1, ..., k. Verify: Find the largest k̂ such that ŷ_{j+i} = argmax_{y_{j+i}} p_1(y_{j+i} | ŷ_{≤j+i−1}, x) for all 1 ≤ i ≤ k̂. Note that k̂ ≥ 1 by the definition of ŷ_{j+1}. Accept: Extend ŷ with ŷ_{j+1}, ..., ŷ_{j+k̂} and set j ← j + k̂. (A plain-Python sketch of this loop appears after the table.)
Open Source Code | Yes | Our code is publicly available in the open-source Tensor2Tensor library (Vaswani et al., 2018).
Open Datasets | Yes | For our machine translation experiments, we use the WMT 2014 English-German translation dataset.
Dataset Splits | Yes | We measure the BLEU score and the mean accepted block size k̂ on the development set under a variety of settings. Results are reported in Table 1.
Hardware Specification | Yes | Our baseline model is a Transformer trained for 1,000,000 steps on 8 P100 GPUs using the transformer_base hyperparameter set in Tensor2Tensor.
Software Dependencies | No | The paper states that it uses the open-source Tensor2Tensor framework but does not specify its version or the versions of other software dependencies.
Experiment Setup | Yes | Our baseline model is a Transformer trained for 1,000,000 steps on 8 P100 GPUs using the transformer_base hyperparameter set in Tensor2Tensor.
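
For reference, the following is a minimal Python sketch of the predict/verify/accept loop quoted in the Pseudocode row. It is an illustration under stated assumptions, not the authors' Tensor2Tensor implementation: the callables block_argmax and argmax_next are hypothetical stand-ins for the k prediction heads and the base model's greedy scorer, and the verify step is written as a per-position loop rather than the single batched model call used in the paper.

```python
def blockwise_parallel_decode(x, block_argmax, argmax_next, k, max_len, eos):
    """Greedy blockwise parallel decoding: predict / verify / accept.

    Assumed (hypothetical) helpers:
      block_argmax(x, prefix, k) -> list of k proposed tokens, where head i
          greedily predicts position len(prefix) + i from the prefix alone;
          head 1 is the base model itself.
      argmax_next(x, prefix)     -> the base model's greedy next token.
    """
    y = []  # accepted prediction, written as y-hat in the paper's notation
    while len(y) < max_len and (not y or y[-1] != eos):
        # Predict: propose the next k tokens in parallel from the prefix.
        proposals = block_argmax(x, y, k)

        # Verify: keep the longest prefix of the proposals that matches what
        # greedy decoding with the base model would produce position by
        # position. (The paper batches these checks into one model call.)
        accepted = []
        for token in proposals:
            if token != argmax_next(x, y + accepted):
                break
            accepted.append(token)
            if token == eos:
                break

        # Accept: extend the output. Because head 1 is the base model, the
        # first proposal always verifies, so at least one token is accepted
        # per iteration (k_hat >= 1) and the loop terminates.
        y.extend(accepted)
    return y
```

With k = 1 this reduces to ordinary greedy decoding; larger k performs more computation per step but can accept several tokens at once, which is where the reported reduction in decoding iterations comes from.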