Blockwise Parallel Decoding for Deep Autoregressive Models
Authors: Mitchell Stern, Noam Shazeer, Jakob Uszkoreit
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance. |
| Researcher Affiliation | Collaboration | Mitchell Stern (University of California, Berkeley, mitchell@berkeley.edu); Noam Shazeer (Google Brain, noam@google.com); Jakob Uszkoreit (Google Brain, usz@google.com) |
| Pseudocode | Yes | We propose the following blockwise parallel decoding algorithm (illustrated in Figure 1), which is guaranteed to produce the same prediction $\hat{y}$ that would be found under greedy decoding but uses as few as $m/k$ steps. As before, we start with an empty prediction $\hat{y}$ and set $j = 0$. Then we repeat the following three substeps until the termination condition is met: Predict: Get the block predictions $\hat{y}_{j+i} = \arg\max_{y_{j+i}} p_i(y_{j+i} \mid \hat{y}_{\le j}, x)$ for $i = 1, \dots, k$. Verify: Find the largest $\hat{k}$ such that $\hat{y}_{j+i} = \arg\max_{y_{j+i}} p_1(y_{j+i} \mid \hat{y}_{\le j+i-1}, x)$ for all $1 \le i \le \hat{k}$. Note that $\hat{k} \ge 1$ by the definition of $\hat{y}_{j+1}$. Accept: Extend $\hat{y}$ with $\hat{y}_{j+1}, \dots, \hat{y}_{j+\hat{k}}$ and set $j \leftarrow j + \hat{k}$. (A minimal code sketch of this loop is given after the table.) |
| Open Source Code | Yes | Our code is publicly available in the open-source Tensor2Tensor library (Vaswani et al., 2018). |
| Open Datasets | Yes | For our machine translation experiments, we use the WMT 2014 English-German translation dataset. |
| Dataset Splits | Yes | We measure the BLEU score and the mean accepted block size $\hat{k}$ on the development set under a variety of settings. Results are reported in Table 1. |
| Hardware Specification | Yes | Our baseline model is a Transformer trained for 1,000,000 steps on 8 P100 GPUs using the transformer_base hyperparameter set in Tensor2Tensor. |
| Software Dependencies | No | The paper states it uses the open-source Tensor2Tensor framework but does not specify its version number or other software dependencies with their versions. |
| Experiment Setup | Yes | Our baseline model is a Transformer trained for 1,000,000 steps on 8 P100 GPUs using the transformer_base hyperparameter set in Tensor2Tensor. |
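
The pseudocode row above describes the paper's predict/verify/accept loop. The Python sketch below is an illustrative reimplementation of that loop, not the authors' Tensor2Tensor code: `predict_block` and `greedy_next` are hypothetical stand-ins for the $k$ parallel prediction heads $p_1, \dots, p_k$ and the base greedy model $p_1$, and the verify step is written as a sequential loop rather than the single batched model call used in the paper.

```python
# Illustrative sketch of blockwise parallel decoding (predict / verify / accept).
# `predict_block(prefix, x, k)` is assumed to return the greedy proposals of the
# k prediction heads p_1, ..., p_k; `greedy_next(prefix, x)` returns the argmax
# of the base model p_1. Both are hypothetical stand-ins, not Tensor2Tensor APIs.

def blockwise_parallel_decode(x, predict_block, greedy_next, k, max_len, eos):
    """Greedy decoding that accepts up to k tokens per iteration."""
    y_hat = []  # accepted prefix \hat{y}
    j = 0       # number of accepted tokens
    while j < max_len and (not y_hat or y_hat[-1] != eos):
        # Predict: propose the next k tokens in parallel.
        proposals = predict_block(y_hat, x, k)

        # Verify: find the largest k_hat such that each proposal matches what
        # the base model would produce greedily at that position. (The paper
        # performs this check in a single batched model invocation.)
        k_hat = 0
        for i in range(k):
            if proposals[i] == greedy_next(y_hat + proposals[:i], x):
                k_hat += 1
            else:
                break
        k_hat = max(k_hat, 1)  # k_hat >= 1 by definition of y_{j+1}

        # Accept: extend the prefix with the verified block (capped at max_len).
        accepted = proposals[:k_hat][: max_len - j]
        y_hat.extend(accepted)
        j += len(accepted)
    return y_hat


if __name__ == "__main__":
    # Toy demo: the "model" counts upward deterministically, so every proposed
    # block verifies and decoding finishes in roughly m/k iterations.
    toy_predict_block = lambda prefix, x, k: [len(prefix) + i for i in range(k)]
    toy_greedy_next = lambda prefix, x: len(prefix)
    print(blockwise_parallel_decode(x=None, predict_block=toy_predict_block,
                                    greedy_next=toy_greedy_next,
                                    k=4, max_len=10, eos=9))
```

In this toy setting every proposed block is verified in full, so the number of decoding iterations drops by roughly a factor of $k$; with a real model the speedup depends on how often the proposal heads agree with the base model, which is what the mean accepted block size $\hat{k}$ in the table measures.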