Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Authors: Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach by analyzing its performance on the WMT 2014 English-German machine translation task under various settings for training and decoding. We find that the Insertion Transformer outperforms many prior non-autoregressive approaches to translation at comparable or better levels of parallelism, and successfully recovers the performance of the original Transformer while requiring only logarithmically many iterations during decoding. In this section, we explore the efficacy of our approach on a machine translation task, analyzing its performance under different training conditions, architectural choices, and decoding procedures. We experiment on the WMT 2014 English-German translation dataset, using newstest2013 for development and newstest2014 for testing, respectively. |
| Researcher Affiliation | Collaboration | ¹Google Brain, Mountain View, Toronto, Berlin; ²University of California, Berkeley. |
| Pseudocode | No | The paper describes the model architecture and decoding procedures verbally but does not contain structured pseudocode or algorithm blocks. (An illustrative sketch of the parallel insertion decoding loop it describes is given below the table.) |
| Open Source Code | No | The paper mentions using TensorFlow and Tensor2Tensor framework but does not provide any concrete access to source code for the Insertion Transformer itself, nor does it state that the code is being released. |
| Open Datasets | Yes | We experiment on the WMT 2014 English-German translation dataset, using newstest2013 for development and newstest2014 for testing, respectively. |
| Dataset Splits | Yes | We experiment on the WMT 2014 English-German translation dataset, using newstest2013 for development and newstest2014 for testing, respectively. (A dataset-loading sketch following this split convention is given below the table.) |
| Hardware Specification | Yes | All our models are trained for 1,000,000 steps on eight P100 GPUs. |
| Software Dependencies | No | All our experiments are implemented in TensorFlow (Abadi et al., 2015) using the Tensor2Tensor framework (Vaswani et al., 2018). (The paper names these software packages but does not give version numbers for them.) |
| Experiment Setup | Yes | We use the default transformer base hyperparameter set reported by Vaswani et al. (2018) for all hyperparameters not specific to our model. We perform no additional hyperparameter tuning. All our models are trained for 1,000,000 steps on eight P100 GPUs. (A configuration sketch for the `transformer_base` hyperparameter set is given below the table.) |
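
The paper describes its decoding procedure only in prose, so the following is an illustrative sketch written for this report rather than code taken from the authors. It mirrors the parallel greedy decoding loop the paper outlines: at each iteration the model scores one (token, slot) pair for every slot of the current partial output, every slot that does not choose the end-of-slot symbol receives its best token simultaneously, and decoding stops once all slots terminate. The `score_slots` callable is a hypothetical stand-in for the trained Insertion Transformer.

```python
# Illustrative sketch (not from the paper) of parallel greedy insertion decoding.
from typing import Callable, List, Tuple

END_OF_SLOT = "<eos-slot>"  # the "insert nothing into this slot" symbol


def parallel_greedy_decode(
    score_slots: Callable[[List[str]], List[Tuple[str, float]]],
    max_iterations: int = 64,
) -> List[str]:
    """Grow the hypothesis by inserting into every unfinished slot at once."""
    hypothesis: List[str] = []
    for _ in range(max_iterations):
        # One (best_token, score) pair per slot; an n-token hypothesis has n + 1 slots.
        best_per_slot = score_slots(hypothesis)
        insertions = [
            (slot, token)
            for slot, (token, _) in enumerate(best_per_slot)
            if token != END_OF_SLOT
        ]
        if not insertions:  # every slot chose end-of-slot: decoding is finished
            break
        # Apply insertions from right to left so earlier slot indices stay valid.
        for slot, token in sorted(insertions, reverse=True):
            hypothesis.insert(slot, token)
    return hypothesis
```

With slot-level termination and the balanced binary tree training loss, this loop finishes in roughly log2(n) iterations for an n-token output, which is the behaviour the paper reports.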
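The paper builds its data pipeline through Tensor2Tensor; purely as an illustration of the split convention it states (newstest2013 for development, newstest2014 for testing), here is a minimal sketch using the Hugging Face `datasets` library, which is an assumption of this report and not the authors' tooling.

```python
# Minimal sketch (assumption: Hugging Face `datasets`, not the authors'
# Tensor2Tensor pipeline) of loading WMT 2014 English-German with the split
# convention named in the paper.
from datasets import load_dataset

wmt14 = load_dataset("wmt14", "de-en")

train = wmt14["train"]      # WMT 2014 training corpus
dev = wmt14["validation"]   # newstest2013 (development)
test = wmt14["test"]        # newstest2014 (testing)

# Each example carries a "translation" dict keyed by language code.
pair = test[0]["translation"]
print(pair["en"], "->", pair["de"])
```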
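For the training configuration, the quoted "default transformer base hyperparameter set" presumably corresponds to the `transformer_base` hyperparameter set registered in Tensor2Tensor, which the paper builds on. The hedged sketch below only inspects that shared baseline set; the Insertion Transformer model itself and its 1,000,000-step, eight-GPU training run are not publicly available, so nothing beyond the baseline defaults is reconstructed here.

```python
# Hedged sketch: inspect Tensor2Tensor's `transformer_base` hyperparameter set
# (a TensorFlow 1.x library). This covers only the shared baseline
# configuration; the paper's Insertion Transformer code is not released.
from tensor2tensor.models import transformer

hparams = transformer.transformer_base()

# Print a few of the defaults this set defines; the values come from the
# library itself rather than being restated in this report.
for name in ("hidden_size", "num_hidden_layers", "num_heads", "filter_size"):
    print(name, "=", getattr(hparams, name))
```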