Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Fast Structured Decoding for Sequence Models
Authors: Zhiqing Sun, Zhuohan Li, Haoqing Wang, Di He, Zi Lin, Zhihong Deng
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in machine translation show that while increasing little latency (8~14ms), our model could achieve significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms the previous non-autoregressive baselines and is only 0.61 lower in BLEU than purely autoregressive models. |
| Researcher Affiliation | Academia | Zhiqing Sun¹, Zhuohan Li², Haoqing Wang³, Di He³, Zi Lin³, Zhi-Hong Deng³; ¹Carnegie Mellon University, ²University of California, Berkeley, ³Peking University |
| Pseudocode | No | The paper describes algorithms and models in text and diagrams (Figure 1), but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The reproducible code can be found at https://github.com/Edward-Sun/structured-nart |
| Open Datasets | Yes | We use several widely adopted benchmark tasks to evaluate the effectiveness of our proposed models: IWSLT14³ German-to-English translation (IWSLT14 De-En) and WMT14⁴ English-to-German/German-to-English translation (WMT14 En-De/De-En). ³https://wit3.fbk.eu/ ⁴http://statmt.org/wmt14/translation-task.html |
| Dataset Splits | Yes | For the WMT14 dataset, we use Newstest2014 as test data and Newstest2013 as validation data. |
| Hardware Specification | Yes | Models for WMT14/IWSLT14 tasks are trained on 4/1 NVIDIA P40 GPUs, respectively. [...] we evaluate the average per-sentence decoding latency on WMT14 En-De test sets with batch size 1 with a single NVIDIA Tesla P100 GPU for the Transformer model and the NART models to measure the speedup of our models. |
| Software Dependencies | No | The paper states 'We implement our models based on the open-sourced tensor2tensor library [23]' and 'We use Adam [30] optimizer' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For the WMT14 dataset, we use the default network architecture of the original base Transformer [1], which consists of a 6-layer encoder and 6-layer decoder. The size of hidden states d_model is set to 512. [...] For all datasets, we set the size of transition embedding d_t to 32 and the beam size k of beam approximation to 64. Hyperparameter λ is set to 0.5 to balance the scale of two loss components. [...] We use Adam [30] optimizer and employ label smoothing of value ϵ_ls = 0.1 [31] in all experiments. |
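
For readers attempting a reproduction, the WMT14 values quoted in the Experiment Setup and Hardware Specification rows can be collected in one place. The sketch below is only an illustration: the class name `WMT14Setup` and all field names are hypothetical and do not come from the paper or its repository, and settings elided in the quotes above ("[...]") are deliberately left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WMT14Setup:
    """Hypothetical container; field names are illustrative only.

    Values are copied from the quoted table rows above. Anything the
    quotes elide (e.g. learning-rate schedule) is intentionally absent.
    """
    encoder_layers: int = 6        # base Transformer encoder depth
    decoder_layers: int = 6        # base Transformer decoder depth
    d_model: int = 512             # size of hidden states
    d_t: int = 32                  # size of transition embedding
    beam_size_k: int = 64          # beam size k of the beam approximation
    loss_lambda: float = 0.5       # balances the two loss components
    label_smoothing: float = 0.1   # epsilon_ls
    optimizer: str = "Adam"
    train_gpus: int = 4            # NVIDIA P40 GPUs used for WMT14 training


setup = WMT14Setup()
print(setup)
```

Freezing the dataclass keeps the recorded values from being changed by accident during an experiment run; overrides would go through `dataclasses.replace` instead.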