An EM Approach to Non-autoregressive Conditional Sequence Generation
Authors: Zhiqing Sun, Yiming Yang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the task of machine translation. Experimental results on benchmark data sets show that the proposed approach achieves competitive, if not better, performance with existing NAR models and significantly reduces the inference latency. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, Pittsburgh, PA 15213 USA. Correspondence to: Zhiqing Sun <zhiqings@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 An EM approach to NAR models |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use several benchmark tasks to evaluate the effectiveness of the proposed method, including IWSLT14 German-to-English translation (IWSLT14 De-En) and WMT14 English-to-German/German-to-English translation (WMT14 En-De/De-En). ... Dataset links given in footnotes: https://wit3.fbk.eu/ and http://statmt.org/wmt14/translation-task.html |
| Dataset Splits | Yes | For the WMT14 dataset, we use Newstest2014 as test data and Newstest2013 as validation data. |
| Hardware Specification | Yes | We evaluate the average per-sentence decoding latency on WMT14 En-De test sets with batch size 1 on a single NVIDIA GeForce RTX 2080 Ti GPU by averaging 5 runs. (A timing sketch follows the table.) |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer" and "label smoothing" but does not specify their version numbers or the versions of any underlying programming frameworks or libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2014) and employ a label smoothing (Szegedy et al., 2016) of 0.1 in all experiments. The base and large models are trained for 125k steps on 8 TPUv3 nodes in each iteration, while the small models are trained for 20k steps. We use a beam size of 20/5 for the AR model in the M/E-step of our EM training algorithm. The pseudo bounds {b̂_i} are set by early stopping with the accuracy on the validation set. (A configuration sketch follows the table.) |
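
The optimizer and regularization settings quoted in the Experiment Setup row can be illustrated with a short sketch. The snippet below is a minimal example assuming PyTorch (the paper names no framework); the model, vocabulary size, and learning rate are placeholders, while the Adam optimizer and the 0.1 label smoothing match the quoted text.

```python
# Minimal sketch (not the authors' code) of the quoted settings:
# Adam optimizer (Kingma & Ba, 2014) with label smoothing of 0.1.
import torch
import torch.nn as nn

VOCAB_SIZE = 32000                   # assumed; the quoted text does not give a vocabulary size
model = nn.Linear(512, VOCAB_SIZE)   # stand-in for the NAR translation model

# Adam optimizer; the learning rate below is a placeholder, not from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

# Label-smoothed cross-entropy with smoothing factor 0.1, as quoted.
# The `label_smoothing` argument requires PyTorch >= 1.10.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# One illustrative training step on dummy data.
features = torch.randn(8, 512)                  # batch of 8 token representations
targets = torch.randint(0, VOCAB_SIZE, (8,))    # dummy target token ids
loss = criterion(model(features), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```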
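
The latency protocol in the Hardware Specification row (batch size 1, a single GPU, the average of 5 runs) can likewise be sketched. Only those three quoted details come from the paper; `decode_sentence` and `test_sentences` below are hypothetical stand-ins, since the measurement code is not released.

```python
# Hedged sketch of a per-sentence latency measurement: batch size 1,
# one GPU, averaged over 5 runs, as described in the quoted text.
import time

import torch


def decode_sentence(sentence: str) -> None:
    """Placeholder for one NAR decoding call on a single sentence (batch size 1)."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # ensure prior GPU work has finished before timing
    time.sleep(0.001)             # stand-in for the actual model forward pass
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the decoding kernels to complete


test_sentences = ["dummy sentence"] * 100  # stand-in for the WMT14 En-De test set

run_latencies = []
for _ in range(5):  # average over 5 runs, as in the quoted protocol
    start = time.perf_counter()
    for sentence in test_sentences:  # batch size 1: decode one sentence at a time
        decode_sentence(sentence)
    run_latencies.append((time.perf_counter() - start) / len(test_sentences))

print(f"average per-sentence latency: {sum(run_latencies) / len(run_latencies):.4f} s")
```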