An EM Approach to Non-autoregressive Conditional Sequence Generation

Authors: Zhiqing Sun, Yiming Yang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate our method on the task of machine translation. Experimental results on benchmark data sets show that the proposed approach achieves competitive, if not better, performance with existing NAR models and significantly reduces the inference latency. |
| Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh, PA 15213 USA. Correspondence to: Zhiqing Sun <zhiqings@cs.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: An EM approach to NAR models |
| Open Source Code | No | The paper does not include any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use several benchmark tasks to evaluate the effectiveness of the proposed method, including IWSLT14 German-to-English translation (IWSLT14 De-En) and WMT14 English-to-German/German-to-English translation (WMT14 En-De/De-En). Dataset sources: https://wit3.fbk.eu/ and http://statmt.org/wmt14/translation-task.html |
| Dataset Splits | Yes | For the WMT14 dataset, we use Newstest2014 as test data and Newstest2013 as validation data. |
| Hardware Specification | Yes | We evaluate the average per-sentence decoding latency on WMT14 En-De test sets with batch size 1 on a single NVIDIA GeForce RTX 2080 Ti GPU by averaging 5 runs. *(A hedged timing sketch follows the table.)* |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer" and "label smoothing" but does not specify their version numbers or the versions of any underlying programming frameworks or libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) and employ label smoothing (Szegedy et al., 2016) of 0.1 in all experiments. The base and large models are trained for 125k steps on 8 TPUv3 nodes in each iteration, while the small models are trained for 20k steps. We use a beam size of 20/5 for the AR model in the M/E-step of our EM training algorithm. The pseudo bounds {b̂ᵢ} are set by early stopping with the accuracy on the validation set. *(A hedged configuration sketch follows the table.)* |
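
The latency protocol in the Hardware Specification row (per-sentence decoding, batch size 1, averaged over 5 runs) can be mirrored with a short timing harness. The sketch below is an illustration only, assuming a PyTorch setup on a CUDA device; `decode_fn` is a hypothetical single-sentence decoding callable, not something defined in the paper.

```python
import time

import torch


def average_latency(decode_fn, sentences, n_runs=5):
    """Average per-sentence decoding latency with batch size 1 over n_runs,
    mirroring the measurement protocol quoted above.

    `decode_fn` is a hypothetical callable that decodes one sentence.
    """
    run_means = []
    for _ in range(n_runs):
        total = 0.0
        for sentence in sentences:
            torch.cuda.synchronize()   # flush any pending GPU work first
            start = time.perf_counter()
            decode_fn(sentence)        # batch size 1: one sentence at a time
            torch.cuda.synchronize()   # wait for decoding kernels to finish
            total += time.perf_counter() - start
        run_means.append(total / len(sentences))
    return sum(run_means) / len(run_means)
```

The explicit `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously; without them the timer would capture only kernel launch overhead rather than actual decoding time.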
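
The Experiment Setup row pins down the optimizer and the label-smoothing value but, as the Software Dependencies row notes, not the framework or remaining hyperparameters. A minimal sketch of that configuration in PyTorch, where the model, learning rate, and Adam betas are all assumptions not taken from the paper:

```python
import torch
import torch.nn as nn

# Hypothetical Transformer-style model; the excerpt names neither the
# framework nor the architecture dimensions.
model = nn.Transformer(d_model=512, nhead=8)

# Adam optimizer (Kingma & Ba, 2014). The learning rate and betas are
# assumptions; the excerpt reports neither.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98))

# Label smoothing of 0.1 (Szegedy et al., 2016), as quoted. PyTorch >= 1.10
# exposes this directly on the cross-entropy loss.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```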