Hidden Markov Transformer for Simultaneous Machine Translation
Authors: Shaolei Zhang, Yang Feng
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple SiMT benchmarks show that HMT outperforms strong baselines and achieves state-of-the-art performance. |
| Researcher Affiliation | Academia | Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Inference Policy of Hidden Markov Transformer |
| Open Source Code | Yes | Code is available at https://github.com/ictnlp/HMT |
| Open Datasets | Yes | IWSLT15 English→Vietnamese (En→Vi) ... We use TED tst2012 (1553 pairs) as the validation set and TED tst2013 (1268 pairs) as the test set. Following the previous setting (Ma et al., 2020; Zhang & Feng, 2021c)... WMT15 German→English (De→En) (4.5M pairs): We use newstest2013 (3000 pairs) as the validation set and newstest2015 (2169 pairs) as the test set. BPE (Sennrich et al., 2016) is applied with 32K merge operations, and the vocabulary of German and English is shared (a preprocessing sketch follows the table). Dataset sources: nlp.stanford.edu/projects/nmt/ and statmt.org/wmt15/translation-task.html |
| Dataset Splits | Yes | IWSLT15 English→Vietnamese (En→Vi) ... We use TED tst2012 (1553 pairs) as the validation set and TED tst2013 (1268 pairs) as the test set. ... WMT15 German→English (De→En) ... We use newstest2013 (3000 pairs) as the validation set and newstest2015 (2169 pairs) as the test set. |
| Hardware Specification | Yes | All speeds are evaluated on NVIDIA 3090 GPU. |
| Software Dependencies | No | The paper states, 'All systems are based on Transformer (Vaswani et al., 2017) from Fairseq Library (Ott et al., 2019).' However, it does not provide specific version numbers for Fairseq or any other software libraries. |
| Experiment Setup | Yes | Appendix D (Table 7) provides detailed hyperparameter settings for HMT, including encoder/decoder layers, attention heads, embedding/FFN dimensions, dropout, optimizer (Adam with beta values), learning rate, scheduler, warmup updates, weight decay, label smoothing, and max tokens for the different Transformer sizes (Small, Base, Big) and datasets (an illustrative configuration sketch follows the table). |
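The Open Datasets row mentions BPE with 32K merge operations and a shared German/English vocabulary. Below is a minimal sketch of that preprocessing step using the `subword-nmt` package from Sennrich et al. (2016); the file names, the concatenation step, and the split layout are illustrative assumptions, not details from the paper. Only the 32K merges and the shared De/En vocabulary come from the quoted text.

```python
# Sketch: learn one BPE code on the concatenated De+En training text
# (shared vocabulary) and apply it to every split. Paths are hypothetical.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

NUM_MERGES = 32000  # "32K merge operations" (paper)

# Concatenate both language sides so German and English share one code table.
with open("train.concat", "w", encoding="utf-8") as concat:
    for path in ("train.de", "train.en"):  # hypothetical file names
        with open(path, encoding="utf-8") as f:
            concat.writelines(f)

with open("train.concat", encoding="utf-8") as infile, \
     open("codes.bpe", "w", encoding="utf-8") as outfile:
    learn_bpe(infile, outfile, NUM_MERGES)

# Apply the learned codes to train/valid/test for both languages.
with open("codes.bpe", encoding="utf-8") as codes:
    bpe = BPE(codes)

for split in ("train", "valid", "test"):
    for lang in ("de", "en"):
        with open(f"{split}.{lang}", encoding="utf-8") as src, \
             open(f"{split}.bpe.{lang}", "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(bpe.process_line(line))
```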
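Since the systems are built on the Fairseq library, the hyperparameter categories listed in the Experiment Setup row map naturally onto `fairseq-train` flags. The sketch below assembles such a command in Python; the numeric values are common Transformer-Base placeholders, not the per-dataset values from the paper's Table 7, and the architecture name and data directory are hypothetical (the released HMT code defines its own architecture).

```python
# Sketch: hyperparameter categories from Table 7 expressed as fairseq-train
# flags. Values are illustrative placeholders, not the paper's settings.
import shlex

hparams = {
    "--arch": "transformer",            # placeholder architecture name
    "--encoder-layers": 6,
    "--decoder-layers": 6,
    "--encoder-attention-heads": 8,
    "--encoder-embed-dim": 512,
    "--encoder-ffn-embed-dim": 2048,
    "--dropout": 0.3,
    "--optimizer": "adam",
    "--adam-betas": "(0.9, 0.98)",
    "--lr": 5e-4,
    "--lr-scheduler": "inverse_sqrt",
    "--warmup-updates": 4000,
    "--weight-decay": 0.0001,
    "--criterion": "label_smoothed_cross_entropy",
    "--label-smoothing": 0.1,
    "--max-tokens": 4096,
}

cmd = ["fairseq-train", "data-bin/wmt15_de_en"]  # hypothetical binarized data dir
for flag, value in hparams.items():
    cmd += [flag, str(value)]
print(shlex.join(cmd))
```

The actual Small/Base/Big settings differ per dataset; consult Appendix D of the paper and the released repository (https://github.com/ictnlp/HMT) for the exact training scripts.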