Hidden Markov Transformer for Simultaneous Machine Translation

Authors: Shaolei Zhang, Yang Feng

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on multiple SiMT benchmarks show that HMT outperforms strong baselines and achieves state-of-the-art performance.
Researcher Affiliation | Academia | 1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); 2 University of Chinese Academy of Sciences, Beijing, China
Pseudocode | Yes | Algorithm 1: Inference Policy of Hidden Markov Transformer
Open Source Code | Yes | Code is available at https://github.com/ictnlp/HMT
Open Datasets | Yes | IWSLT15 English→Vietnamese (En→Vi) ... We use TED tst2012 (1553 pairs) as the validation set and TED tst2013 (1268 pairs) as the test set. Following the previous setting (Ma et al., 2020; Zhang & Feng, 2021c) ... WMT15 German→English (De→En) (4.5M pairs): We use newstest2013 (3000 pairs) as the validation set and newstest2015 (2169 pairs) as the test set. BPE (Sennrich et al., 2016) is applied with 32K merge operations and the vocabulary of German and English is shared. [Footnote 4: nlp.stanford.edu/projects/nmt/; Footnote 5: statmt.org/wmt15/translation-task.html] (A hedged BPE preprocessing sketch follows the table.)
Dataset Splits | Yes | IWSLT15 English→Vietnamese (En→Vi) ... We use TED tst2012 (1553 pairs) as the validation set and TED tst2013 (1268 pairs) as the test set. ... WMT15 German→English (De→En) ... We use newstest2013 (3000 pairs) as the validation set and newstest2015 (2169 pairs) as the test set.
Hardware Specification | Yes | All speeds are evaluated on NVIDIA 3090 GPU.
Software Dependencies | No | The paper states, 'All systems are based on Transformer (Vaswani et al., 2017) from Fairseq Library (Ott et al., 2019).' However, it does not provide specific version numbers for Fairseq or any other software libraries.
Experiment Setup | Yes | Appendix D, Table 7 provides detailed hyperparameter settings for HMT, including encoder/decoder layers, attention heads, embed/ffn dimensions, dropout, optimizer (adam with beta values), learning rate, scheduler, warmup updates, weight decay, label smoothing, and max tokens for different Transformer sizes (Small, Base, Big) and datasets. (A hedged configuration sketch follows below.)
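
The quoted De→En preprocessing (32K BPE merges with a vocabulary shared across German and English) can be approximated as follows. This is a minimal sketch, assuming the subword-nmt Python API (learn_bpe / apply_bpe); all file paths (train.de-en.cat, codes.de-en.bpe, train.de, train.en) are hypothetical placeholders, and train.de-en.cat stands for the concatenation of both sides of the training corpus.

```python
# Hedged sketch: learn a joint 32K BPE model on the concatenated De+En training text
# so the vocabulary is shared, then segment each side. Paths are placeholders.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

NUM_MERGES = 32_000  # "32K merge operations" from the quoted setup

# Learn BPE codes on the concatenation of both language sides.
with open("train.de-en.cat", encoding="utf-8") as train_cat, \
     open("codes.de-en.bpe", "w", encoding="utf-8") as codes_out:
    learn_bpe(train_cat, codes_out, NUM_MERGES)

# Apply the learned codes to each side of the training corpus.
with open("codes.de-en.bpe", encoding="utf-8") as codes_in:
    bpe = BPE(codes_in)

for lang in ("de", "en"):
    with open(f"train.{lang}", encoding="utf-8") as src, \
         open(f"train.bpe.{lang}", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(bpe.process_line(line))
```

In a fairseq-based reproduction, the BPE-segmented files would then typically be binarized with fairseq-preprocess, using its --joined-dictionary option to keep the German/English vocabulary shared, consistent with the quoted setup.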
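
For the Experiment Setup row, the fields reportedly listed in Appendix D, Table 7 map naturally onto a training configuration. The sketch below only enumerates those fields; the values shown are common Transformer-Base placeholders, not the paper's Table 7 numbers, which vary with model size (Small, Base, Big) and dataset.

```python
# Hedged sketch: the hyperparameter fields named in the Experiment Setup row, expressed as a
# config dict. Values are generic Transformer-Base placeholders, NOT the paper's Table 7 values.
config = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "attention_heads": 8,
    "embed_dim": 512,
    "ffn_embed_dim": 2048,
    "dropout": 0.3,
    "optimizer": "adam",
    "adam_betas": (0.9, 0.98),
    "lr": 5e-4,
    "lr_scheduler": "inverse_sqrt",
    "warmup_updates": 4000,
    "weight_decay": 0.0001,
    "label_smoothing": 0.1,
    "max_tokens": 4096,  # batch size measured in tokens
}
```

Each field corresponds to a standard fairseq-train option (e.g. --encoder-layers, --lr-scheduler inverse_sqrt, --label-smoothing, --max-tokens), so a reproduction would substitute the actual Table 7 values for the placeholders above.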