Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Authors: Raj Dabre, Atsushi Fujita
Pages: 6292-6299

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite its significantly fewer parameters, approaches that of a model that stacks 6 different layers. (A parameter-sharing sketch follows the table.)
Researcher Affiliation | Academia | Raj Dabre, Atsushi Fujita, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan. firstname.lastname@nict.go.jp
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | See several sentence-level self- and cross-attention visualizations in our supplementary material. (Footnote 15: https://github.com/prajdabre/RSNMT)
Open Datasets | Yes | For our Japanese-English (Ja-En) translation in both directions, we used the Asian Language Treebank (ALT) parallel corpus (Thu et al. 2016), the Global Communication Plan (GCP) corpus (Imamura and Sumita 2018), the Kyoto free translation task (KFTT) corpus, and the Asian Scientific Paper Excerpt Corpus (ASPEC) (Nakazawa et al. 2016). We also experimented with the Turkish-English (Tr-En) language pair using the WMT 2018 corpus.
Dataset Splits | Yes | Table 1: Datasets and model settings (includes a 'Dev' column with sentence counts). We used newstest2016 for development, and newstest2017 (test17) and newstest2018 (test18) for testing.
Hardware Specification | No | No specific GPU or CPU models were mentioned. The paper only states 'transformer base single gpu' for the default settings and '4 GPUs for training' without specifying the GPU model.
Software Dependencies | Yes | We implemented our method on top of an open-source implementation of the Transformer model (Vaswani et al. 2017) in the version 1.6 branch of tensor2tensor.
Experiment Setup | Yes | For training, we used the default model settings corresponding to transformer base single gpu (Vaswani et al. 2017), except the number of sub-words, training iterations, and number of GPUs. ... The details of sub-word vocabularies and training iterations are in Table 1. ... We decoded the test set sentences with a beam size of 4 and length penalty of α = 0.6 for the KFTT Japanese-to-English experiments and α = 1.0 for the rest. (A note on the length-penalty formula follows the table.)
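The "recurrently stacks a single layer 6 times" claim in the Research Type row means one set of layer parameters is reused at every depth instead of instantiating 6 distinct layers. Below is a minimal PyTorch sketch of that parameter-sharing scheme; it is not the authors' tensor2tensor implementation, and the 512-dimensional, 8-head sizes are assumed from transformer base.

```python
# Minimal sketch of recurrent layer stacking (parameter sharing).
# Assumptions: PyTorch, transformer-base-like sizes (d_model=512, 8 heads, depth 6).
# This illustrates the idea only; it is not the authors' tensor2tensor code.
import torch
import torch.nn as nn

D_MODEL, N_HEADS, N_STACKS = 512, 8, 6

# One layer's worth of parameters, reused at every depth.
shared_layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS)

def recurrently_stacked_encoder(x: torch.Tensor) -> torch.Tensor:
    """Apply the same layer N_STACKS times: one layer, recurrently stacked 6 times."""
    for _ in range(N_STACKS):
        x = shared_layer(x)
    return x

# For contrast, a vanilla 6-layer encoder deep-copies the layer 6 times,
# so it holds roughly 6x the encoder-layer parameters.
vanilla_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS), num_layers=N_STACKS
)

x = torch.randn(20, 2, D_MODEL)  # (sequence length, batch, d_model)
y = recurrently_stacked_encoder(x)
```

In such a scheme, embedding and output-projection parameters are untouched; only the per-layer weights are shared, which is why the recurrently stacked model is compact rather than 6x smaller overall.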
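On the decoding settings in the Experiment Setup row: assuming tensor2tensor's default beam search follows the length normalization of Wu et al. (2016), which the paper does not spell out, the length penalty α is applied as

```latex
s(Y \mid X) = \frac{\log P(Y \mid X)}{\mathrm{lp}(Y)},
\qquad
\mathrm{lp}(Y) = \left(\frac{5 + |Y|}{6}\right)^{\alpha}
```

so a larger α (here 1.0 vs. 0.6) normalizes scores more aggressively by length and thus penalizes longer hypotheses less.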