Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Agreement on Target-Bidirectional Recurrent Neural Networks for Sequence-to-Sequence Learning
Authors: Lemao Liu, Andrew Finch, Masao Utiyama, Eiichiro Sumita
JAIR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments were performed on three standard sequence-to-sequence transduction tasks: machine transliteration, grapheme-to-phoneme conversion, and machine translation. The results show that the proposed approach achieves consistent and substantial improvements compared to many state-of-the-art systems. |
| Researcher Affiliation | Academia | Lemao Liu EMAIL Andrew Finch EMAIL Masao Utiyama EMAIL Eiichiro Sumita EMAIL National Institute of Information & Communications Technology 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan |
| Pseudocode | Yes | Algorithm 1: Beam Search Algorithm; Algorithm 2: Variant Beam Search Algorithm |
| Open Source Code | Yes | Our toolkit is publicly available on https://github.com/lemaoliu/Agtarbidir. |
| Open Datasets | Yes | For the machine transliteration task, we conducted both Japanese-to-English (Jp-En) and English-to-Japanese (En-Jp) directional subtasks. The transliteration training, development and test sets were taken from Wikipedia inter-language link titles from Fukunishi, Finch, Yamamoto, and Sumita (2013). For grapheme-to-phoneme (Gm-Pm) conversion, the standard CMUdict data sets were used: the original training set was randomly split into our training set (about 110000 sequence pairs) and development set (2000 pairs); the original test set consisting of about 12000 pairs was used for testing. For the Jp-En task, we used the data from NTCIR-9 (Goto, Lu, Chow, Sumita, & Tsou, 2011): the training data consisted of 2.0M sentence pairs; the development and test sets each contained 2K sentences with a single reference. For the Ch-En task, we used the data from the NIST2008 Open Machine Translation Campaign: the training data consisted of 1.8M sentence pairs, the development set was nist02 (878 sentences), and the test sets were nist05 (1082 sentences), nist06 (1664 sentences) and nist08 (1357 sentences). |
| Dataset Splits | Yes | The training data consisted of 59000 sequence pairs composed of 313378 Japanese katakana characters and 445254 English characters; the development and test data were manually cleaned and each of them consisted of 1000 sequence pairs. For grapheme-to-phoneme (Gm-Pm) conversion, the standard CMUdict data sets were used: the original training set was randomly split into our training set (about 110000 sequence pairs) and development set (2000 pairs); the original test set consisting of about 12000 pairs was used for testing. For the Jp-En task, we used the data from NTCIR-9 (Goto, Lu, Chow, Sumita, & Tsou, 2011): the training data consisted of 2.0M sentence pairs; the development and test sets each contained 2K sentences with a single reference. For the Ch-En task, we used the data from the NIST2008 Open Machine Translation Campaign: the training data consisted of 1.8M sentence pairs, the development set was nist02 (878 sentences), and the test sets were nist05 (1082 sentences), nist06 (1664 sentences) and nist08 (1357 sentences). |
| Hardware Specification | Yes | Training was conducted on a single Tesla K80 GPU, and it took about 6 days to train a single Ab RNN system on our large-scale data. |
| Software Dependencies | No | Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). We used AdaDelta for training the RNN-based systems: the decay rate ρ and constant ϵ were set to 0.95 and 10^-6 as suggested by Zeiler (2012). Moses: a phrase-based machine translation model (Koehn et al., 2007) used with default settings. GIZA++ (Och & Ney, 2000) with grow-diag-final-and was used to build the translation model. We trained 5-gram target language models with SRILM (Stolcke et al., 2002) using the training set for Jp-En and the Gigaword corpus for Ch-En. |
| Experiment Setup | Yes | For all of the re-implemented models based on Af RNN, the number of word embedding units and hidden units was set to 500. We used AdaDelta for training the RNN-based systems: the decay rate ρ and constant ϵ were set to 0.95 and 10^-6 as suggested by Zeiler (2012), and minibatch sizes were 16. In our experiments, we found that a one-layer RNN works well for Af RNN, thanks to the limited vocabulary in this task. Therefore, we employed a one-layer RNN for all Af RNN-based models, including both unidirectional and bidirectional models. For all of the RNN-based models, we used the same configuration and hyperparameters as in the machine transliteration task, except that the minibatch size was 64 for the Gm-Pm task. We used the following settings for Ab RNN-based systems: the dimension of word embedding was 620, the dimension of hidden units was 1000, the batch size was 80, the source- and target-side vocabulary sizes were 30000, the maximum sequence length was set to 80, and the beam size for decoding was 12. |
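The AdaDelta hyperparameters quoted above (decay rate ρ = 0.95, constant ϵ = 10^-6) can be made concrete with a minimal sketch of the AdaDelta update rule from Zeiler (2012). This is an illustration only, not the paper's training code; the scalar toy objective and function names below are assumptions for the sketch.

```python
import math

def adadelta_step(x, grad, acc_g, acc_dx, rho=0.95, eps=1e-6):
    """One AdaDelta update for a scalar parameter.

    rho and eps match the settings quoted in the report:
    decay rate 0.95 and constant 1e-6.
    """
    # Accumulate a decaying average of squared gradients: E[g^2].
    acc_g = rho * acc_g + (1 - rho) * grad ** 2
    # Parameter delta scaled by the RMS of past deltas over the RMS of gradients.
    dx = -math.sqrt(acc_dx + eps) / math.sqrt(acc_g + eps) * grad
    # Accumulate a decaying average of squared deltas: E[dx^2].
    acc_dx = rho * acc_dx + (1 - rho) * dx ** 2
    return x + dx, acc_g, acc_dx

# Toy usage (hypothetical): minimize f(x) = (x - 3)^2 starting from x = 0.
x, acc_g, acc_dx = 0.0, 0.0, 0.0
for _ in range(200):
    grad = 2 * (x - 3)
    x, acc_g, acc_dx = adadelta_step(x, grad, acc_g, acc_dx)
```

Note that AdaDelta needs no learning rate: the ratio of the two running RMS accumulators sets the effective step size, which is why the paper only reports ρ and ϵ.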