A Latent Morphology Model for Open-Vocabulary Neural Machine Translation
Authors: Duygu Ataman, Wilker Aziz, Alexandra Birch
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in translating English into three morphologically-rich languages, each with a distinct morphological typology (Arabic, Czech and Turkish), and show that our model obtains better translation accuracy and generalization capacity than conventional approaches to open-vocabulary NMT, including under low- to mid-resource settings. |
| Researcher Affiliation | Academia | Duygu Ataman, University of Zürich, ataman@cl.uzh.ch; Wilker Aziz, University of Amsterdam, w.aziz@uva.nl; Alexandra Birch, University of Edinburgh, a.birch@ed.ac.uk |
| Pseudocode | Yes | Algorithm 1 (Word generation): during training the word is observed, so we only update the decoder and assess the probability of the observation; at test time, we use the mean values of the distributions as the most likely values of z and f and generate predictions with beam search. |
| Open Source Code | Yes | Our software is available at: https://github.com/d-ataman/lmm |
| Open Datasets | Yes | We use the TED Talks corpora (Cettolo, 2012) for training the NMT models for these experiments. In 4.4.3, we conduct more experiments in Turkish to demonstrate the case of increased data sparsity using multi-domain training corpora, where we extend the training set using corpora from EU Bookshop (Skadiņš et al., 2014), Global Voices, Gnome, Tatoeba, Ubuntu (Tiedemann, 2012), KDE4 (Tiedemann, 2009), OpenSubtitles (Lison & Tiedemann, 2016) and SETIMES (Tyers & Alperen, 2010). |
| Dataset Splits | Yes | We use the official evaluation sets of the IWSLT10 for validating and testing the accuracy of the models. ... Table 6 presents the evaluation sets used for development and testing: English-Arabic: dev2010, test2010 (6K); English-Czech: dev2010, test2011, test2010 (3K); English-Turkish: dev2010, test2010 (3K). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were provided in the paper. It only mentions the use of GRUs and the OpenNMT-py framework. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' and the 'OpenNMT-py framework (Klein et al., 2017)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All models use an embedding dimension and GRU size of 512. LMM uses the same hierarchical GRU architecture, where the middle layer is augmented using 4 multi-layer perceptrons with 256 hidden units. We use a lemma vector dimension of 150, 10 inflectional features... and set the regularization constant to ρ = 0.4. All models are trained using the Adam optimizer (Kingma & Ba, 2014) with a batch size of 100, dropout rate of 0.2, learning rate of 0.0004 and learning rate decay of 0.8, applied when the perplexity does not decrease at a given epoch. Translations are generated with beam search with a beam size of 5... |
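The Algorithm 1 description in the Pseudocode row (training: the word is observed, so only the decoder is updated and the observation is scored; test: the distribution means stand in for the most likely z and f, followed by beam search) can be sketched as below. This is a minimal illustration, not the authors' implementation: `char_decoder` is a hypothetical stand-in for their character-level GRU decoder, the feature sample is simplified (the paper uses a relaxed stochastic sample), and all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def char_decoder(z, f):
    # Hypothetical stand-in for the character-level decoder: maps the
    # lemma vector z and feature vector f to a single score.
    return np.tanh(z.sum() + f.sum())

def generate_word(mu_z, sigma_z, feature_probs, training=True):
    """Latent-variable word generation in the spirit of Algorithm 1.

    Training: sample z ~ N(mu_z, sigma_z^2) via the reparameterization
    trick and score the observed word (gradients flow to the decoder).
    Inference: use the distribution means as the most likely values of
    z and f, then decode (beam search omitted in this sketch).
    """
    if training:
        eps = rng.standard_normal(mu_z.shape)
        z = mu_z + sigma_z * eps   # reparameterized Gaussian sample
        f = feature_probs          # simplified; the paper samples f stochastically
    else:
        z = mu_z                   # mean of the lemma distribution
        f = feature_probs          # mean of the feature distribution
    return char_decoder(z, f)
```

At test time the stochastic path is bypassed entirely, which is what makes decoding deterministic before beam search is applied.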
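The hyperparameters reported in the Experiment Setup row can be collected into a single configuration sketch. The values below are taken verbatim from the row; the dictionary keys and the decay helper are illustrative and are not OpenNMT-py option names.

```python
# Hyperparameters reported in the paper (key names are illustrative).
config = {
    "embedding_dim": 512,
    "gru_hidden_size": 512,
    "lmm_mlp_layers": 4,        # MLPs augmenting the middle layer
    "lmm_mlp_hidden": 256,
    "lemma_dim": 150,
    "inflectional_features": 10,
    "regularization_rho": 0.4,
    "optimizer": "adam",
    "batch_size": 100,
    "dropout": 0.2,
    "learning_rate": 0.0004,
    "lr_decay": 0.8,            # applied when dev perplexity stops decreasing
    "beam_size": 5,
}

def decay_lr(lr, perplexity_improved):
    # Per the row: multiply the learning rate by 0.8 at any epoch
    # where the validation perplexity does not decrease.
    return lr if perplexity_improved else lr * config["lr_decay"]
```

This conditional schedule means the learning rate stays fixed for as long as validation perplexity keeps improving, rather than decaying on a fixed timetable.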