Controlling Neural Machine Translation Formality with Synthetic Supervision
Authors: Xing Niu, Marine Carpuat
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design experiments to evaluate the impact of our approaches to (1) formality control, and (2) synthetic supervision. We conduct a comprehensive automatic and human evaluation of the resulting FSMT systems. |
| Researcher Affiliation | Collaboration | Xing Niu,1 Marine Carpuat2 1Amazon AWS AI, 2University of Maryland |
| Pseudocode | No | The paper describes its algorithms and processes in textual paragraphs and through diagrams, but it does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Source code: https://github.com/xingniu/multitask-ft-fsmt. |
| Open Datasets | Yes | We use the GYAFC corpus introduced by Rao and Tetreault (2018) in all tasks. We train MT systems on the concatenation of large diverse parallel corpora: (1) Europarl.v7 (Koehn 2005); (2) News-Commentary.v14 (Bojar et al. 2018); (3) Open Subtitles2016 (Lison and Tiedemann 2016). |
| Dataset Splits | Yes | The train split consists of 105K informal-formal sentence pairs whereas the dev/test sets consist of roughly 10K/5K pairs for both formality transfer directions, i.e., I→F and F→I. The learning rate for baseline models is initialized to 0.001 and reduced by 30% after 4 checkpoints without improvement of perplexity on the development set. |
| Hardware Specification | No | The paper describes the model architecture and training process (e.g., 'bidirectional encoder with a single LSTM layer'), but it does not specify any hardware components such as GPU models, CPU types, or cloud computing instances used for the experiments. |
| Software Dependencies | No | The paper mentions software like 'Sockeye toolkit' and 'Adam optimizer,' but it does not provide specific version numbers for these or other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | Our translation model uses a bidirectional encoder with a single LSTM layer of size 512, multilayer perceptron attention with a layer size of 512, and word representations of size 512. We apply layer normalization [...], add dropout to embeddings and RNNs [...] with probability 0.2, and tie the source and target embeddings [...]. We train using the Adam optimizer [...] with a batch size of 64 sentences and we checkpoint the model every 1000 updates. The learning rate for baseline models is initialized to 0.001 and reduced by 30% after 4 checkpoints without improvement of perplexity on the development set. Training stops after 10 checkpoints without improvement. [...] inheriting all settings except the learning rate which is re-initialized to 0.0001. The hyperparameter α in Equation 5 is set to 0.05. |
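
To make the experiment-setup row above concrete, the sketch below collects the reported hyperparameters into a plain Python config and implements one plausible reading of the checkpoint-based learning-rate reduction and early stopping. This is a minimal sketch under stated assumptions: field and function names (e.g. `BASELINE_CONFIG`, `update_lr_and_check_stop`) are illustrative and are not taken from the authors' Sockeye-based release at https://github.com/xingniu/multitask-ft-fsmt.

```python
# Hedged sketch of the training settings reported in the paper:
# single-layer bidirectional LSTM encoder (size 512), MLP attention (512),
# word embeddings (512, source/target tied), layer normalization, dropout 0.2,
# Adam, batch size 64 sentences, checkpoint every 1000 updates,
# LR 0.001 reduced by 30% after 4 non-improving checkpoints,
# stop after 10 non-improving checkpoints, fine-tuning LR 0.0001, alpha = 0.05.

BASELINE_CONFIG = {
    "encoder": "bidirectional LSTM (1 layer)",
    "rnn_hidden_size": 512,
    "attention": "mlp",
    "attention_size": 512,
    "embedding_size": 512,            # source and target embeddings tied
    "layer_norm": True,
    "dropout": 0.2,                   # on embeddings and RNNs
    "optimizer": "adam",
    "batch_size_sentences": 64,
    "checkpoint_interval_updates": 1000,
    "initial_lr": 1e-3,               # 1e-4 when fine-tuning; other settings inherited
    "lr_reduce_factor": 0.7,          # "reduced by 30%"
    "lr_patience_checkpoints": 4,     # reduce after 4 checkpoints w/o dev-perplexity gain
    "stop_patience_checkpoints": 10,  # stop after 10 checkpoints w/o improvement
    "alpha": 0.05,                    # weight in the paper's Equation 5
}


def update_lr_and_check_stop(dev_ppl_history, lr, cfg=BASELINE_CONFIG):
    """Apply the reported schedule to per-checkpoint dev perplexities.

    Returns (new_lr, stop_training). This is a simplified interpretation:
    the learning rate is cut by 30% every 4 consecutive checkpoints without
    a new best dev perplexity, and training stops after 10 such checkpoints.
    """
    best = min(dev_ppl_history)
    # Checkpoints elapsed since the best dev perplexity was observed.
    since_best = len(dev_ppl_history) - 1 - dev_ppl_history.index(best)
    if since_best >= cfg["stop_patience_checkpoints"]:
        return lr, True
    if since_best > 0 and since_best % cfg["lr_patience_checkpoints"] == 0:
        lr *= cfg["lr_reduce_factor"]
    return lr, False


if __name__ == "__main__":
    # Example: five checkpoints without improvement after the second one;
    # the learning rate is reduced once the patience of 4 is reached.
    history = [12.3, 11.8, 11.9, 12.0, 12.1, 12.2]
    lr, stop = update_lr_and_check_stop(history, cfg_lr := BASELINE_CONFIG["initial_lr"])
    print(f"lr: {cfg_lr} -> {lr:.5f}, stop: {stop}")
```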