Controlling Neural Machine Translation Formality with Synthetic Supervision

Authors: Xing Niu, Marine Carpuat

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design experiments to evaluate the impact of our approaches to (1) formality control, and (2) synthetic supervision. We conduct a comprehensive automatic and human evaluation of the resulting FSMT systems.
Researcher Affiliation | Collaboration | Xing Niu (Amazon AWS AI), Marine Carpuat (University of Maryland)
Pseudocode | No | The paper describes its algorithms and processes in textual paragraphs and diagrams, but it does not include formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | Source code: https://github.com/xingniu/multitask-ft-fsmt
Open Datasets | Yes | We use the GYAFC corpus introduced by Rao and Tetreault (2018) in all tasks. We train MT systems on the concatenation of large, diverse parallel corpora: (1) Europarl.v7 (Koehn 2005); (2) News-Commentary.v14 (Bojar et al. 2018); (3) OpenSubtitles2016 (Lison and Tiedemann 2016).
Dataset Splits | Yes | The train split consists of 105K informal-formal sentence pairs, whereas the dev/test sets consist of roughly 10K/5K pairs for both formality transfer directions, i.e., I→F and F→I. The learning rate for baseline models is initialized to 0.001 and reduced by 30% after 4 checkpoints without improvement of perplexity on the development set.
Hardware Specification | No | The paper describes the model architecture and training process (e.g., a bidirectional encoder with a single LSTM layer), but it does not specify any hardware such as GPU models, CPU types, or cloud computing instances used for the experiments.
Software Dependencies | No | The paper mentions software such as the Sockeye toolkit and the Adam optimizer, but it does not provide version numbers for these or other software dependencies required to reproduce the experiments.
Experiment Setup | Yes | Our translation model uses a bidirectional encoder with a single LSTM layer of size 512, multilayer perceptron attention with a layer size of 512, and word representations of size 512. We apply layer normalization [...], add dropout to embeddings and RNNs [...] with probability 0.2, and tie the source and target embeddings [...]. We train using the Adam optimizer [...] with a batch size of 64 sentences and we checkpoint the model every 1000 updates. The learning rate for baseline models is initialized to 0.001 and reduced by 30% after 4 checkpoints without improvement of perplexity on the development set. Training stops after 10 checkpoints without improvement. [...] inheriting all settings except the learning rate, which is re-initialized to 0.0001. The hyperparameter α in Equation 5 is set to 0.05.
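To make the quoted training schedule concrete, below is a minimal Python sketch of the checkpoint-based learning-rate reduction and early stopping described in the Experiment Setup row. The constant values are copied from the quote above; the names (LRSchedule, on_checkpoint) and details such as whether the patience counter resets after a learning-rate reduction are illustrative assumptions, not the authors' Sockeye configuration.

    # Minimal sketch of the reported setup; not the authors' released code.
    # Values come from the quoted Experiment Setup; names are hypothetical.

    EMBED_SIZE = 512         # word representation size
    RNN_HIDDEN = 512         # single bidirectional LSTM layer size
    ATTENTION_HIDDEN = 512   # multilayer perceptron attention layer size
    DROPOUT = 0.2            # dropout on embeddings and RNNs
    BATCH_SIZE = 64          # sentences per batch
    CHECKPOINT_EVERY = 1000  # updates between checkpoints
    ALPHA = 0.05             # hyperparameter alpha in the paper's Equation 5

    class LRSchedule:
        """Reduce the LR by 30% after 4 checkpoints without dev-perplexity
        improvement; signal a stop after 10 checkpoints without improvement."""

        def __init__(self, initial_lr=0.001, reduce_factor=0.7,
                     reduce_patience=4, stop_patience=10):
            self.lr = initial_lr
            self.reduce_factor = reduce_factor
            self.reduce_patience = reduce_patience
            self.stop_patience = stop_patience
            self.best_ppl = float("inf")
            self.not_improved = 0

        def on_checkpoint(self, dev_ppl):
            """Update the schedule at a checkpoint; return True to stop training."""
            if dev_ppl < self.best_ppl:
                self.best_ppl = dev_ppl
                self.not_improved = 0
                return False
            self.not_improved += 1
            if self.not_improved % self.reduce_patience == 0:
                self.lr *= self.reduce_factor  # reduce by 30%
            return self.not_improved >= self.stop_patience

    # Baseline training starts at 0.001; fine-tuning inherits all settings
    # except the learning rate, which is re-initialized to 0.0001.
    baseline_schedule = LRSchedule(initial_lr=0.001)
    finetune_schedule = LRSchedule(initial_lr=0.0001)

The last two lines mirror the fine-tuning stage mentioned in the quote, which reuses all settings but restarts from the smaller learning rate of 0.0001.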