Self-supervised Bilingual Syntactic Alignment for Neural Machine Translation
Authors: Tianfu Zhang, Heyan Huang, Chong Feng, Longbing Cao
AAAI 2021, pp. 14454–14462 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results on three typical NMT tasks: WMT14 English-German, IWSLT14 German-English, and NC11 English-French show the effectiveness and universality of SyntAligner's syntactic alignment. |
| Researcher Affiliation | Academia | Tianfu Zhang,¹ Heyan Huang,¹ Chong Feng,¹ Longbing Cao²; ¹Beijing Institute of Technology, ²University of Technology Sydney; {tianfuzhang,hhy63,fengchong}@bit.edu.cn, LongBing.Cao@uts.edu.au |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 2 provides an architecture diagram but not algorithmic steps. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It mentions third-party open-source toolkits such as OpenNMT, the Berkeley Neural Parser, and the BPE toolkit, but gives no repository link or availability statement for the code implementing the paper's own method. |
| Open Datasets | Yes | We test SyntAligner on three language translation tasks: WMT14 English-German (En-De), IWSLT14 German-English (De-En), and WMT14 News Commentary version 11 (NC11) English-French (En-Fr). For the En-De translation, the training data consists of 4.5M sentence pairs... For the De-En translation, the training set consists of 160K sentence pairs... For the En-Fr translation, the training data consists of 180K sentence pairs... |
| Dataset Splits | Yes | For the En-De translation, the training data consists of 4.5M sentence pairs (newstest2013 and newstest2014 as the validation and test sets)... For the De-En translation, the training set consists of 160K sentence pairs and we randomly draw 7K samples from the training set as the validation set... For the En-Fr translation, the training data consists of 180K sentence pairs (newstest2013 and newstest2014 as validation and test sets). |
| Hardware Specification | Yes | All models are trained on four NVIDIA TITAN Xp GPUs where each is allocated with a batch size of 4,096 tokens. |
| Software Dependencies | No | The paper mentions software such as 'OpenNMT (Klein et al. 2017)', the 'byte-pair encoding (BPE) toolkit (Sennrich et al. 2016)', and the 'Berkeley Neural Parser (Kitaev et al. 2018)', but it does not provide version numbers for these dependencies, which are needed for reproducibility. A hedged sketch of the BPE preprocessing step is given after the table. |
| Experiment Setup | Yes | We follow the Transformer (base model) setting in (Vaswani et al. 2017) to train the models... The hidden size is 512, filter size is 2,048, and the number of attention heads is 8. All models are trained on four NVIDIA TITAN Xp GPUs where each is allocated a batch size of 4,096 tokens. We adopt a fine-tuning training strategy... firstly pretrain about 30 epochs for all translation tasks... Then, we fine-tune the NMT models for about 1–3 epochs... A configuration sketch collecting these quoted settings follows the BPE sketch below. |
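
The software-dependency row cites the BPE toolkit of Sennrich et al. (2016) without a version or merge count. As a rough illustration of that preprocessing step only, the sketch below learns BPE codes on one training file and applies them to each split using the subword-nmt Python package; the 32,000 merge operations, the file paths, and the single-language (non-joint) setup are assumptions, not values reported in the paper.

```python
"""Minimal BPE preprocessing sketch with the subword-nmt package
(Sennrich et al. 2016). Merge count and file paths are illustrative
assumptions; the paper excerpts do not report them."""
import codecs

from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

NUM_MERGES = 32_000  # assumed; not stated in the paper

# Learn BPE merge operations from the tokenized training corpus.
with codecs.open("train.en", encoding="utf-8") as train_file, \
     codecs.open("bpe.codes", "w", encoding="utf-8") as codes_file:
    learn_bpe(train_file, codes_file, NUM_MERGES)

# Apply the learned codes to every split before NMT training.
with codecs.open("bpe.codes", encoding="utf-8") as codes_file:
    bpe = BPE(codes_file)

for split in ("train.en", "valid.en", "test.en"):
    with codecs.open(split, encoding="utf-8") as fin, \
         codecs.open(split + ".bpe", "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(bpe.process_line(line))
```

In practice, WMT-style En-De setups often learn joint BPE over the concatenated source and target training text; the excerpts do not say which variant the authors used.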
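The experiment-setup row fixes the Transformer (base) hyperparameters and a two-stage pretrain/fine-tune schedule. The snippet below merely collects those quoted values in a plain Python dataclass for reference; the field names and the derived effective batch size are my own framing, not the authors' code.

```python
"""Hyperparameters quoted in the Experiment Setup row, gathered into a plain
dataclass. Field names are illustrative; only the values come from the paper."""
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerBaseSetup:
    hidden_size: int = 512            # model/embedding dimension
    filter_size: int = 2048           # feed-forward inner dimension
    num_heads: int = 8                # attention heads
    num_gpus: int = 4                 # NVIDIA TITAN Xp
    tokens_per_gpu: int = 4096        # per-GPU batch size, in tokens
    pretrain_epochs: int = 30         # "about 30 epochs" for all tasks
    finetune_epochs: tuple = (1, 3)   # "about 1-3 epochs" of fine-tuning

    @property
    def effective_batch_tokens(self) -> int:
        # Total tokens per optimization step across all GPUs.
        return self.num_gpus * self.tokens_per_gpu


if __name__ == "__main__":
    setup = TransformerBaseSetup()
    print(f"Effective batch size: {setup.effective_batch_tokens} tokens")  # 16384
```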