Neuron Interaction Based Representation Composition for Neural Machine Translation

Authors: Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu (pp. 8204-8211)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on WMT14 English-German and English-French translation tasks show that our model consistently improves performances over the SOTA TRANSFORMER baseline. Further analyses demonstrate that our approach indeed captures more syntactic and semantic information as expected.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, The Chinese University of Hong Kong; 2 Shenzhen Research Institute, The Chinese University of Hong Kong ({jianli, lyu}@cse.cuhk.edu.hk); 3 Tencent AI Lab; 4 University of Macau ({brightxwang, shumingshi, zptu}@tencent.com; nlp2ct.baosong@gmail.com)
Pseudocode | No | The paper describes mathematical formulations and processes but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for their proposed method or a link to a code repository.
Open Datasets | Yes | We conduct experiments on the WMT2014 English-German (En-De) and English-French (En-Fr) translation tasks. The En-De dataset consists of about 4.56 million sentence pairs. We use newstest2013 as the development set and newstest2014 as the test set. The En-Fr dataset consists of 35.52 million sentence pairs. We use the concatenation of newstest2012 and newstest2013 as the development set and newstest2014 as the test set.
Dataset Splits | Yes | We use newstest2013 as the development set and newstest2014 as the test set. The En-Fr dataset consists of 35.52 million sentence pairs. We use the concatenation of newstest2012 and newstest2013 as the development set and newstest2014 as the test set.
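The split description above maps onto standard WMT evaluation sets. The sketch below is illustrative only (the authors release no code, and the variable names are hypothetical); it records the reported train/dev/test choices and notes one common way to dump the newstest sides with the sacreBLEU CLI.

```python
# Hypothetical record of the splits reported above; the paper ships no code,
# so the names here are illustrative only.
WMT14_SPLITS = {
    "en-de": {
        "train": "WMT14 En-De (~4.56M sentence pairs)",
        "dev":   ["newstest2013"],
        "test":  ["newstest2014"],
    },
    "en-fr": {
        "train": "WMT14 En-Fr (~35.52M sentence pairs)",
        "dev":   ["newstest2012", "newstest2013"],  # concatenated for dev
        "test":  ["newstest2014"],
    },
}

# The newstest sets themselves can be dumped with the sacreBLEU CLI, e.g.:
#   sacrebleu -t wmt13 -l en-de --echo src > newstest2013.en
#   sacrebleu -t wmt13 -l en-de --echo ref > newstest2013.de
#   sacrebleu -t wmt14 -l en-de --echo src > newstest2014.en
```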
Hardware Specification | Yes | All models are trained on eight NVIDIA P40 GPUs where each is allocated with a batch size of 4096 tokens.
Software Dependencies | No | The paper mentions using THUMT but does not provide specific version numbers for THUMT or other software dependencies.
Experiment Setup | Yes | The parameters of the proposed models are initialized by the pre-trained TRANSFORMER model. We have tested both Base and Big models, which differ at hidden size (512 vs. 1024) and number of attention heads (8 vs. 16). Concerning the low-rank parameter (Equation 9), we set low-rank dimensionality r to 512 and 1024 in Base and Big models respectively. All models are trained on eight NVIDIA P40 GPUs where each is allocated with a batch size of 4096 tokens.
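As a rough reimplementation guide, the sketch below collects the reported hyperparameters in a small config object and illustrates a generic low-rank (Hadamard-product) bilinear composition so the role of the rank r is concrete. The class and field names are assumptions, and the `LowRankBilinear` module is a standard low-rank factorization sketch, not the paper's exact Equation 9, which defines its own composition over layer representations.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ReportedSetup:
    """Hyperparameters as reported in the paper; field names are illustrative."""
    hidden_size: int      # 512 (Base) / 1024 (Big)
    num_heads: int        # 8 (Base) / 16 (Big)
    low_rank_r: int       # low-rank dimensionality r from Equation 9
    tokens_per_gpu: int = 4096
    num_gpus: int = 8     # eight NVIDIA P40s -> 8 * 4096 = 32,768 tokens/step


BASE = ReportedSetup(hidden_size=512, num_heads=8, low_rank_r=512)
BIG = ReportedSetup(hidden_size=1024, num_heads=16, low_rank_r=1024)


class LowRankBilinear(nn.Module):
    """Generic low-rank bilinear composition of two representations.

    A standard Hadamard-product low-rank factorization, shown only to make
    the meaning of the rank r concrete; it is not the paper's exact Equation 9.
    """

    def __init__(self, d_model: int, r: int):
        super().__init__()
        self.proj_x = nn.Linear(d_model, r, bias=False)  # U: d_model -> r
        self.proj_y = nn.Linear(d_model, r, bias=False)  # V: d_model -> r
        self.out = nn.Linear(r, d_model, bias=False)     # P: r -> d_model

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # An element-wise product in the rank-r space approximates a full
        # bilinear interaction between the neurons of x and y at O(d * r) cost.
        return self.out(self.proj_x(x) * self.proj_y(y))


if __name__ == "__main__":
    cfg = BASE
    composer = LowRankBilinear(cfg.hidden_size, cfg.low_rank_r)
    x = torch.randn(2, 10, cfg.hidden_size)  # e.g. output of one layer
    y = torch.randn(2, 10, cfg.hidden_size)  # e.g. output of another layer
    print(composer(x, y).shape)  # torch.Size([2, 10, 512])
```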