Correct-and-Memorize: Learning to Translate from Interactive Revisions

Authors: Rongxiang Weng, Hao Zhou, Shujian Huang, Lei Li, Yifan Xia, Jiajun Chen

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in both ideal and real interactive translation settings demonstrate that our proposed CAMIT enhances machine translation results significantly while requiring fewer revision instructions from humans compared to previous methods.
Researcher Affiliation | Collaboration | Rongxiang Weng (1,2), Hao Zhou (3), Shujian Huang (1,2), Lei Li (3), Yifan Xia (1,2), and Jiajun Chen (1,2). 1: National Key Laboratory for Novel Software Technology, Nanjing, China; 2: Nanjing University, Nanjing, China; 3: Byte Dance AI Lab, Beijing, China.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/wengrx/CAMIT.
Open Datasets | Yes | For both ZH-EN and EN-ZH, we use the NIST dataset to evaluate the proposed framework. The training data consists of about 1.6 million sentence pairs (LDC2002E18, LDC2003E14, LDC2004T08, LDC2005T06). We use NIST03 as our validation set, and NIST04 and NIST05 as our test sets. These sets have 919, 1597, and 1082 source sentences, respectively, each with 4 references. In EN-ZH, we use ref0 of each data set as the source sentences. We extract about 0.2 million sentence pairs from our training set that retain the discourse information needed for training the parameters of the revision memory. Furthermore, we also use the IWSLT2015 dataset [Cettolo et al., 2012] for the ZH-EN translation task.
Dataset Splits | Yes | We use NIST03 as our validation set, and NIST04 and NIST05 as our test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions implementing the model upon NJUNMT-pytorch but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We train the bi-directional NMT model on sentences of length up to 50 words. For RNNSearch, the vocabularies of both Chinese and English include the most frequent 30K words. The dimension of the word embeddings is 512, and the size of the hidden layers is 1024. We use gradient descent to update the parameters, with a batch size of 80; the learning rate is controlled by Adam [Kingma and Ba, 2014]. For the Transformer, we apply byte pair encoding (BPE) [Sennrich et al., 2016] to all languages and limit the vocabulary size to 32K. We set the dimension of the input and output of all layers to 512, and that of the feed-forward layer to 2048. We employ 8 parallel attention heads, and the encoder and decoder each have 6 layers. Other settings are the same as Vaswani et al. [2017]. We use beam search for heuristic decoding with a beam size of 4. The learning rate of online learning is 10^-5. The size of the revision memory is 100, and it is used once the number of revisions exceeds 20.
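The NIST data configuration quoted in the Open Datasets and Dataset Splits rows above can be gathered into one place for quick reference. This is a minimal sketch assuming a plain Python dictionary; the corpus identifiers and counts come from the paper, but the field names are illustrative and do not correspond to any configuration schema used by the CAMIT repository.

```python
# Hedged sketch of the reported NIST data configuration (ZH-EN / EN-ZH).
# Field names are illustrative, not the repository's actual config keys.
NIST_SPLITS = {
    "train": {
        "corpora": ["LDC2002E18", "LDC2003E14", "LDC2004T08", "LDC2005T06"],
        "sentence_pairs": 1_600_000,   # approximate size reported in the paper
    },
    "validation": {"name": "NIST03", "source_sentences": 919, "references": 4},
    "test": [
        {"name": "NIST04", "source_sentences": 1597, "references": 4},
        {"name": "NIST05", "source_sentences": 1082, "references": 4},
    ],
}

# For EN-ZH, ref0 of each set serves as the source side; a further ~0.2M
# sentence pairs that retain discourse information are extracted from the
# training data to train the revision memory parameters.
```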
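Similarly, the hyperparameters quoted in the Experiment Setup row can be summarized as follows. This is a minimal sketch under the assumption of one plain Python dictionary per model; the key names are illustrative and are not the actual NJUNMT-pytorch or CAMIT configuration keys.

```python
# Hedged sketch of the reported training and decoding hyperparameters.
# Key names are illustrative, not the repository's actual config schema.

RNNSEARCH_CONFIG = {
    "max_sentence_length": 50,      # training sentences of up to 50 words
    "vocab_size": 30_000,           # most frequent 30K words, Chinese and English
    "embedding_dim": 512,
    "hidden_size": 1024,
    "batch_size": 80,
    "optimizer": "adam",            # learning rate controlled by Adam
}

TRANSFORMER_CONFIG = {
    "bpe_vocab_size": 32_000,       # byte pair encoding applied to all languages
    "model_dim": 512,               # input/output dimension of all layers
    "feed_forward_dim": 2048,
    "num_heads": 8,
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    # remaining settings follow Vaswani et al. (2017)
}

INTERACTIVE_DECODING_CONFIG = {
    "beam_size": 4,                   # beam search width for heuristic decoding
    "online_learning_rate": 1e-5,     # learning rate for online parameter updates
    "revision_memory_size": 100,
    "revision_memory_threshold": 20,  # memory is consulted once revisions exceed 20
}
```

Note that the beam size, online learning rate, and revision memory settings are the interactive-decoding knobs specific to CAMIT; the remaining values are standard RNNSearch and Transformer hyperparameters.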