GRET: Global Representation Enhanced Transformer
Authors: Rongxiang Weng, Haoran Wei, Shujian Huang, Heng Yu, Lidong Bing, Weihua Luo, Jiajun Chen
AAAI 2020, pp. 9258-9265 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2 Machine Intelligence Technology Lab, Alibaba Group, Hangzhou, China. {wengrx, funan.whr, yuheng.yh, l.bing, weihua.luowh}@alibaba-inc.com, {huangsj, chenjj}@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1: Dynamic Routing Algorithm (a hedged sketch of routing-by-agreement is given after this table) |
| Open Source Code | Yes | Source code is available at: https://github.com/wengrx/GRET |
| Open Datasets | Yes | Data-sets: We conduct experiments on machine translation and text summarization tasks. In machine translation, we employ our approach on four language pairs: Chinese to English (ZH→EN), English to German (EN→DE), German to English (DE→EN), and Romanian to English (RO→EN). In text summarization, we use LCSTS (Hu, Chen, and Zhu 2015) to evaluate the proposed method. These data-sets are public and widely used in previous work, which makes it easy for other researchers to replicate our work. |
| Dataset Splits | Yes | On the ZH→EN task, we use WMT17 as the training set, which consists of about 7.5M sentence pairs; we use newsdev2017 as the validation set and newstest2017 as the test set, which have 2002 and 2001 sentence pairs, respectively. On the EN→DE and DE→EN tasks, we use WMT14 as the training set, which consists of about 4.5M sentence pairs; we use newstest2013 as the validation set and newstest2014 as the test set, which have 2169 and 3000 sentence pairs, respectively. On the RO→EN task, we use WMT16 as the training set, which consists of about 0.6M sentence pairs; we use newstest2015 as the validation set and newstest2016 as the test set, which have 3000 and 3002 sentence pairs, respectively. In text summarization, following Hu, Chen, and Zhu (2015), we use PART I as the training set, which consists of 2M sentence pairs, and the subsets of PART II and PART III scored from 3 to 5 as the validation and test sets, which consist of 8685 and 725 sentence pairs, respectively. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or detailed computer specifications) used for running the experiments were provided. The paper only mentions training time. |
| Software Dependencies | No | The paper mentions using Adam optimizer and BPE, but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or other key ancillary software components. |
| Experiment Setup | Yes | For the Transformer, we set the dimension of the input and output of all layers to 512, and that of the feed-forward layer to 2048. We employ 8 parallel attention heads. The number of layers for both the encoder and decoder is 6. Sentence pairs are batched together by approximate sentence length. Each batch has 50 sentences, and the maximum length of a sentence is limited to 100. We set the dropout rate to 0.1. We use Adam (Kingma and Ba 2014) to update the parameters, and the learning rate is varied under a warm-up strategy with 4000 steps (Vaswani et al. 2017); other details follow Vaswani et al. (2017). The number of capsules is set to 32 and the default number of routing iterations is set to 3. After the training stage, we use beam search for decoding, with a beam size of 4. (Illustrative sketches of the routing algorithm and the warm-up schedule follow below.) |
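The pseudocode row above refers to Algorithm 1, the dynamic routing algorithm that GRET borrows from capsule networks to aggregate token-level states into a small set of capsules. The sketch below is a minimal PyTorch rendering of standard routing-by-agreement (Sabour, Frosst, and Hinton 2017), not the authors' implementation (see the linked repository for that); the tensor layout, the `num_iters=3` default, and the omission of the per-token vote transformations are assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Squashing non-linearity: preserves direction, maps the norm into [0, 1)."""
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over pre-computed "vote" vectors.

    u_hat: [batch, n_tokens, n_capsules, dim] votes from each input token to each
           output capsule (produced by per-token linear maps, omitted here).
    Returns: [batch, n_capsules, dim] output capsules.
    """
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    v = None
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                    # coupling coefficients per token
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # weighted sum over tokens
        v = squash(s)                              # [batch, n_capsules, dim]
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)   # raise logits where votes agree
    return v
```

In the paper's reported setup, the number of capsules is 32 and three routing iterations are used; GRET then fuses the resulting capsules into a single global representation vector, a step not shown in this sketch.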
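The setup row also cites the warm-up learning-rate strategy with 4000 steps from Vaswani et al. (2017). For reference, that schedule can be written as the small helper below, using the paper's reported model dimension of 512 and 4000 warm-up steps; the function name and its use as a per-step multiplier for Adam are illustrative assumptions, not details taken from the paper.

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning-rate schedule from Vaswani et al. (2017):
    grows linearly for `warmup_steps` updates, then decays as 1/sqrt(step)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# Example: with d_model=512 the schedule peaks at step 4000 (~7.0e-4), then decays.
peak = transformer_lr(4000)
```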