Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning
Authors: Liqun Chen, Ke Bai, Chenyang Tao, Yizhe Zhang, Guoyin Wang, Wenlin Wang, Ricardo Henao, Lawrence Carin
AAAI 2020, pp. 7512-7520 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization and image captioning, with consistent improvements over competing solutions. (Experiments, Datasets and setup) We consider three tasks to evaluate our model: i) Machine translation:... |
| Researcher Affiliation | Academia | Liqun Chen, Ke Bai, Chenyang Tao, Yizhe Zhang, Guoyin Wang, Wenlin Wang, Ricardo Henao, Lawrence Carin (Duke University) |
| Pseudocode | Yes | Algorithm 1 IPOT algorithm. Algorithm 2 OTRL for Seq2Seq learning. (A hedged IPOT sketch is given after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the specific methodology (OTRL) described by the authors. It mentions using existing codebases like Texar and a public PyTorch implementation for baselines. |
| Open Datasets | Yes | i) Machine translation: the commonly used English-German and English-Vietnamese translation datasets, IWSLT 2014 (Cettolo et al., 2014)... ii) Abstractive summarization: two different datasets are employed for the summarization task, English Gigaword (Graff et al., 2003) and CNN/DailyMail (Hermann et al., 2015; Nallapati et al., 2016)... iii) Image captioning: we also consider an image captioning task with the COCO dataset (Lin et al., 2014)... |
| Dataset Splits | Yes | The EN-DE dataset has 146K/7K/7K paired sentences for training/validation/testing, respectively. |
| Hardware Specification | Yes | All experiments are performed on one NVIDIA TITAN X GPU. |
| Software Dependencies | No | The paper mentions using Texar, PyTorch, and TensorFlow, but does not provide specific version numbers for these software dependencies or any other key libraries used in the experiments. |
| Experiment Setup | Yes | We use Adam optimization with learning rate 0.001 and batch size 64 for training. For the hyper-parameters λ1, λ2, λ3, we use grid search to find the best setup for different tasks. In machine translation and abstractive summarization, we set λ1 = 0.9, λ2 = 0.1, λ3 = 0 as our initialization. Then we gradually decrease λ1 to 0.7 and increase λ3 to 0.2. This annealing process starts after the 7-th epoch. (A sketch of one such schedule follows the table.) |
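
The Pseudocode row references Algorithm 1, the IPOT (Inexact Proximal point method for Optimal Transport) routine used to approximate the transport plan between the compared sequences. Below is a minimal NumPy sketch of the generic IPOT iteration; the function name `ipot`, the default `beta=0.5`, the iteration counts, and the all-ones initialization are illustrative assumptions rather than values reported in the paper.

```python
import numpy as np

def ipot(C, mu, nu, beta=0.5, n_outer=50, n_inner=1):
    """Generic IPOT sketch (assumed defaults): approximate the OT plan T
    between marginals mu (n,) and nu (m,) under cost matrix C (n, m)."""
    n, m = C.shape
    T = np.ones((n, m))              # initial transport plan
    G = np.exp(-C / beta)            # proximal kernel built from the cost matrix
    b = np.ones(m) / m
    for _ in range(n_outer):
        Q = G * T                    # proximal-point update of the kernel
        for _ in range(n_inner):     # Sinkhorn-style scaling steps
            a = mu / (Q @ b)
            b = nu / (Q.T @ a)
        T = a[:, None] * Q * b[None, :]
    return T

# Example: OT objective <T, C> between two toy discrete distributions.
rng = np.random.default_rng(0)
C = rng.random((5, 4))                        # toy cost matrix
mu, nu = np.full(5, 1 / 5), np.full(4, 1 / 4)
T = ipot(C, mu, nu)
ot_loss = np.sum(T * C)
```

The scalar `ot_loss` is the quantity a sequence-level OT term would contribute; how it is differentiated and combined with the other objectives is specified by Algorithm 2 (OTRL) in the paper.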
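
The Experiment Setup row describes an annealing schedule for the loss weights λ1, λ2, λ3. Below is a minimal sketch of one possible linear schedule consistent with that description; the ramp length `anneal_len` and the pairing of each λ with a specific loss term are assumptions, while the initial values (0.9, 0.1, 0.0), the targets for λ1 (0.7) and λ3 (0.2), and the start after the 7-th epoch are taken from the quoted setup.

```python
def loss_weights(epoch, anneal_start=7, anneal_len=5):
    """Hypothetical linear annealing of (lambda1, lambda2, lambda3).
    anneal_len is an assumption; the paper only says the change is gradual."""
    frac = 0.0 if epoch <= anneal_start else min(1.0, (epoch - anneal_start) / anneal_len)
    lam1 = 0.9 - 0.2 * frac   # 0.9 -> 0.7
    lam2 = 0.1                # held fixed per the quoted setup
    lam3 = 0.0 + 0.2 * frac   # 0.0 -> 0.2
    return lam1, lam2, lam3

# Assumed combination of the three objectives (the pairing is not stated explicitly):
# total_loss = lam1 * mle_loss + lam2 * ot_loss + lam3 * rl_loss
```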