Improving Sequence-to-Sequence Learning via Optimal Transport
Authors: Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning. |
| Researcher Affiliation | Collaboration | Duke University, Microsoft Research, Microsoft Dynamics 365 AI Research, Baidu Research, SUNY at Buffalo |
| Pseudocode | Yes | Algorithm 1: IPOT algorithm; Algorithm 2: Seq2Seq Learning via Optimal Transport. (A hedged sketch of IPOT is given below the table.) |
| Open Source Code | Yes | Code for our experiments is available from https://github.com/LiqunChen0606/Seq2Seq-OT. |
| Open Datasets | Yes | We test our model on two datasets: (i) a small-scale English-Vietnamese parallel corpus of TED talks, which has 133K sentence pairs from the IWSLT Evaluation Campaign (Cettolo et al., 2015); and (ii) a large-scale English-German parallel corpus with 4.5M sentence pairs, from the WMT Evaluation Campaign (Vaswani et al., 2017). ... The first one is the Gigaword corpus (Graff et al., 2003)... We also evaluate our model on the DUC-2004 test set (Over et al., 2007)... We also consider an image captioning task using the COCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | The first one is the Gigaword corpus (Graff et al., 2003), which has around 3.8M training samples, 190K validation samples, and 1951 test samples. ... Following Karpathy's split (Karpathy & Fei-Fei, 2015), 113,287 images are used for training and 5,000 images are used for validation and testing. |
| Hardware Specification | Yes | All experiments are implemented with Tensorflow and run on a single NVIDIA TITAN X GPU. |
| Software Dependencies | No | The paper mentions "TensorFlow" as the implementation framework but does not provide a specific version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We use SGD with learning rate 1.0 as follows: train for 12K steps (around 12 epochs); after 8K steps, we start halving the learning rate every 1K steps. ... We train for 350K steps (around 10 epochs); after 170K steps, we start halving the learning rate every 17K steps. ... we set β = 0.5 for the IPOT algorithm. (A sketch of this schedule also appears below the table.) |
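
The pseudocode row above names the IPOT solver the paper uses to compute the optimal-transport loss. The snippet below is a minimal NumPy sketch of IPOT (the inexact proximal point method of Xie et al., 2018, Algorithm 1 in the paper), assuming a cosine-distance cost between the two sets of word embeddings and the paper's β = 0.5; the function and variable names (`ipot_distance`, `beta`, `n_iter`, `k_inner`) are illustrative and not taken from the released code.

```python
import numpy as np

def ipot_distance(x, y, beta=0.5, n_iter=50, k_inner=1):
    """Sketch of the IPOT proximal-point OT solver.

    x: (n, d) embeddings of one sequence, y: (m, d) embeddings of the other.
    Returns the approximate OT distance <T, C>.
    """
    n, m = x.shape[0], y.shape[0]
    # Cosine-distance cost between embeddings (one common choice of cost).
    x_norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y_norm = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    C = 1.0 - x_norm @ y_norm.T            # cost matrix, shape (n, m)

    A = np.exp(-C / beta)                  # proximal kernel
    T = np.ones((n, m))                    # transport-plan initialization
    sigma = np.ones(m) / m
    for _ in range(n_iter):                # outer proximal-point iterations
        Q = A * T                          # elementwise product
        for _ in range(k_inner):           # usually a single Sinkhorn-style inner step
            delta = 1.0 / (n * (Q @ sigma))
            sigma = 1.0 / (m * (Q.T @ delta))
        T = np.diag(delta) @ Q @ np.diag(sigma)
    return np.sum(T * C)                   # OT distance <T, C>
```

In the paper this OT distance is used as a sequence-level auxiliary loss added to the standard MLE objective of the seq2seq model.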
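
The experiment-setup row quotes a step-wise halving schedule for SGD. The sketch below encodes that schedule under the assumption that the first halving occurs exactly at the quoted start step (8K for IWSLT; swap in 170K/17K for the WMT run); the function name and the exact halving boundary are assumptions, not details from the released code.

```python
def learning_rate(step, base_lr=1.0, decay_start=8_000, decay_every=1_000):
    """Constant learning rate until decay_start, then halved every decay_every steps."""
    if step < decay_start:
        return base_lr
    n_halvings = (step - decay_start) // decay_every + 1
    return base_lr * (0.5 ** n_halvings)
```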