Improving Sequence-to-Sequence Learning via Optimal Transport
Authors: Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning. |
| Researcher Affiliation | Collaboration | Duke University, Microsoft Research, Microsoft Dynamics 365 AI Research, Baidu Research, SUNY at Buffalo |
| Pseudocode | Yes | Algorithm 1: IPOT algorithm; Algorithm 2: Seq2Seq Learning via Optimal Transport. (A hedged sketch of IPOT is given below the table.) |
| Open Source Code | Yes | Code for our experiments is available from https://github.com/LiqunChen0606/Seq2Seq-OT. |
| Open Datasets | Yes | We test our model on two datasets: (i) a small-scale English-Vietnamese parallel corpus of TED talks, which has 133K sentence pairs from the IWSLT Evaluation Campaign (Cettolo et al., 2015); and (ii) a large-scale English-German parallel corpus with 4.5M sentence pairs, from the WMT Evaluation Campaign (Vaswani et al., 2017). ... The first one is the Gigaword corpus (Graff et al., 2003)... We also evaluate our model on the DUC-2004 test set (Over et al., 2007)... We also consider an image captioning task using the COCO dataset (Lin et al., 2014). |
| Dataset Splits | Yes | The first one is the Gigaword corpus (Graff et al., 2003), which has around 3.8M training samples, 190K validation samples, and 1951 test samples. ... Following Karpathy's split (Karpathy & Fei-Fei, 2015), 113,287 images are used for training and 5,000 images are used for validation and testing. |
| Hardware Specification | Yes | All experiments are implemented with Tensorflow and run on a single NVIDIA TITAN X GPU. |
| Software Dependencies | No | The paper mentions "TensorFlow" as the implementation framework but does not provide a specific version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We use SGD with learning rate 1.0 as follows: train for 12K steps (around 12 epochs); after 8K steps, we start halving the learning rate every 1K steps. ... We train for 350K steps (around 10 epochs); after 170K steps, we start halving the learning rate every 17K steps. ... we set β = 0.5 for the IPOT algorithm. (A sketch of this schedule also appears below the table.) |
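
The pseudocode row above names the IPOT solver the paper uses to compute the optimal-transport loss. The snippet below is a minimal NumPy sketch of IPOT (the inexact proximal point method of Xie et al., 2018, Algorithm 1 in the paper), assuming a cosine-distance cost between the two sets of word embeddings and the paper's β = 0.5; the function and variable names (`ipot_distance`, `beta`, `n_iter`, `k_inner`) are illustrative and not taken from the released code.

```python
import numpy as np

def ipot_distance(x, y, beta=0.5, n_iter=50, k_inner=1):
    """Sketch of the IPOT proximal-point OT solver.

    x: (n, d) embeddings of one sequence, y: (m, d) embeddings of the other.
    Returns the approximate OT distance <T, C>.
    """
    n, m = x.shape[0], y.shape[0]
    # Cosine-distance cost between embeddings (one common choice of cost).
    x_norm = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y_norm = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    C = 1.0 - x_norm @ y_norm.T            # cost matrix, shape (n, m)

    A = np.exp(-C / beta)                  # proximal kernel
    T = np.ones((n, m))                    # transport-plan initialization
    sigma = np.ones(m) / m
    for _ in range(n_iter):                # outer proximal-point iterations
        Q = A * T                          # elementwise product
        for _ in range(k_inner):           # usually a single Sinkhorn-style inner step
            delta = 1.0 / (n * (Q @ sigma))
            sigma = 1.0 / (m * (Q.T @ delta))
        T = np.diag(delta) @ Q @ np.diag(sigma)
    return np.sum(T * C)                   # OT distance <T, C>
```

In the paper this OT distance is used as a sequence-level auxiliary loss added to the standard MLE objective of the seq2seq model.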
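
The experiment-setup row quotes a step-wise halving schedule for SGD. The sketch below encodes that schedule under the assumption that the first halving occurs exactly at the quoted start step (8K for IWSLT; swap in 170K/17K for the WMT run); the function name and the exact halving boundary are assumptions, not details from the released code.

```python
def learning_rate(step, base_lr=1.0, decay_start=8_000, decay_every=1_000):
    """Constant learning rate until decay_start, then halved every decay_every steps."""
    if step < decay_start:
        return base_lr
    n_halvings = (step - decay_start) // decay_every + 1
    return base_lr * (0.5 ** n_halvings)
```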