Dual Supervised Learning
Authors: Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of DSL, we apply it to three artificial intelligence applications: (1) Neural Machine Translation... Experimental studies illustrate significant accuracy improvements... (2) Image Processing... Experimental results show that on CIFAR-10, DSL could reduce the error rate... (3) Sentiment Analysis... Experiments on the IMDB dataset show that DSL can improve the error rate... |
| Researcher Affiliation | Collaboration | (1) School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China; (2) Microsoft Research, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: Dual Supervised Learning Algorithm |
| Open Source Code | No | The paper links to an open-source baseline NMT implementation (https://github.com/nyu-dl/dl4mt-tutorial) that the authors built upon, but it does not state that the Dual Supervised Learning (DSL) implementation itself is open source, nor does it provide a link to DSL code. |
| Open Datasets | Yes | Datasets: We employ the same datasets as used in (Jean et al., 2015) to conduct experiments on En↔Fr and En↔De. As part of WMT 14, the training data consists of 12M sentence pairs for En↔Fr and 4.5M for En↔De, respectively (WMT, 2014). |
| Dataset Splits | Yes | We combine newstest2012 and newstest2013 together as the validation sets and use newstest2014 as the test sets. |
| Hardware Specification | Yes | All of our experiments are done on a single Tesla K40m GPU. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., GRU, ResNet, PixelCNN++, Adadelta, Adam, SGD, LSTM) but does not specify their version numbers. |
| Experiment Setup | Yes | The word embedding dimension is 620 and the number of hidden nodes is 1000. [...] The gradient clip is set as 1.0, 5.0 and 1.0 during training for En↔Fr, En↔De, and En↔Zh, respectively. [...] The values of both λxy and λyx in Algorithm 1 are set as 0.01 according to empirical performance on the validation set. |
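The λxy and λyx values quoted above weight the probabilistic-duality penalty that DSL adds to the two supervised losses. Based on the paper's formulation (the constraint P(x)P(y|x) = P(y)P(x|y) enforced as a squared log-space penalty), a minimal sketch of the per-example losses in Algorithm 1 might look like the following; function and variable names are illustrative assumptions, not taken from any released code:

```python
import math

def dsl_loss(log_p_y_given_x, log_p_x_given_y,
             log_p_hat_x, log_p_hat_y,
             lambda_xy=0.01, lambda_yx=0.01):
    """Toy per-example losses for the primal (x -> y) and dual (y -> x)
    models. Log-probabilities are plain floats standing in for model
    outputs; lambda defaults follow the paper's reported 0.01."""
    # Duality gap: zero exactly when P(x)P(y|x) = P(y)P(x|y).
    duality_gap = (log_p_hat_x + log_p_y_given_x
                   - log_p_hat_y - log_p_x_given_y) ** 2
    loss_xy = -log_p_y_given_x + lambda_xy * duality_gap  # primal model
    loss_yx = -log_p_x_given_y + lambda_yx * duality_gap  # dual model
    return loss_xy, loss_yx

# When the duality holds exactly (0.2 * 0.5 == 0.4 * 0.25), the penalty
# vanishes and each loss reduces to the plain negative log-likelihood.
lxy, lyx = dsl_loss(log_p_y_given_x=math.log(0.5),
                    log_p_x_given_y=math.log(0.25),
                    log_p_hat_x=math.log(0.2),
                    log_p_hat_y=math.log(0.4))
```

In the full algorithm these marginals P̂(x), P̂(y) come from pretrained marginal models (e.g., language models), and each of the two networks is updated with its own regularized loss.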