Dual Supervised Learning

Authors: Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of DSL, we apply it to three artificial intelligence applications: (1) Neural Machine Translation... Experimental studies illustrate significant accuracy improvements... (2) Image Processing... Experimental results show that on CIFAR-10, DSL could reduce the error rate... (3) Sentiment Analysis... Experiments on the IMDB dataset show that DSL can improve the error rate...
Researcher Affiliation | Collaboration | ¹School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China; ²Microsoft Research, Beijing, China.
Pseudocode | Yes | Algorithm 1: Dual Supervised Learning Algorithm (a hedged sketch of one update step appears after the table)
Open Source Code | No | The paper links to the open-source baseline NMT system it builds upon (https://github.com/nyu-dl/dl4mt-tutorial), but it neither releases nor links to an implementation of the Dual Supervised Learning (DSL) method itself.
Open Datasets | Yes | Datasets: We employ the same datasets as used in (Jean et al., 2015) to conduct experiments on En↔Fr and En↔De. As a part of WMT 14, the training data consists of 12M sentence pairs for En↔Fr and 4.5M for En↔De, respectively (WMT, 2014).
Dataset Splits | Yes | We combine newstest2012 and newstest2013 together as the validation sets and use newstest2014 as the test sets.
Hardware Specification | Yes | All of our experiments are done on a single Tesla K40m GPU.
Software Dependencies | No | The paper mentions various software components and models (e.g., GRU, ResNet, PixelCNN++, Adadelta, Adam, SGD, LSTM) but does not specify their version numbers.
Experiment Setup | Yes | The word embedding dimension is 620 and the number of hidden nodes is 1000. [...] The gradient clip is set as 1.0, 5.0 and 1.0 during the training for En↔Fr, En↔De, and En↔Zh, respectively. [...] The values of both λ_xy and λ_yx in Algorithm 1 are set as 0.01 according to empirical performance on the validation set. (See the sketch after the table.)
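
Since the paper provides only pseudocode (Algorithm 1) and no released DSL code, the following is a minimal PyTorch sketch of one DSL update step, reconstructed from the paper's description. The `log_prob` interfaces, the model and optimizer objects, and the marginal language models `lm_x`/`lm_y` are hypothetical stand-ins; only the probabilistic-duality regularizer, the values λ_xy = λ_yx = 0.01, and the gradient clip of 1.0 come from the quoted setup.

```python
import torch

def dsl_step(x, y, model_xy, model_yx, lm_x, lm_y, opt_xy, opt_yx,
             lam_xy=0.01, lam_yx=0.01, clip=1.0):
    """One Dual Supervised Learning update, sketched after Algorithm 1.

    model_xy / model_yx: dual conditional models exposing a hypothetical
    log_prob(target, source) -> per-example log-likelihood tensor.
    lm_x / lm_y: pretrained marginal (language) models, held fixed.
    """
    log_p_y_x = model_xy.log_prob(y, x)   # log P(y|x; theta_xy)
    log_p_x_y = model_yx.log_prob(x, y)   # log P(x|y; theta_yx)
    with torch.no_grad():                 # marginals are not updated by DSL
        log_p_x = lm_x.log_prob(x)
        log_p_y = lm_y.log_prob(y)

    # Squared violation of probabilistic duality:
    # log P(x) + log P(y|x) should equal log P(y) + log P(x|y).
    duality = (log_p_x + log_p_y_x - log_p_y - log_p_x_y).pow(2).mean()

    loss_xy = -log_p_y_x.mean() + lam_xy * duality
    loss_yx = -log_p_x_y.mean() + lam_yx * duality

    # Each model is updated only with the gradient of its own objective;
    # retain_graph keeps the shared duality term alive for the second grad.
    grads_xy = torch.autograd.grad(loss_xy, list(model_xy.parameters()),
                                   retain_graph=True)
    grads_yx = torch.autograd.grad(loss_yx, list(model_yx.parameters()))
    for p, g in zip(model_xy.parameters(), grads_xy):
        p.grad = g
    for p, g in zip(model_yx.parameters(), grads_yx):
        p.grad = g

    torch.nn.utils.clip_grad_norm_(model_xy.parameters(), clip)
    torch.nn.utils.clip_grad_norm_(model_yx.parameters(), clip)
    opt_xy.step()
    opt_yx.step()
    opt_xy.zero_grad()
    opt_yx.zero_grad()
```

In the paper's NMT setting the conditional models are GRU-based encoder-decoders and the marginals come from separately trained language models; batching, masking, and length-normalization details are omitted from this sketch.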