Dual Supervised Learning
Authors: Yingce Xia, Tao Qin, Wei Chen, Jiang Bian, Nenghai Yu, Tie-Yan Liu
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of DSL, we apply it to three artificial intelligence applications: (1) Neural Machine Translation... Experimental studies illustrate significant accuracy improvements... (2) Image Processing... Experimental results show that on CIFAR-10, DSL could reduce the error rate... (3) Sentiment Analysis... Experiments on the IMDB dataset show that DSL can improve the error rate... |
| Researcher Affiliation | Collaboration | (1) School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui, China; (2) Microsoft Research, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: Dual Supervised Learning Algorithm |
| Open Source Code | No | The paper links to an open-source baseline NMT implementation (https://github.com/nyu-dl/dl4mt-tutorial) that the authors built upon, but it does not state that the Dual Supervised Learning (DSL) implementation itself is open source, nor does it provide a link to DSL code. |
| Open Datasets | Yes | Datasets: We employ the same datasets as used in (Jean et al., 2015) to conduct experiments on En↔Fr and En↔De. As part of WMT 14, the training data consists of 12M sentence pairs for En↔Fr and 4.5M for En↔De, respectively (WMT, 2014). |
| Dataset Splits | Yes | We combine newstest2012 and newstest2013 together as the validation sets and use newstest2014 as the test sets. |
| Hardware Specification | Yes | All of our experiments are done on a single Tesla K40m GPU. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., GRU, ResNet, PixelCNN++, Adadelta, Adam, SGD, LSTM) but does not specify their version numbers. |
| Experiment Setup | Yes | The word embedding dimension is 620 and the number of hidden nodes is 1000. [...] The gradient clip is set as 1.0, 5.0 and 1.0 during training for En↔Fr, En↔De, and En↔Zh, respectively. [...] The values of both λxy and λyx in Algorithm 1 are set as 0.01 according to empirical performance on the validation set. |
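The λxy and λyx values quoted above weight the probabilistic-duality penalty that DSL adds to the two supervised losses. Based on the paper's formulation (the constraint P(x)P(y|x) = P(y)P(x|y) enforced as a squared log-space penalty), a minimal sketch of the per-example losses in Algorithm 1 might look like the following; function and variable names are illustrative assumptions, not taken from any released code:

```python
import math

def dsl_loss(log_p_y_given_x, log_p_x_given_y,
             log_p_hat_x, log_p_hat_y,
             lambda_xy=0.01, lambda_yx=0.01):
    """Toy per-example losses for the primal (x -> y) and dual (y -> x)
    models. Log-probabilities are plain floats standing in for model
    outputs; lambda defaults follow the paper's reported 0.01."""
    # Duality gap: zero exactly when P(x)P(y|x) = P(y)P(x|y).
    duality_gap = (log_p_hat_x + log_p_y_given_x
                   - log_p_hat_y - log_p_x_given_y) ** 2
    loss_xy = -log_p_y_given_x + lambda_xy * duality_gap  # primal model
    loss_yx = -log_p_x_given_y + lambda_yx * duality_gap  # dual model
    return loss_xy, loss_yx

# When the duality holds exactly (0.2 * 0.5 == 0.4 * 0.25), the penalty
# vanishes and each loss reduces to the plain negative log-likelihood.
lxy, lyx = dsl_loss(log_p_y_given_x=math.log(0.5),
                    log_p_x_given_y=math.log(0.25),
                    log_p_hat_x=math.log(0.2),
                    log_p_hat_y=math.log(0.4))
```

In the full algorithm these marginals P̂(x), P̂(y) come from pretrained marginal models (e.g., language models), and each of the two networks is updated with its own regularized loss.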