Sequence Prediction with Unlabeled Data by Reward Function Learning
Authors: Lijun Wu, Li Zhao, Tao Qin, Jianhuang Lai, Tie-Yan Liu
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the pseudo reward can provide good supervision and guide the learning process on unlabeled data. We observe significant improvements on both neural machine translation and text summarization. |
| Researcher Affiliation | Collaboration | Lijun Wu (1), Li Zhao (2), Tao Qin (2), Jianhuang Lai (1,3) and Tie-Yan Liu (2). (1) School of Data and Computer Science, Sun Yat-sen University; (2) Microsoft Research Asia; (3) Guangdong Key Laboratory of Information Security Technology |
| Pseudocode | Yes | Algorithm 1: REINFORCE Training for Sequence Prediction with Unlabeled Data; Algorithm 2: Complete Algorithm for Sequence Prediction with Unlabeled Data |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of their *own* source code for the methodology described. It only refers to the open-source code of a baseline model. |
| Open Datasets | Yes | For the machine translation experiment, we use data from the German-English machine translation track of the IWSLT 2014 evaluation campaign [Cettolo et al., 2014]... The data set we use to train and evaluate our model on text summarization is from a subset of Gigaword Corpus [Graff and Cieri, 2003]. |
| Dataset Splits | Yes | The data consists of training/dev/test corpus with 153326, 6969 and 6750 sentence pairs respectively. ... The number of sentence pairs in the training set, validation set and test set are 189295, 18475 and 10000 respectively |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud instance types) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'Blocks [Van Merriënboer et al., 2015]' and 'Torch [Collobert et al., 2011]' but does not provide specific version numbers for these or any other software components used. |
| Experiment Setup | Yes | Our policy is a GRU with 256 hidden units... The reward network is similar to the policy function, except the concatenation of decoder hidden state and context vector is fed into another multilayer perceptron (MLP) with 256 hidden units... Hyper-parameter α is 0.5, and the delay constant γ is 0.1. |
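
The Pseudocode and Experiment Setup rows above describe a 256-unit GRU policy, a reward network built as an MLP over the concatenated decoder hidden state and attention context vector, and REINFORCE training driven by that pseudo reward on unlabeled data. The sketch below is a minimal, hedged PyTorch illustration of those two pieces; the class and function names, the tanh activation, and the baseline handling are assumptions for illustration, not the authors' code (the paper releases none).

```python
# Minimal sketch, assuming a PyTorch implementation. Hidden sizes and the
# alpha/gamma values are the ones quoted in the Experiment Setup row; their
# exact roles follow the paper's Algorithms 1-2 and are only hinted at here.

import torch
import torch.nn as nn

HIDDEN = 256   # GRU / MLP hidden size quoted in the Experiment Setup row
ALPHA = 0.5    # quoted trade-off hyper-parameter (role per the paper)
GAMMA = 0.1    # quoted "delay constant" (not used in this sketch)


class PseudoRewardNet(nn.Module):
    """Scores a decoding step from [decoder hidden state; context vector]."""

    def __init__(self, hidden: int = HIDDEN, ctx: int = HIDDEN):
        super().__init__()
        # Concatenated (hidden, context) -> 256-unit MLP -> scalar reward.
        self.mlp = nn.Sequential(
            nn.Linear(hidden + ctx, HIDDEN),
            nn.Tanh(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, dec_hidden: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # dec_hidden, context: (batch, hidden) tensors for one decoding step.
        return self.mlp(torch.cat([dec_hidden, context], dim=-1)).squeeze(-1)


def reinforce_loss(log_probs: torch.Tensor,
                   pseudo_rewards: torch.Tensor,
                   baseline: torch.Tensor) -> torch.Tensor:
    """One REINFORCE policy-gradient loss on unlabeled source sentences.

    log_probs:      (batch, T) log pi(y_t | y_<t, x) of the sampled tokens
    pseudo_rewards: (batch,)   sequence-level score from PseudoRewardNet
    baseline:       (batch,)   variance-reduction baseline (e.g. running mean)
    """
    advantage = (pseudo_rewards - baseline).detach()
    # Negative sign: minimizing this loss maximizes expected pseudo reward.
    return -(advantage.unsqueeze(1) * log_probs).sum(dim=1).mean()
```

On labeled data the model would still be trained with the usual supervised objective; presumably α = 0.5 weights the unlabeled (pseudo-reward) term against it and γ = 0.1 governs when the unlabeled data is mixed in, but those details should be taken from the paper's Algorithm 2 rather than from this sketch.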