Learning to Compose Words into Sentences with Reinforcement Learning
Authors: Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. Experiments on various tasks (i.e., sentiment analysis, semantic relatedness, natural language inference, and sentence generation) show that reinforcement learning is a promising direction to discover hierarchical structures of sentences. |
| Researcher Affiliation | Collaboration | Dani Yogatama^1, Phil Blunsom^1,2, Chris Dyer^1, Edward Grefenstette^1, and Wang Ling^1 — ^1DeepMind, ^2University of Oxford; {dyogatama,pblunsom,cdyer,etg,lingwang}@google.com |
| Pseudocode | No | The paper describes the steps and processes involved in the model and reinforcement learning method using prose and mathematical equations, but it does not include any formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement that the authors are releasing their source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Table 1 (descriptive statistics of the datasets used in the experiments), given as dataset: train / dev / test / vocab size — SICK: 4,500 / 500 / 4,927 / 2,172; SNLI: 550,152 / 10,000 / 10,000 / 18,461; SST: 98,794 / 872 / 1,821 / 8,201; IMDB: 441,617 / 223,235 / 223,236 / 29,209 |
| Dataset Splits | Yes | Table 1 (descriptive statistics of the datasets used in the experiments), given as dataset: train / dev / test / vocab size — SICK: 4,500 / 500 / 4,927 / 2,172; SNLI: 550,152 / 10,000 / 10,000 / 18,461; SST: 98,794 / 872 / 1,821 / 8,201; IMDB: 441,617 / 223,235 / 223,236 / 29,209 |
| Hardware Specification | No | The paper discusses training time and computational limitations, stating that 'an epoch could take 5-7 hours' and 'it takes 3-4 days for the model to reach convergence', but it does not specify any particular hardware components such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions tools such as the Stanford parser and the evalb toolkit, but it does not provide version numbers for these or for any other software dependencies, aside from general algorithm names like LSTM. |
| Experiment Setup | Yes | For learning, we use stochastic gradient descent with minibatches of size 1 and an ℓ2 regularization constant tuned on development data from {10^-4, 10^-5, 10^-6, 0}. We set the word embedding size to 100 and initialize the embeddings with GloVe vectors (Pennington et al., 2014). For each sentence, we create a 100-dimensional sentence representation s ∈ R^100 with the Tree-LSTM, project it to a 200-dimensional vector and apply ReLU: q = ReLU(W_p s + b_p), and compute p(ŷ = c \| q; w_q) ∝ exp(w_{q,c} · q + b_q). We run each model 3 times (corresponding to 3 different initialization points) and use the development data to pick the best model. |
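
As a quick illustration of the classifier head described in the Experiment Setup row, here is a minimal PyTorch-style sketch: a 100-dimensional sentence vector s (assumed to come from the Tree-LSTM encoder, which is not shown) is projected to 200 dimensions, passed through ReLU, and fed to a softmax classifier trained with SGD on minibatches of size 1. This is not the authors' released code; the class name `ClassifierHead`, the learning rate, and the number of classes are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation.
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, sent_dim=100, proj_dim=200, num_classes=3):
        super().__init__()
        self.proj = nn.Linear(sent_dim, proj_dim)    # q = ReLU(W_p s + b_p)
        self.out = nn.Linear(proj_dim, num_classes)  # logits w_{q,c} . q + b_q

    def forward(self, s):
        q = torch.relu(self.proj(s))
        return self.out(q)  # softmax is applied inside the cross-entropy loss

# Usage: SGD with minibatch size 1, as in the paper's setup (lr is an assumption).
head = ClassifierHead()
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

s = torch.randn(1, 100)   # placeholder for a Tree-LSTM sentence representation
y = torch.tensor([1])     # placeholder gold label

optimizer.zero_grad()
loss = loss_fn(head(s), y)
loss.backward()
optimizer.step()
```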