Learning to Compose Words into Sentences with Reinforcement Learning
Authors: Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. Experiments on various tasks (i.e., sentiment analysis, semantic relatedness, natural language inference, and sentence generation) show that reinforcement learning is a promising direction to discover hierarchical structures of sentences. |
| Researcher Affiliation | Collaboration | Dani Yogatama^1, Phil Blunsom^1,2, Chris Dyer^1, Edward Grefenstette^1, and Wang Ling^1 — ^1DeepMind, ^2University of Oxford; {dyogatama,pblunsom,cdyer,etg,lingwang}@google.com |
| Pseudocode | No | The paper describes the steps and processes involved in the model and reinforcement learning method using prose and mathematical equations, but it does not include any formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement that the authors are releasing their source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Table 1 (descriptive statistics of the datasets used in the experiments), given as dataset: train / dev / test / vocab size — SICK: 4,500 / 500 / 4,927 / 2,172; SNLI: 550,152 / 10,000 / 10,000 / 18,461; SST: 98,794 / 872 / 1,821 / 8,201; IMDB: 441,617 / 223,235 / 223,236 / 29,209 |
| Dataset Splits | Yes | Table 1 (descriptive statistics of the datasets used in the experiments), given as dataset: train / dev / test / vocab size — SICK: 4,500 / 500 / 4,927 / 2,172; SNLI: 550,152 / 10,000 / 10,000 / 18,461; SST: 98,794 / 872 / 1,821 / 8,201; IMDB: 441,617 / 223,235 / 223,236 / 29,209 |
| Hardware Specification | No | The paper discusses training time and computational limitations, stating that 'an epoch could take 5-7 hours' and 'it takes 3-4 days for the model to reach convergence', but it does not specify any particular hardware components such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions tools such as the Stanford parser and the evalb toolkit, but it does not provide version numbers for these or for any other software dependencies, aside from general algorithm names like LSTM. |
| Experiment Setup | Yes | For learning, we use stochastic gradient descent with minibatches of size 1 and an ℓ2 regularization constant tuned on development data from {10^-4, 10^-5, 10^-6, 0}. We set the word embedding size to 100 and initialize the embeddings with GloVe vectors (Pennington et al., 2014). For each sentence, we create a 100-dimensional sentence representation s ∈ R^100 with the Tree-LSTM, project it to a 200-dimensional vector and apply ReLU: q = ReLU(W_p s + b_p), and compute p(ŷ = c \| q; w_q) ∝ exp(w_{q,c} · q + b_q). We run each model 3 times (corresponding to 3 different initialization points) and use the development data to pick the best model. |
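
As a quick illustration of the classifier head described in the Experiment Setup row, here is a minimal PyTorch-style sketch: a 100-dimensional sentence vector s (assumed to come from the Tree-LSTM encoder, which is not shown) is projected to 200 dimensions, passed through ReLU, and fed to a softmax classifier trained with SGD on minibatches of size 1. This is not the authors' released code; the class name `ClassifierHead`, the learning rate, and the number of classes are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the authors' implementation.
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, sent_dim=100, proj_dim=200, num_classes=3):
        super().__init__()
        self.proj = nn.Linear(sent_dim, proj_dim)    # q = ReLU(W_p s + b_p)
        self.out = nn.Linear(proj_dim, num_classes)  # logits w_{q,c} . q + b_q

    def forward(self, s):
        q = torch.relu(self.proj(s))
        return self.out(q)  # softmax is applied inside the cross-entropy loss

# Usage: SGD with minibatch size 1, as in the paper's setup (lr is an assumption).
head = ClassifierHead()
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

s = torch.randn(1, 100)   # placeholder for a Tree-LSTM sentence representation
y = torch.tensor([1])     # placeholder gold label

optimizer.zero_grad()
loss = loss_fn(head(s), y)
loss.backward()
optimizer.step()
```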