Can Unconditional Language Models Recover Arbitrary Sentences?
Authors: Nishant Subramani, Samuel Bowman, Kyunghyun Cho
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments reveal that we can achieve full recoverability with a reparametrized sentence space with dimension equal to the dimension of the recurrent hidden state of the model, at least for large enough models. |
| Researcher Affiliation | Collaboration | Nishant Subramani (New York University, nishant@nyu.edu); Samuel R. Bowman (New York University); Kyunghyun Cho (New York University; Facebook AI Research; CIFAR Azrieli Global Scholar) |
| Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement that the source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Corpus We use the fifth edition of the English Gigaword (Graff et al., 2003) news corpus. ... To evaluate out-of-domain sentence recoverability, we use a random sample of 50 sentences from the IWSLT16 English to German translation dataset (validation portion) processed in the same way and using the same vocabulary. |
| Dataset Splits | Yes | We use a development set with 879k sentences from the articles published in November 2010 and a test set of 878k sentences from the articles published in December 2010. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan V GPU used at NYU for this research. |
| Software Dependencies | No | The paper mentions software tools like NLTK, the Moses tokenizer, SciPy, and Adam, but does not provide specific version numbers for these components. For example: 'implemented in SciPy (Jones et al., 2014)' and 'Adam with a learning rate of 10⁻⁴ on 100-sentence minibatches (Kingma and Ba, 2014)'. |
| Experiment Setup | Yes | We construct a small, medium, and large language model consisting of 256, 512, and 1024 LSTM units respectively in each layer. ... We use dropout ... with a drop rate of 0.1, 0.25, and 0.3 respectively. We use stochastic gradient descent with Adam with a learning rate of 10⁻⁴ on 100-sentence minibatches ... We measure perplexity on the development set every 10k minibatches, halve the learning rate whenever it increases, and clip the norm of the gradient to 1 ... We perform beam search with beam width 5. (A hedged sketch of this setup follows the table.) |
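
The Experiment Setup row quotes the paper's training configuration in prose; the sketch below translates it into a minimal PyTorch training setup for illustration only. The model sizes (256/512/1024 LSTM units), dropout rates (0.1/0.25/0.3), Adam with a learning rate of 10⁻⁴, 100-sentence minibatches, gradient-norm clipping at 1, and the halve-the-learning-rate-when-dev-perplexity-increases schedule come from the quoted text; the vocabulary size, number of LSTM layers, and data handling are placeholder assumptions, not details from the paper.

```python
import math

import torch
import torch.nn as nn

VOCAB_SIZE = 10_000  # placeholder; the paper's vocabulary size is not restated in this row
NUM_LAYERS = 2       # assumption; the quote says "in each layer" but not how many layers

# LSTM units and dropout rates per model size, as quoted in the Experiment Setup row.
CONFIGS = {
    "small":  {"hidden": 256,  "dropout": 0.10},
    "medium": {"hidden": 512,  "dropout": 0.25},
    "large":  {"hidden": 1024, "dropout": 0.30},
}


class LSTMLanguageModel(nn.Module):
    """Plain LSTM language model: embedding -> LSTM -> projection to the vocabulary."""

    def __init__(self, vocab_size, hidden, dropout, num_layers=NUM_LAYERS):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        hidden_states, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(self.drop(hidden_states))     # logits: (batch, seq_len, vocab)


cfg = CONFIGS["medium"]
model = LSTMLanguageModel(VOCAB_SIZE, cfg["hidden"], cfg["dropout"])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 10^-4
criterion = nn.CrossEntropyLoss()

prev_dev_ppl = float("inf")


def train_step(inputs, targets):
    """One update on a 100-sentence minibatch, clipping the gradient norm to 1."""
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits.view(-1, VOCAB_SIZE), targets.view(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()


def update_lr(dev_loss):
    """Called every 10k minibatches: halve the learning rate if dev perplexity increased."""
    global prev_dev_ppl
    dev_ppl = math.exp(dev_loss)
    if dev_ppl > prev_dev_ppl:
        for group in optimizer.param_groups:
            group["lr"] *= 0.5
    prev_dev_ppl = dev_ppl
```

The beam search with beam width 5 mentioned in the same row is used for decoding in the paper and is not reproduced in this sketch.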