Can Unconditional Language Models Recover Arbitrary Sentences?

Authors: Nishant Subramani, Samuel Bowman, Kyunghyun Cho

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments reveal that we can achieve full recoverability with a reparametrized sentence space with dimension equal to the dimension of the recurrent hidden state of the model, at least for large enough models.
Researcher Affiliation | Collaboration | Nishant Subramani (New York University, nishant@nyu.edu); Samuel R. Bowman (New York University); Kyunghyun Cho (New York University, Facebook AI Research, CIFAR Azrieli Global Scholar)
Pseudocode | No | The paper describes its methods through prose and mathematical equations but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement that the source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Corpus: We use the fifth edition of the English Gigaword (Graff et al., 2003) news corpus. ... To evaluate out-of-domain sentence recoverability, we use a random sample of 50 sentences from the IWSLT16 English to German translation dataset (validation portion) processed in the same way and using the same vocabulary.
Dataset Splits | Yes | We use a development set with 879k sentences from the articles published in November 2010 and a test set of 878k sentences from the articles published in December 2010.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Titan V GPU used at NYU for this research.
Software Dependencies | No | The paper mentions software tools like NLTK, the Moses tokenizer, SciPy, and Adam, but does not provide specific version numbers for these components. For example: 'implemented in SciPy (Jones et al., 2014)' and 'Adam with a learning rate of 10^-4 on 100-sentence minibatches (Kingma and Ba, 2014)'.
Experiment Setup | Yes | We construct a small, medium, and large language model consisting of 256, 512, and 1024 LSTM units respectively in each layer. ... We use dropout ... with a drop rate of 0.1, 0.25, and 0.3 respectively. We use stochastic gradient descent with Adam with a learning rate of 10^-4 on 100-sentence minibatches ... We measure perplexity on the development set every 10k minibatches, halve the learning rate whenever it increases, and clip the norm of the gradient to 1 ... We perform beam search with beam width 5.
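
For concreteness, below is a minimal, hypothetical PyTorch-style sketch of the training configuration quoted in the Experiment Setup row. Only the LSTM sizes (256/512/1024 units), dropout rates (0.1/0.25/0.3), Adam learning rate of 10^-4, the 10k-minibatch evaluation interval, learning-rate halving when development perplexity increases, and gradient-norm clipping to 1 come from the quoted text; the vocabulary size, embedding size, layer count, batching code, and all function and variable names are illustrative assumptions, and beam-search decoding (width 5) is omitted.

# Hypothetical sketch (not the authors' released code) of the quoted training setup.
import math
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000  # placeholder; the actual vocabulary size is not restated in this row

# LSTM units per layer and dropout rate for each model size, as quoted above.
MODEL_CONFIGS = {
    "small":  {"hidden": 256,  "dropout": 0.10},
    "medium": {"hidden": 512,  "dropout": 0.25},
    "large":  {"hidden": 1024, "dropout": 0.30},
}

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, hidden, dropout, num_layers=2):  # layer count is an assumption
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                       # tokens: (batch, time) LongTensor of word ids
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(self.drop(h))                # logits: (batch, time, vocab)

def evaluate_perplexity(model, dev_batches, criterion):
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for inputs, targets in dev_batches:
            logits = model(inputs)
            loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            total_loss += loss.item() * targets.numel()
            total_tokens += targets.numel()
    model.train()
    return math.exp(total_loss / total_tokens)

def train(model, train_batches, dev_batches, steps_per_eval=10_000):
    criterion = nn.CrossEntropyLoss()
    lr = 1e-4                                        # Adam with a learning rate of 10^-4
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    prev_dev_ppl = float("inf")
    # train_batches is assumed to yield 100-sentence minibatches of (inputs, targets) id tensors
    for step, (inputs, targets) in enumerate(train_batches, start=1):
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradient norm to 1
        optimizer.step()
        if step % steps_per_eval == 0:               # measure dev perplexity every 10k minibatches
            dev_ppl = evaluate_perplexity(model, dev_batches, criterion)
            if dev_ppl > prev_dev_ppl:               # halve the learning rate whenever it increases
                lr /= 2
                for group in optimizer.param_groups:
                    group["lr"] = lr
            prev_dev_ppl = dev_ppl

# Example: the "large" configuration from the quoted setup.
# model = LSTMLanguageModel(VOCAB_SIZE, **MODEL_CONFIGS["large"])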