Semi-supervised Sequence Learning
Authors: Andrew M. Dai, Quoc V. Le
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we find that long short-term memory recurrent networks, after being pretrained with the two approaches, become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification on IMDB and DBpedia or image recognition on CIFAR-10. |
| Researcher Affiliation | Industry | Andrew M. Dai Google Inc. adai@google.com; Quoc V. Le Google Inc. qvl@google.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing source code for their methodology or provide a link to such a repository. |
| Open Datasets | Yes | The benchmarks of focus are text understanding tasks, with all datasets being publicly available. The tasks are sentiment analysis (IMDB and Rotten Tomatoes) and text classification (20 Newsgroups and DBpedia). IMDB: http://ai.Stanford.edu/amaas/data/sentiment/index.html. Rotten Tomatoes: http://www.cs.cornell.edu/people/pabo/movie-review-data/. Amazon reviews: http://snap.stanford.edu/data/web-Amazon.html. 20 Newsgroups: http://qwone.com/~jason/20Newsgroups/. DBpedia [20]. CIFAR-10. |
| Dataset Splits | Yes | IMDB: We use 15% of the labeled training documents as a validation set. Rotten Tomatoes: The dataset has 10,662 documents, which are randomly split into 80% for training, 10% for validation and 10% for test. 20 Newsgroups: We use 15% of the training documents as a validation set. We choose the dropout parameters based on a validation set. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper mentions 'reduce GPU memory usage' but does not specify any particular GPU models or other detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper mentions software components, e.g. a standard LSTM implementation and word2vec embeddings, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | In our experiments with LSTMs, we follow the basic recipes as described in [7, 32] by clipping the cell outputs and gradients. After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128... One example configuration sets the hidden state to 512 units and truncates the backprop to 400 steps. We choose the dropout parameters based on a validation set. In the above table, we use 1,024 units for memory cells and 512 units for the input embedding layer in the LM-LSTM and SA-LSTM. We also use a hidden layer of 30 units with dropout of 50% between the last hidden state and the classifier. (A hedged model-setup sketch follows the table.) |
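
The dataset splits quoted above (a random 80/10/10 train/validation/test split for Rotten Tomatoes and a 15% validation holdout from the labeled training documents for IMDB and 20 Newsgroups) can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code; the toy document list, the helper names `split_80_10_10` and `holdout_validation`, and the fixed seed are illustrative assumptions.

```python
# Minimal sketch of the splits described in the table: a random 80/10/10
# train/validation/test split (Rotten Tomatoes) and a 15% validation holdout
# from the labeled training documents (IMDB, 20 Newsgroups).
# The document lists and the seed are placeholders, not the authors' setup.
import random


def split_80_10_10(documents, seed=0):
    """Random 80% train / 10% validation / 10% test split."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n = len(docs)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return docs[:n_train], docs[n_train:n_train + n_val], docs[n_train + n_val:]


def holdout_validation(train_documents, fraction=0.15, seed=0):
    """Hold out a fraction (15% here) of labeled training documents for validation."""
    docs = list(train_documents)
    random.Random(seed).shuffle(docs)
    n_val = int(fraction * len(docs))
    return docs[n_val:], docs[:n_val]


# Example with a toy corpus of 10,662 placeholder documents (the Rotten Tomatoes size).
train, val, test = split_80_10_10([f"doc_{i}" for i in range(10_662)])
```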
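
The experiment-setup row lists concrete hyperparameters: 1,024 memory-cell units, a 512-unit input embedding, a 30-unit classifier hidden layer with 50% dropout, batch size 128, and clipped gradients. The sketch below arranges those numbers into a runnable PyTorch classifier to make the architecture concrete. PyTorch itself, the optimizer and learning rate, the ReLU activation, the clipping threshold, the vocabulary size, and the class count are assumptions not stated in the quoted text, and in the LM-LSTM/SA-LSTM setting the embedding and LSTM weights would be initialized from the pretrained language model or sequence autoencoder rather than from scratch.

```python
# Hedged sketch of the quoted setup in PyTorch (not used in the original paper).
# Only the quoted hyperparameters (1,024 memory cells, 512-dim input embedding,
# a 30-unit hidden layer with 50% dropout, batch size 128) come from the table;
# everything else is a placeholder assumption.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000      # assumption: not specified in the quoted setup
NUM_CLASSES = 2          # e.g. IMDB sentiment; DBpedia would use more classes
EMBED_DIM = 512          # "512 units for the input embedding layer"
CELL_DIM = 1024          # "1,024 units for memory cells"
CLASSIFIER_HIDDEN = 30   # "a hidden layer of 30 units with dropout of 50%"


class SequenceClassifier(nn.Module):
    """LSTM document classifier; in the SA-LSTM/LM-LSTM setting the embedding
    and LSTM weights would be copied from the pretrained model."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, CELL_DIM, batch_first=True)
        self.dropout = nn.Dropout(p=0.5)
        self.hidden = nn.Linear(CELL_DIM, CLASSIFIER_HIDDEN)
        self.out = nn.Linear(CLASSIFIER_HIDDEN, NUM_CLASSES)

    def forward(self, token_ids):
        # token_ids: (batch, time) integer tensor of word indices
        states, _ = self.lstm(self.embed(token_ids))
        last_state = states[:, -1, :]  # classify from the last hidden state
        return self.out(self.dropout(torch.relu(self.hidden(last_state))))


model = SequenceClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumption: optimizer/lr not quoted
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step with gradient clipping and batch size 128;
# the sequence length of 400 mirrors the quoted truncated-backprop example.
tokens = torch.randint(0, VOCAB_SIZE, (128, 400))
labels = torch.randint(0, NUM_CLASSES, (128,))
optimizer.zero_grad()
loss = loss_fn(model(tokens), labels)
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # threshold is a placeholder
optimizer.step()
```

A full run would repeat such steps for roughly 500K pretraining steps on the language-model or autoencoder objective before fine-tuning on the labeled classification data, as the quoted setup describes.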